Computing the Relative Value of Spatio-Temporal Data in Data Marketplaces
Spatio-temporal information drives a plethora of intelligent transportation, smart-city and crowd-sensing applications. Data is now a valuable production factor, and data marketplaces have appeared to help individuals and enterprises bring it to market to meet the ever-growing demand. Such marketplaces can combine data from different sources to satisfy the requirements of different applications. In this paper we study the problem of estimating the relative value of spatio-temporal datasets combined in marketplaces for predicting transportation demand and travel time in metropolitan areas. Using large datasets of taxi rides from Chicago, Porto and New York, we show that simplistic but popular approaches for estimating the relative value of data are inaccurate. These include splitting the value equally among the data sources, slightly more elaborate schemes based on data volume, and the "leave-one-out" heuristic. Instead, more sophisticated notions of value from economics and game theory, such as the Shapley value, need to be employed to capture the complex effects that mixing different datasets has on the accuracy of forecasting algorithms. This does not appear to be a coincidence tied to a particular use case, but rather a general trend across use cases with different objective functions.
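To make the contrast between these valuation rules concrete, the following is a minimal sketch that computes exact Shapley values, leave-one-out values and an equal split for three hypothetical data sources A, B and C. The value function is an invented toy table of forecasting-accuracy gains for each combination of sources, not a result from the paper. In this toy table, A and B are assumed to be fully redundant with each other, while C adds a complementary signal. All function and variable names are illustrative.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List

Coalition = FrozenSet[str]
ValueFn = Callable[[Coalition], float]


def shapley_values(players: List[str], v: ValueFn) -> Dict[str, float]:
    """Exact Shapley value: each player's marginal contribution v(S + p) - v(S),
    averaged over all coalitions S of the other players with weight
    |S|! (n - |S| - 1)! / n!. Needs O(n * 2^(n-1)) value evaluations."""
    n = len(players)
    result: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (v(s | {p}) - v(s))
        result[p] = total
    return result


def leave_one_out(players: List[str], v: ValueFn) -> Dict[str, float]:
    """Value of p = drop in v when p alone is removed from the full combination."""
    grand = frozenset(players)
    return {p: v(grand) - v(grand - {p}) for p in players}


def equal_split(players: List[str], v: ValueFn) -> Dict[str, float]:
    """Divide the value of the full combination evenly among all sources."""
    grand = frozenset(players)
    return {p: v(grand) / len(players) for p in players}


# Hypothetical toy table (NOT data from the paper): accuracy gain obtained by
# training a forecaster on each combination of sources. A and B are assumed
# to be perfect substitutes; C is assumed to be complementary to either.
TOY_GAIN: Dict[Coalition, float] = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.6,
    frozenset({"C"}): 0.2,
    frozenset({"A", "B"}): 0.6,
    frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.8,
    frozenset({"A", "B", "C"}): 0.8,
}
SOURCES = ["A", "B", "C"]


def toy_value(coalition: Coalition) -> float:
    return TOY_GAIN[coalition]


sv = shapley_values(SOURCES, toy_value)
loo = leave_one_out(SOURCES, toy_value)
eq = equal_split(SOURCES, toy_value)

for name, vals in [("shapley", sv), ("leave-one-out", loo), ("equal split", eq)]:
    print(name, {k: round(x, 3) for k, x in vals.items()})
```

With these made-up numbers, leave-one-out assigns zero value to A and B, because removing either one still leaves its substitute in place, and its values sum to only 0.2 rather than the 0.8 achieved by the full combination. The Shapley value gives A and B 0.3 each and C 0.2, which sums exactly to 0.8, while the equal split ignores the difference between redundant and complementary sources altogether. Exact enumeration grows exponentially with the number of sources, so for larger marketplaces a sampling-based approximation of the Shapley value would typically be used instead.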