dc.description.abstract | Data-driven decision making powered by ML is changing how society and the economy work and is having a profound positive impact on our daily lives. A McKinsey report predicted that the value of data-driven decision making could reach US$2.5 trillion globally by 2025, while the European Data Strategy estimates a size of 827 billion euros for the EU27. ML is driving up the demand for data in what has been called the fourth industrial revolution.
A large number of Data Marketplaces (DMs) have appeared in the last few years to help owners monetise their data and to help data buyers fuel their marketing processes, train their ML algorithms, and make data-driven decisions. In this poster, we present preliminary findings of what is, to the best of our knowledge, the first systematic measurement study of DMs for data products. This ecosystem, despite being quite vibrant commercially, remains completely unknown to the scientific community. Very basic questions such as "What is the range of prices of data traded in modern DMs?", "Which categories of products command the highest prices?", "Are the observed prices consistent across DMs?", and "Which features correlate with the most expensive data products?" appear to have no answer and elude even informed speculation.
To answer such questions, we first conducted an extensive survey to compile a catalogue of more than 180 DMs. We then selected 38 of them that meet the necessary criteria for a measurement study. For these DMs, we developed custom crawlers to retrieve information about the products they trade. Using these crawlers, we obtained information on more than 213,964 data products and 2,015 data providers. We also developed ML classifiers that identify data products of similar categories, so that prices can be compared across DMs, and fitted nine different regression models to understand which features drive the prices of data products. | es |