Try Before You Buy: a Practical Data Purchasing Algorithm for Real-World Data Marketplaces
Date
2022-11-30
Abstract
Data trading is becoming increasingly popular, as evidenced by the scores of data marketplaces (DMs) that have appeared in the last few years to satisfy the demand for third-party data. Buyers, however, can only decide whether a requested price is worth paying after testing the data on their ML model. In this paper, we propose a method for optimizing data purchasing decisions. We show that if a marketplace provides potential buyers with a measure of the performance of their models on individual datasets, then buyers can select which datasets to buy with an efficacy that approximates that of knowing the performance of each possible combination of datasets offered by the DM. We call the resulting algorithm Try Before You Buy (TBYB) and demonstrate on synthetic and real-world datasets that TBYB can lead to near-optimal data purchasing with only O(N) instead of O(2^N) information and execution time.
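The gap between O(N) and O(2^N) information can be illustrated with a small Python sketch. It is not the TBYB algorithm from the paper, whose details the abstract does not give. The ranking rule (individual score per unit price, bought greedily within a budget), the function names, and the toy scores and prices are all assumptions made for illustration. The sketch only contrasts a selection that uses one score per dataset with an exhaustive search over every combination.

```python
from itertools import combinations

def select_from_individual_scores(names, try_score, prices, budget):
    """Hypothetical rule: one 'try' per dataset, then buy greedily within budget."""
    scores = {n: try_score(n) for n in names}  # exactly N evaluations
    ranked = sorted(names, key=lambda n: scores[n] / prices[n], reverse=True)
    bought, spent = [], 0.0
    for n in ranked:
        if spent + prices[n] <= budget:
            bought.append(n)
            spent += prices[n]
    return bought, len(scores)

def select_exhaustively(names, combo_score, prices, budget):
    """Baseline: enumerate every non-empty subset (2^N - 1 of them)."""
    best, best_score, subsets_considered = (), float("-inf"), 0
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            subsets_considered += 1
            if sum(prices[n] for n in combo) <= budget:
                score = combo_score(combo)
                if score > best_score:
                    best, best_score = combo, score
    return list(best), subsets_considered

# Toy marketplace (all values are made up for illustration).
names = ["A", "B", "C", "D"]
individual = {"A": 0.30, "B": 0.12, "C": 0.25, "D": 0.05}
prices = {"A": 2, "B": 1, "C": 2, "D": 2}
budget = 5

# Assumption: the value of a combination is the sum of individual scores.
greedy_choice, tries = select_from_individual_scores(
    names, lambda n: individual[n], prices, budget
)
optimal_choice, subsets = select_exhaustively(
    names, lambda combo: sum(individual[n] for n in combo), prices, budget
)
```

In this toy case both approaches pick the same three datasets, but the first needs 4 evaluations and the second considers 15 subsets. The agreement depends on the additive toy scores. With interacting datasets a greedy rule like this one can be suboptimal, which is the kind of gap the paper evaluates for TBYB.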