Image Watermarking for Machine Learning Datasets
Date
2023-06-18Abstract
Machine learning has received increasing attention for the last decade due to its significant success in classification problems in almost every application domain. For its success, the amount of available data for training plays a crucial role in the creation of a machine-learning model. However, the data-gathering process for machine learning algorithms is a tedious and time-consuming task. In many cases, the developers rely on publicly available datasets, which are not always of high quality. Recently, we are witnessing a data market paradigm where valuable datasets are sold. Thus, once the dataset is created or bought, protecting the dataset against illegal use or (re)sale and establishing intellectual property rights is necessary. In this paper, we investigate the question of deploying well- studied image watermarking techniques to be applied to classification algorithm datasets, without degrading the quality of the dataset. We investigate whether Singular Value Decomposition (SVD)-based techniques from image watermarking could be deployed on machine learning datasets or not. To this end, we chose the watermarking technique described in [8] and applied it to a machine-learning dataset. We provide experimental results on the robustness of the scheme. Our results show that the watermark embedding scheme provides decent imperceptibility and robustness against update, zero- out, and insertion attacks but, it is not successful against deletion attacks. We believe our work can inspire researchers who might want to consider applying well-studied image watermarking techniques to machine learning datasets