Combining Graphs and Big Data to Recommend Apps
Date
2014-09-11

Abstract
Recommendation engines (REs) are becoming increasingly popular, for example in e-commerce. An RE offers new items (products or content) to users based on their profiles and on historical data. The most popular algorithms used in REs are based on collaborative filtering. This technique makes recommendations based on the past behavior of other users and on the similarity between users and items. Metrics commonly used to compute similarity include Euclidean distance, cosine distance, and correlation-based distances. We have examined alternative similarity definitions based on the properties of the networks formed by users and items. The evaluated similarity metrics use graph-theoretic concepts such as degree, several centrality measures, and flow maximization. In this paper, we present how the proposed techniques were evaluated in a real environment for recommending applications to smartphone users. Training the RE required pre-processing a large dataset of around 1 billion records. A big data environment based on Hadoop/Elastic MapReduce, HBase, and Pig was set up to build and process the application and user graphs. This environment reduced the processing time from more than one week on a single machine to a couple of hours on the Hadoop cluster. Hence, applying big data techniques makes near-real-time retraining of the RE possible.
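The abstract names Euclidean, cosine, and correlation-based distances as the conventional similarity measures in collaborative filtering. The following minimal Python sketch computes them on a toy binary user-app installation matrix and ranks users by cosine similarity. The matrix, the user and app identifiers, and all function names are hypothetical illustrations, not the authors' data or implementation; the graph-based metrics (degree, centrality, flow maximization) evaluated in the paper are not shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: 1 = user has installed the app, 0 = not installed.
# The four columns stand for four illustrative apps; this is NOT the paper's dataset.
USAGE: Dict[str, List[int]] = {
    "u1": [1, 1, 0, 1],
    "u2": [1, 1, 0, 0],
    "u3": [0, 0, 1, 1],
}


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two profile vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation, computed as the cosine similarity of mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def correlation_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Correlation-based distance defined as 1 - Pearson correlation."""
    return 1.0 - pearson_correlation(a, b)


def most_similar(target: str, data: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Rank all other users by cosine similarity to `target`, highest first."""
    scores = [
        (user, cosine_similarity(data[target], vec))
        for user, vec in data.items()
        if user != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for user, score in most_similar("u1", USAGE):
        print(user, round(score, 3))
```

Run as a script, this prints `u2 0.816` followed by `u3 0.408`: u2 shares two of u1's three apps, so a user-based recommender would favor apps installed by u2. Computing Pearson correlation as the cosine of mean-centred vectors is a deliberate simplification that reuses one function; a constant vector (for example, a user with every app or none) then yields a correlation of 0.0 rather than an undefined value.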