I have a dataset of 50 million user preferences covering 8 million distinct users and 180K distinct products. I am currently using a boolean data model and have a basic Tanimoto-based recommender. I am trying out different algorithms to get the best recommendations, and I started with SVD using the ALSWR factorizer. I used the basic SVD recommender provided in Mahout as follows:
DataModel dataModel = new FileDataModel(new File("/FilePath"));
// 50 features, lambda = 0.065, 15 iterations
ALSWRFactorizer factorizer = new ALSWRFactorizer(dataModel, 50, 0.065, 15);
Recommender recommender = new SVDRecommender(dataModel, factorizer);
As I understand it, the factorization happens offline and produces user feature vectors and item feature vectors. Queries are then served by computing the top products for a user from the dot product of that user's vector with every item vector.
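A minimal sketch of the serving step described above, assuming the user and item feature vectors are already available as plain `double` arrays. The class and method names are hypothetical and are not Mahout's API. It computes one dot product per item and keeps only the best `n` in a bounded min-heap, so each request costs one pass over all item vectors (for 180K items and 50 features, roughly 9 million multiply-adds).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class DotProductTopN {

    // Hypothetical helper: scores every item by the dot product of its feature
    // vector with the user's feature vector and returns the n best item indices.
    static List<Integer> topN(double[] userFeatures, double[][] itemFeatures, int n) {
        // Min-heap of {itemIndex, score}: the weakest of the current top n is at the head.
        PriorityQueue<double[]> heap =
                new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[1]));
        for (int item = 0; item < itemFeatures.length; item++) {
            double score = 0.0;
            for (int f = 0; f < userFeatures.length; f++) {
                score += userFeatures[f] * itemFeatures[item][f];
            }
            if (heap.size() < n) {
                heap.add(new double[] {item, score});
            } else if (score > heap.peek()[1]) {
                heap.poll(); // drop the current weakest candidate
                heap.add(new double[] {item, score});
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add((int) heap.poll()[0]);
        }
        Collections.reverse(result); // highest score first
        return result;
    }

    public static void main(String[] args) {
        double[] user = {1.0, 0.5};
        double[][] items = {{0.2, 0.1}, {1.0, 1.0}, {0.0, 2.0}, {0.9, 0.0}};
        System.out.println(topN(user, items, 2)); // prints [1, 2]
    }
}
```

The heap avoids sorting all 180K scores, but it does not avoid computing them; the dot products still dominate the request time.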
I have a few doubts about this approach:
- What is the best way to choose the factorization parameters, and how long should factorization take? With the parameters above, the factorization alone took 30 minutes.
- Is there a way to serve requests faster in real time, given that taking the dot product with every item vector increases request time? Can any of this be precomputed offline with SVD?
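One general way to move work offline, sketched here under assumptions rather than as a feature of Mahout's `SVDRecommender`: after factorization, compute each user's top-N list once in a batch job (parallelized across users) and store it, so an online request becomes a table lookup. The class and method names are hypothetical, and the vectors are assumed to be plain `double` arrays indexed by user and item. The trade-off is that the batch scales with users × items × features and the lists go stale until the next run.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

public class PrecomputedTopN {

    // Hypothetical offline step: compute the top-n item indices for every user,
    // in parallel across users; the resulting table is what gets served.
    static int[][] precompute(double[][] userFeatures, double[][] itemFeatures, int n) {
        return IntStream.range(0, userFeatures.length)
                .parallel()
                .mapToObj(u -> topN(userFeatures[u], itemFeatures, n))
                .toArray(int[][]::new); // toArray keeps user order
    }

    // Ranks all items for one user by dot product and returns the n best indices.
    static int[] topN(double[] user, double[][] items, int n) {
        double[] scores = new double[items.length];
        for (int i = 0; i < items.length; i++) {
            for (int f = 0; f < user.length; f++) {
                scores[i] += user[f] * items[i][f];
            }
        }
        return IntStream.range(0, items.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> scores[i]).reversed())
                .limit(n)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        double[][] users = {{1.0, 0.5}, {0.0, 1.0}};
        double[][] items = {{0.2, 0.1}, {1.0, 1.0}, {0.0, 2.0}, {0.9, 0.0}};
        int[][] table = precompute(users, items, 2);
        // Online step: a request for user 1 is just an array lookup.
        System.out.println(Arrays.toString(table[1])); // prints [2, 1]
    }
}
```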