I am developing a simple recommendation system and want to run some computations on it, such as SVD, RBM, etc.
To make the evaluation more convincing, I plan to use the MovieLens or Netflix dataset to measure the system's performance. However, both datasets have more than a million users and more than ten thousand items, so it is impossible to hold all the data in memory as a dense matrix. I need some special module to handle such a large matrix.
I know that some tools in SciPy can handle this, and divisi2, which python-recsys uses, also looks like a good choice. Or maybe there are better tools that I don't know about?
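To make the question concrete, here is a minimal sketch of the SciPy route I am considering; the file name "ratings.dat", the "::" delimiter, and k=50 are just assumptions based on the MovieLens 1M format, not a fixed part of my setup:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Read (user, item, rating) triples; "ratings.dat" and the "::" delimiter
# are assumptions matching the MovieLens 1M ratings file.
users, items, ratings = [], [], []
with open("ratings.dat") as f:
    for line in f:
        u, i, r = line.strip().split("::")[:3]
        users.append(int(u))
        items.append(int(i))
        ratings.append(float(r))

# Store only the observed ratings in a sparse user-by-item matrix,
# so the full dense matrix never has to exist in memory.
R = csr_matrix((ratings, (users, items)))

# Truncated SVD with k latent factors, computed directly on the sparse matrix.
k = 50
U, s, Vt = svds(R, k=k)
```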
Which module should I use? Any suggestions?