Best module for computing large matrices in Python?

I am developing a simple recommendation system and want to do some matrix computations, such as SVD and RBMs.

To make the evaluation more convincing, I'm going to use the MovieLens or Netflix dataset to measure the system's performance. However, both datasets have more than 1 million users and more than 10 thousand items, so it is impossible to hold all the data in memory as a dense matrix (at 8 bytes per float64 entry, 1,000,000 × 10,000 cells alone come to about 80 GB). I need a module that can handle such a large matrix.

I know that SciPy has some tools that can handle this, and divisi2, which python-recsys uses, also seems like a good choice. Or maybe there are better tools that I don't know about?

Which module should I use? Any suggestions?

3 answers

I suggest SciPy, in particular scipy.sparse. As Dougal noted, plain NumPy is not suitable for this situation.
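To illustrate, here is a minimal sketch of building a sparse rating matrix and taking a truncated SVD with scipy.sparse; the tiny arrays are made-up placeholders for the (user, item, rating) triples you would parse from the MovieLens files.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import svds

# Placeholder (user, item, rating) triples; in practice these come from the ratings file.
users   = np.array([0, 0, 1, 2, 2])
items   = np.array([0, 2, 1, 0, 3])
ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.0])

# Sparse user-by-item matrix: only the nonzero ratings are stored.
R = coo_matrix((ratings, (users, items)), shape=(3, 4)).tocsr()

# Truncated SVD keeps only the k largest singular values, which is
# what a latent-factor recommender needs, and never densifies R.
U, sigma, Vt = svds(R, k=2)
print(U.shape, sigma, Vt.shape)
```

With the real dataset you would just swap in the parsed index arrays and the true (n_users, n_items) shape; the matrix stays sparse throughout.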


I found another option called Crab; I'm trying it out and will compare a few of these libraries.


If your only problem is fitting the data in memory, use 64-bit Python with a 64-bit NumPy build. If you don't have enough physical memory, you can simply increase the virtual memory (swap) at the OS level; virtual memory size is limited only by your HDD size. Computational speed, however, is another beast entirely!
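If swapping is too slow or opaque, a related option is to keep the matrix on disk explicitly with numpy.memmap, so only the slices you touch are pulled into RAM. A rough sketch, with a made-up file name and the shape from the question (the backing file comes to roughly 40 GB, though it is created sparse on most filesystems):

```python
import numpy as np

# Disk-backed float32 array; "ratings.dat" and the shape are placeholders.
R = np.memmap("ratings.dat", dtype=np.float32, mode="w+",
              shape=(1_000_000, 10_000))

# Read and write small slices without ever holding the full matrix in memory.
R[0, :100] = 3.5
row_mean = R[0].mean()
R.flush()  # push pending writes back to the file
```

Anything vectorized over the whole array will still stream the full 40 GB through the page cache, so algorithms that work block by block (or on a sparse representation) will be far faster.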

