SGD will do the job here, but scikit-learn does not have an implementation that could be applied to this task. You can write your own, but a naive version will be very slow, because SGD for matrix factorization cannot be parallelized directly: each update reads and writes the factor vectors of one row and one column, so concurrent updates that share a row or column conflict. Check out the distributed SGD algorithm described here. It is not very difficult to implement and significantly speeds up the process.
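For reference, here is a minimal sketch of the plain, serial SGD version in NumPy. It is not the distributed algorithm, only the baseline you would be parallelizing. The function names (`sgd_factorize`, `rmse`), the `(row, column, value)` triple format, and all hyperparameter defaults are my own assumptions for illustration, not something from scikit-learn or from the linked algorithm.

```python
import numpy as np


def sgd_factorize(triples, n_rows, n_cols, rank=2, lr=0.05, reg=0.01,
                  epochs=200, seed=0):
    """Factorize a sparse matrix given as (row, col, value) triples.

    Hypothetical helper: returns P (n_rows x rank) and Q (n_cols x rank)
    such that P[u] @ Q[i] approximates the observed value at (u, i).
    """
    rng = np.random.default_rng(seed)
    # Small random initialization of both factor matrices.
    P = 0.1 * rng.standard_normal((n_rows, rank))
    Q = 0.1 * rng.standard_normal((n_cols, rank))
    triples = list(triples)
    for _ in range(epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for idx in rng.permutation(len(triples)):
            u, i, r = triples[idx]
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()  # keep the old row for the Q update
            # Gradient step on squared error with L2 regularization.
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q


def rmse(triples, P, Q):
    """Root mean squared error over the observed entries."""
    errs = [r - P[u] @ Q[i] for u, i, r in triples]
    return float(np.sqrt(np.mean(np.square(errs))))


if __name__ == "__main__":
    # Toy data: (row, col, value) triples for a 3 x 3 matrix.
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    P, Q = sgd_factorize(ratings, n_rows=3, n_cols=3)
    print("training RMSE:", rmse(ratings, P, Q))
```

Note that each step touches only `P[u]` and `Q[i]`. Two entries that share neither a row nor a column can therefore be updated independently, and that is the property parallel and distributed variants usually exploit.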