How to deal with missing values in Python scikit-learn NMF

I am trying to apply NMF to my dataset using Python scikit-learn. My dataset contains both real 0 values and missing values, but scikit-learn does not allow NaN values in the data matrix. Some sources suggest replacing the missing values with zeros.

My questions:

  • If I replace the missing values with zeros, how can the algorithm tell the missing values apart from the real zero values?

  • Are there any other implementations of NMF that can handle missing values?

  • Or, are there any other matrix factorization algorithms that can predict the missing values?

2 answers

Stochastic gradient descent (SGD) will do the job here, but scikit-learn does not have an SGD-based matrix factorization that can be applied to this task. You can write your own, but it will be very slow, because SGD for matrix factorization cannot be parallelized directly. Check out the distributed SGD algorithm described here. It is not that difficult to implement, and it speeds up the process significantly.
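To make the SGD idea concrete, here is a minimal sketch of the plain sequential version (not the distributed one). It assumes missing entries are marked with `np.nan` and real zeros are kept as `0.0`, so only observed cells contribute to the updates. The function name `sgd_factorize` and all hyperparameter values are my own illustrative choices, not part of scikit-learn.

```python
import numpy as np

def sgd_factorize(X, n_components=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Non-negative factorization X ~ W @ H fitted by SGD on observed cells only.

    Hypothetical helper for illustration; hyperparameters are untuned guesses.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.uniform(0.0, 1.0, size=(n_rows, n_components))
    H = rng.uniform(0.0, 1.0, size=(n_components, n_cols))

    # Coordinates of observed entries: real zeros are included, NaNs are not.
    rows, cols = np.nonzero(~np.isnan(X))

    for _ in range(n_epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for k in rng.permutation(len(rows)):
            i, j = rows[k], cols[k]
            err = X[i, j] - W[i] @ H[:, j]
            w_old = W[i].copy()
            # Gradient step on the squared error with L2 regularization.
            W[i] += lr * (err * H[:, j] - reg * W[i])
            H[:, j] += lr * (err * w_old - reg * H[:, j])
            # Project back to non-negative values to keep the NMF constraint.
            W[i] = np.maximum(W[i], 0.0)
            H[:, j] = np.maximum(H[:, j], 0.0)
    return W, H

# Toy data: 0.0 is a real observation, np.nan marks a missing entry.
X = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, np.nan, 0.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [1.0, 0.0, 0.0, 4.0],
    [np.nan, 1.0, 5.0, 4.0],
])
W, H = sgd_factorize(X)
X_hat = W @ H  # predictions for every cell, including the missing ones
```

The values of `X_hat` at the `np.nan` positions are the predicted missing values. The inner Python loop is what makes this slow on large matrices.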


This is discussed on the scikit-learn GitHub. A version that handles this seems to be available, but it has not yet been merged into the main code.

https://github.com/scikit-learn/scikit-learn/pull/8474
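Until something like that is merged, one common workaround is to weight the NMF loss with a binary mask so that missing cells are ignored. The sketch below uses masked multiplicative updates for the Frobenius loss. I am not claiming this is what the pull request implements; the function name `masked_nmf` and its defaults are assumptions for illustration only. It also shows how the algorithm can tell real zeros from missing values: both are stored as 0 in the filled matrix, but only real zeros have a 1 in the mask.

```python
import numpy as np

def masked_nmf(X, n_components=2, n_iter=500, eps=1e-9, seed=0):
    """NMF X ~ W @ H that ignores NaN cells via a binary mask.

    Hypothetical helper for illustration, not a scikit-learn API.
    """
    rng = np.random.default_rng(seed)
    M = ~np.isnan(X)            # True where the entry is observed
    Xz = np.where(M, X, 0.0)    # NaN -> 0; the mask makes these cells inert
    W = rng.uniform(0.1, 1.0, size=(X.shape[0], n_components))
    H = rng.uniform(0.1, 1.0, size=(n_components, X.shape[1]))

    for _ in range(n_iter):
        # Multiplicative updates restricted to observed cells.
        W *= (Xz @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ Xz) / (W.T @ (M * (W @ H)) + eps)
    return W, H

# Toy data: 0.0 is a real observation, np.nan marks a missing entry.
X = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, np.nan, 0.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [1.0, 0.0, 0.0, 4.0],
    [np.nan, 1.0, 5.0, 4.0],
])
W, H = masked_nmf(X)
X_hat = W @ H  # the NaN positions of X now hold predicted values
```

Because the updates are multiplicative and start from positive values, `W` and `H` stay non-negative without an explicit projection step.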

