How to deal with missing values in python scikit NMF

Question

I am trying to apply NMF to my dataset using Python's scikit-learn. My dataset contains both real 0 values and missing values, but scikit-learn does not accept NaN values in the data matrix. Some sources suggest replacing the missing values with zeros.

My questions:

If I replace the missing values with zeros, how can the algorithm distinguish between missing values and real zero values?
Are there any other implementations of NMF that handle missing values?
Or, if there are other matrix factorization algorithms, can they be used to predict the missing values?

+5

python scikit-learn recommendation-engine matrix-factorization svd

Zhaojie tao Sep 7 '16 at 10:37

2 answers

silentser · Answer 1 · 2017-03-31T06:52:58+0000

SGD (stochastic gradient descent) will do the job here, but scikit-learn does not provide an implementation that can be applied to this task. You can write your own, but it will be very slow, since SGD for matrix factorization cannot be parallelized directly. Check out the distributed SGD algorithm described here. It is not difficult to implement and speeds up the process significantly.
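To make the SGD idea concrete, here is a minimal single-threaded sketch in NumPy. It is not the distributed algorithm mentioned above and not part of scikit-learn; the function name `masked_sgd_nmf` and its parameters (`lr`, `reg`, `n_epochs`) are made up for illustration. It assumes missing entries are marked as NaN, updates the factors only on observed entries (so real zeros are fitted and missing values are skipped), and keeps the factors non-negative by clipping at zero after each step.

```python
import numpy as np


def masked_sgd_nmf(X, n_components=2, n_epochs=200, lr=0.01, reg=0.01, seed=0):
    """Approximate X ~= W @ H using only the observed (non-NaN) entries.

    Hypothetical helper for illustration; not a scikit-learn API.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.uniform(0.1, 1.0, size=(n_rows, n_components))
    H = rng.uniform(0.1, 1.0, size=(n_components, n_cols))

    # Indices of observed entries. Real zeros are observed; NaNs are not.
    rows, cols = np.nonzero(~np.isnan(X))

    for _ in range(n_epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for k in rng.permutation(len(rows)):
            i, j = rows[k], cols[k]
            err = X[i, j] - W[i] @ H[:, j]
            w_old = W[i].copy()
            # Gradient step with L2 regularization, then clip at zero
            # to keep both factors non-negative.
            W[i] = np.maximum(W[i] + lr * (err * H[:, j] - reg * W[i]), 0.0)
            H[:, j] = np.maximum(H[:, j] + lr * (err * w_old - reg * H[:, j]), 0.0)
    return W, H
```

After fitting, `W @ H` gives a value for every cell, so the missing entries can be filled with `np.where(np.isnan(X), W @ H, X)`. The learning rate and regularization strength here are arbitrary and would need tuning for real data.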

Cristiana S. Parada · Answer 2 · 2017-10-25T20:11:28+0000

This is discussed on the scikit-learn GitHub; a version seems to be available, but it has not yet been merged into the main code.

https://github.com/scikit-learn/scikit-learn/pull/8474
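Independently of whatever that pull request implements, a common way to make NMF ignore missing entries is to weight the loss with a binary mask. The sketch below uses the standard multiplicative updates for the squared error, restricted to observed entries. It assumes missing values are encoded as NaN; `masked_nmf` is a hypothetical name, not a scikit-learn function.

```python
import numpy as np


def masked_nmf(X, n_components=2, n_iter=500, eps=1e-9, seed=0):
    """Masked NMF via multiplicative updates (illustrative sketch only).

    Minimizes the squared error between X and W @ H over observed entries.
    """
    M = ~np.isnan(X)              # True where a value was observed
    X0 = np.where(M, X, 0.0)      # NaNs become 0, but the mask hides them
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, size=(X.shape[0], n_components))
    H = rng.uniform(0.1, 1.0, size=(n_components, X.shape[1]))

    for _ in range(n_iter):
        # Only the observed part of the reconstruction enters the updates.
        WH = (W @ H) * M
        W *= (X0 @ H.T) / (WH @ H.T + eps)
        WH = (W @ H) * M
        H *= (W.T @ X0) / (W.T @ WH + eps)
    return W, H
```

Because the mask multiplies the reconstruction, a real zero pulls `W @ H` toward zero at that cell, while a missing cell contributes nothing to the updates. This is what separates the two cases, which plain zero-filling cannot do.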
