I have a large array of data (text) that I converted to a term-document sparse matrix (I use scipy.sparse.csr.csr_matrix to store the sparse matrix). I want to find for each document the best matches of the nearest neighbors. I was hoping the NearestNeighbor procedure in the Python scikit-learn library ( sklearn.neighbors.NearestNeighbor , to be exact) would solve my problem, but efficient algorithms that use space partitioning data structures like KD trees or Ball trees do not work with sparse matrices, Only the brute force algorithm works with sparse matrices (which is impossible in my case, since I deal with a large case).
Is there an effective implementation of finding the nearest neighbor for sparse matrices (in Python or in any other language)?
Thanks.
python scipy scikit-learn nearest-neighbor
abhinavkulkarni
source share