Transfer a sparse matrix from Python to R

I am doing some text analysis in Python. Unfortunately, I need to switch to R to use a specific package that cannot easily be replicated in Python.

Currently, the text is parsed into bigram counts, reduced to a vocabulary of about 11,000 bigrams, and then saved as a dictionary:

{id1: {'bigrams': [(bigram1, count), (bigram2, count), ...]}, id2: {'bigrams': ...}, ...}

I need to get this into a dgCMatrix in R, where the rows are id1, id2, ... and the columns are the different bigrams, so that each cell holds the “count” for that id-bigram pair.

Any suggestions? I was thinking of just expanding it into a massive CSV, but that seems very inefficient and is probably infeasible given memory constraints.

1 answer

Could you write the matrix in MatrixMarket format using SciPy's mmwrite (scipy.io.mmwrite) and then read it in R using readMM from the Matrix package?
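A minimal sketch of the Python side of that idea, assuming the dictionary has the shape shown in the question and that each bigram is stored as a plain string. The sample data, the helper name dict_to_sparse, and the file names are hypothetical and only there for illustration:

```python
import os
import tempfile

from scipy.io import mmread, mmwrite
from scipy.sparse import coo_matrix

# Hypothetical sample data in the shape described in the question.
docs = {
    "id1": {"bigrams": [("new york", 3), ("york city", 1)]},
    "id2": {"bigrams": [("new york", 2), ("big apple", 5)]},
}


def dict_to_sparse(docs):
    """Return (matrix, row_ids, vocab): rows are ids, columns are bigrams."""
    row_ids = list(docs)
    # A sorted vocabulary gives a reproducible column order.
    vocab = sorted({bigram for d in docs.values() for bigram, _ in d["bigrams"]})
    col_index = {bigram: j for j, bigram in enumerate(vocab)}

    rows, cols, vals = [], [], []
    for i, doc_id in enumerate(row_ids):
        for bigram, count in docs[doc_id]["bigrams"]:
            rows.append(i)
            cols.append(col_index[bigram])
            vals.append(count)

    # Only the non-zero (row, column, count) triples are stored.
    matrix = coo_matrix(
        (vals, (rows, cols)), shape=(len(row_ids), len(vocab)), dtype="int64"
    )
    return matrix, row_ids, vocab


matrix, row_ids, vocab = dict_to_sparse(docs)

# Hypothetical output location; any writable directory works.
out_dir = tempfile.mkdtemp()
mtx_path = os.path.join(out_dir, "bigrams.mtx")
mmwrite(mtx_path, matrix)

# MatrixMarket files carry no row or column labels, so save them separately.
with open(os.path.join(out_dir, "rows.txt"), "w", encoding="utf-8") as f:
    f.write("\n".join(row_ids))
with open(os.path.join(out_dir, "cols.txt"), "w", encoding="utf-8") as f:
    f.write("\n".join(vocab))
```

On the R side, readMM("bigrams.mtx") from the Matrix package should return a triplet-form sparse matrix, which can likely be converted with as(m, "CsparseMatrix") to get a dgCMatrix; the saved label files can then be read with readLines and assigned via rownames and colnames. Because only non-zero counts are written, the file size scales with the number of id-bigram pairs that actually occur, not with the full ids-by-11,000 grid that a dense CSV would need.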
