Convert from class "simple_triplet_matrix" to class "matrix"

I am trying to convert the following term-document matrix, created using TermDocumentMatrix() from the tm package,

 A term-document matrix (317443 terms, 86960 documents)

 Non-/sparse entries: 18472230/27586371050
 Sparsity           : 100%
 Maximal term length: 653
 Weighting          : term frequency (tf)

with class

 [1] "TermDocumentMatrix" "simple_triplet_matrix" 

into a dense matrix.

But

 dense <- as.matrix(tdm) 

generates an error

 Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
 In addition: Warning message:
 In nr * nc : NAs produced by integer overflow

I do not understand the error and warning messages. Trying to reproduce the error on a small dataset with

 library(tm)
 data("crude")
 tdm <- TermDocumentMatrix(crude)
 as.matrix(tdm)

does not produce the same problem. I saw from this answer that a similar problem was solved with the help of the slam package (although that question was about summing, not converting to a dense matrix). I looked at the slam documentation but could not find a function for converting an object of class simple_triplet_matrix into an object of class matrix.

2 answers

You get this error because, as you commented, you are hitting the integer limit. That is to be expected with such a huge number of terms and documents: the product nr * nc no longer fits in an integer. This reproduces the problem:

 as.integer(.Machine$integer.max+1)
 [1] NA
 Warning message:
 NAs introduced by coercion

The vector() function takes the length as its second argument, and it fails because that argument (nr * nc) is NA.
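To see the overflow with the actual dimensions from the question, here is a minimal sketch. The variable names are mine and only for illustration; the numbers come from the printed summary of the term-document matrix.

```r
# Dimensions from the question, stored as integers (as slam stores nrow/ncol).
nr <- 317443L
nc <- 86960L

# Integer multiplication overflows and yields NA (with a warning,
# suppressed here so the sketch runs quietly).
n_int <- suppressWarnings(nr * nc)
is.na(n_int)

# The same product computed in double precision is well defined,
# but far larger than the biggest representable integer.
n_dbl <- as.numeric(nr) * as.numeric(nc)
n_dbl > .Machine$integer.max
```

The double-precision product equals the sum of the sparse and non-sparse entry counts reported in the summary, which confirms that nr * nc is the quantity that overflows.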

One solution is to override as.matrix.simple_triplet_matrix without calling vector. For instance:

 as.matrix.simple_triplet_matrix <- function (x, ...) {
   nr <- x$nrow
   nc <- x$ncol
   ## old line: y <- matrix(vector(typeof(x$v), nr * nc), nr, nc)
   y <- matrix(0, nr, nc)
   y[cbind(x$i, x$j)] <- x$v
   dimnames(y) <- x$dimnames
   y
 }

But I'm not sure that it is a good idea to force such a sparse matrix (sparsity 100%) into a dense matrix.
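A rough back-of-the-envelope estimate shows why. This sketch assumes the values are stored as 8-byte doubles and that the triplet form keeps two 4-byte integer indices plus one value per non-zero entry; it ignores dimnames and other overhead, so treat the figures as orders of magnitude only.

```r
# Dimensions and non-zero count taken from the printed summary in the question.
n_terms   <- 317443
n_docs    <- 86960
n_nonzero <- 18472230

# Dense storage: one 8-byte double per cell.
dense_gb <- n_terms * n_docs * 8 / 1e9

# Triplet storage (assumed layout): two 4-byte integer indices plus one
# 8-byte double value per non-zero entry.
triplet_gb <- n_nonzero * (4 + 4 + 8) / 1e9

round(c(dense = dense_gb, triplet = triplet_gb), 2)
```

Under these assumptions the dense matrix would need on the order of 220 GB, against roughly 0.3 GB for the triplet form, so even with the overflow avoided the allocation is unlikely to succeed on an ordinary machine.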

EDIT

One idea is to use the sparseMatrix function from the Matrix package. Here is an example where I compare the sizes of the objects generated by each coercion. You gain about a factor of 10 by using sparseMatrix (I think that with your very sparse matrix you will gain more). Moreover, addition and multiplication are supported by sparse matrices.

 require(tm)
 data("crude")
 dtm <- TermDocumentMatrix(crude,
                           control = list(weighting = weightTfIdf,
                                          stopwords = TRUE))
 library(Matrix)
 ## sparse representation (despite the name); dims keeps the original shape
 Dense <- sparseMatrix(dtm$i, dtm$j, x = dtm$v, dims = c(dtm$nrow, dtm$ncol))
 dense <- as.matrix(dtm)
 ## check sizes
 floor(as.numeric(object.size(dense)/object.size(Dense)))
 ## addition and multiplication are supported
 Dense + Dense
 Dense * Dense
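If the goal is per-term totals or a look at a small part of the data, the sparse object can be used directly and only a slice needs to be made dense. The sketch below is illustrative only: `stm` is a hypothetical plain list that mimics the fields of a simple_triplet_matrix (i, j, v, nrow, ncol, dimnames) so that the example runs with just the Matrix package; with a real term-document matrix you would pass its own fields instead.

```r
library(Matrix)

# Hypothetical stand-in for a term-document matrix: a plain list with the
# same fields as a simple_triplet_matrix.
stm <- list(
  i = c(1L, 3L, 3L),          # row (term) indices of the non-zero entries
  j = c(1L, 2L, 4L),          # column (document) indices
  v = c(2, 1, 5),             # the non-zero values
  nrow = 4L, ncol = 4L,
  dimnames = list(paste0("t", 1:4), paste0("d", 1:4))
)

# Build the sparse matrix, passing dims and dimnames explicitly so that
# empty trailing rows or columns are not dropped.
sp <- sparseMatrix(i = stm$i, j = stm$j, x = stm$v,
                   dims = c(stm$nrow, stm$ncol),
                   dimnames = stm$dimnames)

# Per-term totals computed on the sparse object, without a dense copy.
term_totals <- rowSums(sp)

# Densify only a small slice when an ordinary matrix is really needed.
top_block <- as.matrix(sp[1:2, , drop = FALSE])
```

The point of the design is that the full dense matrix is never allocated; only the two-row slice in `top_block` is converted.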

I had a similar problem. I'm not sure my problem is identical, but when combining a sparse matrix with a dense matrix, I got a similar error message: "NAs produced by integer overflow". I was able to fix this by converting the dense matrix to single precision using as.single. I think the integer overflow is caused by operations in the sparse matrix code that somehow truncate the double-precision values, leaving trailing digits.
