Exporting a Corpus from tm in R

I am trying to export Corpus objects from R to static files. The corpora contain documents created by parsing existing pre-processed files in the file system. The author of tm describes a method for this in his "Introduction to the tm Package: Text Mining in R" (p. 2), proposing

 > writeCorpus(file) 

but my attempts so far give only the following:

 Error in UseMethod("as.PlainTextDocument", x): no applicable method for 'as.PlainTextDocument' applied to an object of class "character" 

My script is fairly simple so far, so I expect I am just missing something simple. Any advice is appreciated.

    # Turn off Java so it doesn't interfere with Weka interface
    Sys.setenv(NOAWT=1)

    # Load required text mining packages
    require(tm)
    require(rJava)
    require(RWeka)
    require(Snowball)

    # Populate a vector with the number of subdirectories in preprocessed dir
    preprocessed <- list.files(path="preprocessed_dir", include.dirs=TRUE, full.names=TRUE)

    # For each element in the vector
    for(i in 1:length(preprocessed)) {
      # Get the files in each subdirectory by appending a number to the absolute path
      files <- list.files(sprintf("preprocessed_dir/%.0f", i))

      # Create a Corpus object of all the files in the subdirectory
      corpora <- Corpus(VectorSource(files))

      # Stem the words in the Corpus object
      corpora <- tm_map(corpora, SnowballStemmer)

      # (Try to) write the object to the file system
      writeCorpus(corpora)
    }

FWIW: calling class(corpora) returns [1] "VCorpus" "Corpus" "list", so the corpus object itself is apparently not of type character.
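
For comparison, here is a minimal sketch of how writeCorpus is usually called on a corpus built from the contents of a directory. It is not the script above: the directory names in_dir and out_dir and the two sample files are hypothetical, and stemming is left out. Note that VectorSource(files) in the script would build documents from the file names rather than from the file contents, which may be part of the problem; DirSource reads the files themselves.

```r
library(tm)

# Hypothetical input and output directories, created under tempdir() for the demo
in_dir  <- file.path(tempdir(), "preprocessed_demo")
out_dir <- file.path(tempdir(), "exported_demo")
dir.create(in_dir,  showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Two small sample files standing in for the pre-processed documents
writeLines("first sample document",  file.path(in_dir, "a.txt"))
writeLines("second sample document", file.path(in_dir, "b.txt"))

# Build the corpus from the file contents (DirSource), not from the file names
corpus <- VCorpus(DirSource(in_dir))

# Write one plain-text file per document into out_dir
writeCorpus(corpus, path = out_dir)

exported <- list.files(out_dir)
print(exported)
```

If every document in the corpus is still a text document object when writeCorpus runs, one file per document should appear in out_dir.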

1 answer

I wonder why you want to export the corpus. If you just want to show the texts to others, you can simply use the original texts.

If you want to export it and reuse it in R, my suggestion is to use the save() function to save the corpus to an .RData file.

Then, when you want to load it again, just use the load() function.
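
A minimal sketch of that round trip, assuming a small in-memory corpus; the sample texts and the file name corpus_demo.RData are made up for illustration:

```r
library(tm)

# Small illustrative corpus (the sample texts are hypothetical)
corpus <- VCorpus(VectorSource(c("first sample document",
                                 "second sample document")))

# Save the corpus object to an .RData file in a temporary location
rdata_path <- file.path(tempdir(), "corpus_demo.RData")
save(corpus, file = rdata_path)

# Remove the object, then restore it under its original name with load()
rm(corpus)
load(rdata_path)

n_docs     <- length(corpus)
first_text <- as.character(content(corpus[[1]]))
```

Keep in mind that load() restores the object under the name it had when it was saved, so it will overwrite any existing object called corpus in the workspace.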
