LMGTFY: searching for "r-project tm parallel" turns up this as the third hit:
Distributed Text Mining with tm
Copying directly from the slides:

Solution:

1. Distributed storage
   - Dataset copied to the DFS (DistributedCorpus)
   - Only meta-information about the corpus remains in memory
2. Parallel computation
   - Computing operations (Map) on all elements in parallel
   - MapReduce paradigm
   - Workhorses: tm_map() and TermDocumentMatrix()
   - Processed documents (revisions) can be retrieved on request
Implemented in the tm plugin package tm.plugin.dc:
# Distributed Text Mining in R
> library("tm.plugin.dc")
> dc <- DistributedCorpus(DirSource("Data/reuters"),
+                         list(reader = readReut21578XML))
> dc <- as.DistributedCorpus(Reuters21578)
> summary(dc)
Searching further with the terms tm, snow, parLapply ... turns up this link:
With this code:
library(snow)

cl <- makeCluster(4, type = "SOCK")
par(ask = TRUE)

# Worker function: just sleeps, ignoring the (large) matrix argument
bigsleep <- function(sleeptime, mat) Sys.sleep(sleeptime)
bigmatrix <- matrix(0, 2000, 2000)
sleeptime <- rep(1, 100)

# clusterApply() ships bigmatrix with every single task
tm <- snow.time(clusterApply(cl, sleeptime, bigsleep, bigmatrix))
plot(tm)
cat(sprintf("Elapsed time for clusterApply: %f\n", tm$elapsed))

# parLapply() splits the work into one chunk per worker, so the
# matrix is sent far fewer times
tm <- snow.time(parLapply(cl, sleeptime, bigsleep, bigmatrix))
plot(tm)
cat(sprintf("Elapsed time for parLapply: %f\n", tm$elapsed))

stopCluster(cl)
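If the full DistributedCorpus machinery is more than you need, the same map-over-documents pattern can be sketched with parLapply alone. A minimal sketch using base R's parallel package (the successor to snow); the document strings, transformation, and cluster size here are illustrative, not from the slides:

```r
library(parallel)  # ships with base R; successor to snow

# Plain character strings stand in for the documents of a tm corpus
docs <- c("The Quick Brown Fox", "Jumped OVER the lazy dog")

cl <- makeCluster(2)                      # adjust to available cores
lowered <- parLapply(cl, docs, tolower)   # one transformation per document
stopCluster(cl)

print(unlist(lowered))
```

In a real corpus you would replace tolower with whichever transformation you would otherwise hand to tm_map(), exporting any needed packages to the workers first with clusterEvalQ().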