tm package: inspect() returns character count, not content

Whenever I run the inspect() function from the tm R package, I get a character count instead of the contents of the documents. This happens regardless of which data source I use.

Here is my code:

library(tm)

data <- c("one two three", "two three four", "three four five")

corp <- VCorpus(VectorSource(data))

inspect(corp)

My sample output is:

inspect(corp)

VCorpus

Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 3

[[1]]
PlainTextDocument

Metadata:  7

Content: chars: 13

[[2]]
PlainTextDocument

Metadata:  7

Content:  chars: 14

[[3]]
PlainTextDocument
Metadata:  7

Content:  chars: 15

but I want:

[[1]]
PlainTextDocument

Metadata:  7

one two three

[[2]]
PlainTextDocument

Metadata:  7

two three four

[[3]]
PlainTextDocument
Metadata:  7

three four five

Here is another example using the Ovid text files that ship with the tm package and are used at the beginning of the vignette "Introduction to the tm Package" by Ingo Feinerer: http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf

The code:

txt <- system.file("texts", "txt", package = "tm")
ovid <- VCorpus(DirSource(txt, encoding = "UTF-8"),
    readerControl = list(language = "lat"))
inspect(ovid[1:2])

What I want, and what the vignette says it should output:

<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 2

 [[1]]
<<PlainTextDocument (metadata: 7)>>
  Si quis in hoc artem populo non novit amandi,
hoc legat et lecto carmine doctus amet.
arte citae veloque rates remoque moventur,
arte leves currus: arte regendus amor.
curribus Automedon lentisque erat aptus habenis,
Tiphys in Haemonia puppe magister erat:
me Venus artificem tenero praefecit Amori;
Tiphys et Automedon dicar Amoris ego.
ille quidem ferus est et qui mihi saepe repugnet:
sed puer est, aetas mollis et apta regi.
Phillyrides puerum cithara perfecit Achillem,
atque animos placida contudit arte feros.
qui totiens socios, totiens exterruit hostes,
creditur annosum pertimuisse senem.
[[2]]
<<PlainTextDocument (metadata: 7)>>
quas Hector sensurus erat, poscente magistro
verberibus iussas praebuit ille manus.
Aeacidae Chiron, ego sum praeceptor Amoris:
saevus uterque puer, natus uterque dea.
sed tamen et tauri cervix oneratur aratro,
frenaque magnanimi dente teruntur equi;
et mihi cedet Amor, quamvis mea vulneret arcu
pectora, iactatas excutiatque faces.
quo me fixit Amor, quo me violentius ussit,

What it actually outputs for me:

<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 2

[[1]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 49
Content:  chars: 48
Content:  chars: 46
Content:  chars: 47
Content:  chars: 0
Content:  chars: 52
Content:  chars: 48
Content:  chars: 46
Content:  chars: 46
Content:  chars: 53
Content:  chars: 0
Content:  chars: 49
Content:  chars: 49
Content:  chars: 50
Content:  chars: 49
Content:  chars: 44

[[2]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 48
Content:  chars: 47
Content:  chars: 47
Content:  chars: 48
Content:  chars: 46
Content:  chars: 0
Content:  chars: 48
Content:  chars: 49
Content:  chars: 45
Content:  chars: 47
Content:  chars: 45
Content:  chars: 0
Content:  chars: 51
Content:  chars: 42
Content:  chars: 45
Content:  chars: 48
Content:  chars: 44
2 answers

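This behaviour appears to have changed in version 0.6-1 of tm. inspect() on a corpus now prints only a summary of each document, not its text.

To see the text itself, use as.character() on a document.

For example, load the Ovid corpus (here with tm 0.6-2):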

> txt <- system.file("texts", "txt", package = "tm")
> ovid <- VCorpus(DirSource(txt, encoding = "UTF-8"),
    readerControl = list(language = "lat"))

inspect() now shows only the summary:

> inspect(ovid[1:2])
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 2

[[1]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 676

[[2]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 700

With as.character() you get the text of a document as a character vector, one element per line (output truncated):

> as.character(ovid[[1]])
 [1] "    Si quis in hoc artem populo non novit amandi,"    
 [2] "         hoc legat et lecto carmine doctus amet."     
 [3] "    arte citae veloque rates remoque moventur,"       
 [4] "         arte leves currus: arte regendus amor." 

For cleaner output, wrap it in writeLines():

> writeLines(as.character(ovid[[1]]))
    Si quis in hoc artem populo non novit amandi,
         hoc legat et lecto carmine doctus amet.
    arte citae veloque rates remoque moventur,
         arte leves currus: arte regendus amor.

To do this for several documents at once, use lapply() (output truncated):

> lapply(ovid[1:2], as.character)
$ovid_1.txt
 [1] "    Si quis in hoc artem populo non novit amandi,"    
 [2] "         hoc legat et lecto carmine doctus amet."     
 [3] "    arte citae veloque rates remoque moventur,"       
 [4] "         arte leves currus: arte regendus amor." 

$ovid_2.txt
 [1] "    quas Hector sensurus erat, poscente magistro"   
 [2] "         verberibus iussas praebuit ille manus."    
 [3] "    Aeacidae Chiron, ego sum praeceptor Amoris:"    
 [4] "         saevus uterque puer, natus uterque dea."

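Finally, to print each document's summary followed by its text, you can use l_ply() from the plyr package (output truncated):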

> l_ply(ovid[1:2], function(doc) { 
    print(doc) # output summary of document
    writeLines("") # output blank line between results
    writeLines(as.character(doc)) # output clean document text
    writeLines("") # output blank line between results
  })

<<PlainTextDocument>>
Metadata:  7
Content:  chars: 676

    Si quis in hoc artem populo non novit amandi,
         hoc legat et lecto carmine doctus amet.
    arte citae veloque rates remoque moventur,
         arte leves currus: arte regendus amor.

<<PlainTextDocument>>
Metadata:  7
Content:  chars: 700

    quas Hector sensurus erat, poscente magistro
         verberibus iussas praebuit ille manus.
    Aeacidae Chiron, ego sum praeceptor Amoris:
         saevus uterque puer, natus uterque dea.
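
The same idea can be sketched in base R, without plyr, on the small corpus from the question. This is only an illustration: show_corpus() is a hypothetical helper name, not a function provided by tm, and the exact summary lines printed for each document depend on the installed tm version.

```r
library(tm)

data <- c("one two three", "two three four", "three four five")
corp <- VCorpus(VectorSource(data))

# Hypothetical helper (not part of tm): for each document, print the
# index, the document summary, and then the document text.
show_corpus <- function(corpus) {
  for (i in seq_len(length(corpus))) {
    cat("[[", i, "]]\n", sep = "")
    print(corpus[[i]])                      # summary as printed by tm
    writeLines(as.character(corpus[[i]]))   # the actual text
    cat("\n")
  }
  invisible(corpus)
}

show_corpus(corp)

# Collect the text of every document as a plain character vector,
# joining multi-line documents with newlines.
texts <- vapply(
  seq_len(length(corp)),
  function(i) paste(as.character(corp[[i]]), collapse = "\n"),
  character(1)
)
```

Keeping the texts in a character vector like this is convenient when the goal is to work with the content rather than just look at it.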

Hope that helps!


Yes, this affects anything from 'tm' 0.6-1 onward, released May 07. Going back to 0.6 restores the old output.

:)
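
Since the output depends on the installed version, it may help to check which version of tm is in use before comparing results with the vignette. A minimal sketch, assuming tm is installed; the 0.6-1 threshold is taken from the answers above rather than from the package changelog:

```r
# packageVersion() comes from the utils package that ships with R.
tm_version <- packageVersion("tm")
print(tm_version)

# Assumption from this thread: from 0.6-1 on, inspect() on a corpus
# prints a summary instead of the document text.
summary_only <- tm_version >= "0.6-1"

if (summary_only) {
  message("inspect() is expected to print summaries; use as.character() for the text.")
} else {
  message("inspect() is expected to print the document text.")
}
```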
