Split character strings and build a data frame

I have a small data set that I read from a text file with readLines. The file contains characters such as #, and I think this is why read.table was unable to read it. Here is the dput output for the first few lines:

 files<-c("\trfinal\t\t", "eq1\t\t\t", "0.ster6\t1.00\t(1.00,1.00)\t.", "1.ster6\t0.65\t(0.47,0.88)\t0.006", "0.parkinson\t1.00\t(1.00,1.00)\t.", "1.ster6#0.parkinson\t1.00\t(1.00,1.00)\t.") 

\t is the tab character that separates the fields within each line. I would like to split these lines and put them into a four-column data frame.

I tried strsplit(files, "[\\t]"), but that did not do the trick. Any help?

2 answers

You can stop # from being treated as a comment character by setting comment.char = "" in read.table:

 read.table(text=files, sep='\t', comment.char="")
 #                    V1     V2          V3    V4
 # 1                     rfinal
 # 2                 eq1
 # 3             0.ster6   1.00 (1.00,1.00)     .
 # 4             1.ster6   0.65 (0.47,0.88) 0.006
 # 5         0.parkinson   1.00 (1.00,1.00)     .
 # 6 1.ster6#0.parkinson   1.00 (1.00,1.00)     .
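Since the data originally came from a file, the same argument can be passed when reading the file directly. The following is only a sketch: the temporary file stands in for the real file, whose path is not given in the question, and the default call is merely expected to fail because the last line is cut off at the #.

```r
files <- c("\trfinal\t\t", "eq1\t\t\t", "0.ster6\t1.00\t(1.00,1.00)\t.",
           "1.ster6\t0.65\t(0.47,0.88)\t0.006", "0.parkinson\t1.00\t(1.00,1.00)\t.",
           "1.ster6#0.parkinson\t1.00\t(1.00,1.00)\t.")

# Hypothetical stand-in for the original text file.
tmp <- tempfile(fileext = ".txt")
writeLines(files, tmp)

# Default comment.char = "#": everything after '#' on the last line is
# dropped, so that line has too few fields and the call is expected to fail.
default_try <- try(read.table(tmp, sep = "\t"), silent = TRUE)

# With comment handling switched off, '#' is kept as ordinary data.
df <- read.table(tmp, sep = "\t", comment.char = "", stringsAsFactors = FALSE)
dim(df)

unlink(tmp)
```

Reading the file this way avoids the intermediate readLines step altogether.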

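If "\t" simply represents a tab delimiter, try read.delim, which defaults to sep = "\t" and comment.char = "". Note that it treats the first line as a header: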

 read.delim(text = files)
 #                     X rfinal         X.1   X.2
 # 1                 eq1     NA
 # 2             0.ster6   1.00 (1.00,1.00)     .
 # 3             1.ster6   0.65 (0.47,0.88) 0.006
 # 4         0.parkinson   1.00 (1.00,1.00)     .
 # 5 1.ster6#0.parkinson   1.00 (1.00,1.00)     .

You can also consider the stringi package. Here I treat "\t" as a fixed pattern:

 library(stringi)
 stri_split_fixed(files, "\t", simplify = TRUE)
 #      [,1]                  [,2]     [,3]          [,4]
 # [1,] ""                    "rfinal" ""            ""
 # [2,] "eq1"                 ""       ""            ""
 # [3,] "0.ster6"             "1.00"   "(1.00,1.00)" "."
 # [4,] "1.ster6"             "0.65"   "(0.47,0.88)" "0.006"
 # [5,] "0.parkinson"         "1.00"   "(1.00,1.00)" "."
 # [6,] "1.ster6#0.parkinson" "1.00"   "(1.00,1.00)" "."
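If stringi is not available, a similar result can probably be reached in base R. This is a sketch rather than part of the original answer: it assumes every line has at most four fields, and it pads the pieces by hand because strsplit drops the trailing empty field.

```r
files <- c("\trfinal\t\t", "eq1\t\t\t", "0.ster6\t1.00\t(1.00,1.00)\t.",
           "1.ster6\t0.65\t(0.47,0.88)\t0.006", "0.parkinson\t1.00\t(1.00,1.00)\t.",
           "1.ster6#0.parkinson\t1.00\t(1.00,1.00)\t.")

# Split on a literal tab; trailing empty fields are not returned by strsplit.
parts <- strsplit(files, "\t", fixed = TRUE)

# Pad every piece to four fields (assumed column count), then stack into a matrix.
padded <- lapply(parts, function(p) { length(p) <- 4; p })
mat <- do.call(rbind, padded)
mat[is.na(mat)] <- ""   # the padding produced NA; use empty strings instead

# Columns are named V1..V4 by default.
df <- as.data.frame(mat, stringsAsFactors = FALSE)
df
```

All four columns stay character here; convert them afterwards if numeric values are needed.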

Overall, though, it is not clear which lines should be treated as headers, so it would be better to follow @musically_ut's suggestion to use comment.char and solve the problem at the point where the file is read.
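One possible way to deal with the header question, assuming the first two lines are labels rather than data, is to skip them and supply column names explicitly. The names term, estimate, ci and p_value are hypothetical, and treating "." as a missing value is likewise an assumption about what the file means.

```r
files <- c("\trfinal\t\t", "eq1\t\t\t", "0.ster6\t1.00\t(1.00,1.00)\t.",
           "1.ster6\t0.65\t(0.47,0.88)\t0.006", "0.parkinson\t1.00\t(1.00,1.00)\t.",
           "1.ster6#0.parkinson\t1.00\t(1.00,1.00)\t.")

tab <- read.table(text = files, sep = "\t", comment.char = "",
                  skip = 2,                      # assumed: first two lines are labels
                  col.names = c("term", "estimate", "ci", "p_value"),  # hypothetical names
                  na.strings = ".",              # assumed: "." marks a missing value
                  stringsAsFactors = FALSE)
str(tab)
```

With these settings the estimate and p_value columns should come back as numeric, while term and ci remain character.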
