Symbol "|" in R

I would like to split a character string on the pattern " | ", but

 unlist(strsplit("I am | very smart", " | "))
 # [1] "I" "am" "|" "very" "smart"

splits on every space, and

 gsub(pattern="|", replacement="*", x="I am | very smart")
 # [1] "*I* *a*m* *|* *v*e*r*y* *s*m*a*r*t*"

inserts "*" at every position instead of replacing the pipe.
4 answers

Use the fixed argument:

 unlist(strsplit("I am | very smart", " | ", fixed=TRUE))
 # [1] "I am" "very smart"

A side benefit is speed: literal matching is usually faster than regular-expression matching.
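As a minimal sketch (the helper name `split_literal` is hypothetical, not part of base R), the same `fixed = TRUE` switch works on several strings at once. It also fixes the `gsub()` attempt from the question, because `gsub()` accepts `fixed = TRUE` too:

```r
# Hypothetical helper: split each element of `x` on a literal separator.
# With fixed = TRUE, `sep` is matched character for character, so "|"
# is an ordinary pipe rather than the regex OR operator.
split_literal <- function(x, sep) {
  strsplit(x, sep, fixed = TRUE)
}

parts <- split_literal(c("I am | very smart", "a | b | c"), " | ")
parts[[1]]  # "I am" "very smart"
parts[[2]]  # "a" "b" "c"

# gsub() also takes fixed = TRUE, so the replacement from the
# question only touches the literal pipe.
replaced <- gsub(pattern = "|", replacement = "*", x = "I am | very smart",
                 fixed = TRUE)
replaced    # "I am * very smart"
```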

stringr alternative:

 unlist(stringr::str_split("I am | very smart", stringr::fixed(" | ")))
 # [1] "I am" "very smart"

| is a metacharacter. You need to escape it (by putting \\ before it).

 > unlist(strsplit("I am | very smart", " \\| "))
 [1] "I am" "very smart"
 > sub(pattern="\\|", replacement="*", x="I am | very smart")
 [1] "I am * very smart"

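Edit: The reason you need two backslashes is that the backslash is also R's escape character inside string literals, where it introduces special characters such as \n (new line) and \t (tab). Writing "\\|" puts the two characters \| into the string, and the regex engine then reads \| as a literal pipe. See the ?regex page for more information. The other metacharacters are: . \ | ( ) [ { ^ $ * + ?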
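A quick base-R check of the two levels of escaping described above: the literal `"\\|"` is a two-character string (a backslash followed by a pipe), and that is what the regex engine receives. A bracket expression such as `[|]` is another common way to make the pipe literal, since metacharacters lose their special meaning inside brackets:

```r
pat <- "\\|"     # R string literal; the string itself is \|
nchar(pat)       # 2: one backslash plus one pipe
cat(pat, "\n")   # prints \|

x <- "I am | very smart"

# Escaped pipe: the regex engine sees \| and matches a literal "|"
escaped <- strsplit(x, " \\| ")[[1]]

# Bracket expression: inside [...] the pipe has no special meaning
bracketed <- strsplit(x, " [|] ")[[1]]

escaped    # "I am" "very smart"
bracketed  # "I am" "very smart"
```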


If you are parsing a table, calling read.table might be a better option. A tiny example:

 > txt <- textConnection("I am | very smart")
 > read.table(txt, sep='|')
      V1          V2
 1 I am   very smart

So I would suggest fetching the wiki page with RCurl, grabbing the interesting part with the XML package (which has a really neat function for parsing HTML tables), and, if the HTML format is not available, calling read.table with the appropriate sep. Good luck!
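A slightly fuller sketch of the read.table route, assuming the data is already available as text: `read.table()` treats `sep` as a literal character rather than a regex, and `strip.white = TRUE` removes the spaces around each field (it only applies when `sep` is given). The two-row input is made up for illustration:

```r
# Made-up two-row input; each element of the vector is one line.
lines <- c("I am | very smart", "you are | also smart")

tbl <- read.table(text = lines,
                  sep = "|",             # literal separator, not a regex
                  strip.white = TRUE,    # drop spaces around each field
                  stringsAsFactors = FALSE)

tbl$V1  # "I am" "you are"
tbl$V2  # "very smart" "also smart"
```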


The pipe '|' is the metacharacter used as the OR (alternation) operator in regular expressions.

Try:

 unlist(strsplit("I am | very smart", "\\s+\\|\\s+"))
 # [1] "I am" "very smart"
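To see the OR reading in action: unescaped, the question's pattern " | " is the alternation "space OR space", so it behaves exactly like splitting on a single space. The escaped pattern \\s+\\|\\s+ matches only a pipe with whitespace on either side. A small sketch:

```r
x <- "I am | very smart"

# Unescaped: " | " means " " OR " ", i.e. just a single space
as_or    <- strsplit(x, " | ")[[1]]
on_space <- strsplit(x, " ")[[1]]
identical(as_or, on_space)  # TRUE

# Escaped pipe surrounded by one or more whitespace characters
on_pipe <- strsplit(x, "\\s+\\|\\s+")[[1]]
on_pipe  # "I am" "very smart"
```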

