How to format this search in Regex in R? It works great in an online test.

Question


In R, I have a column in a data frame, and each element looks something like this:

Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae

What I want is the part after the last semicolon. I'm trying to use "sub" to copy an existing column into a new one that keeps only these endings. In essence, I want only this:

Marinilabiaceae

The code snippet is as follows:

mydata$new_column<- sub("([\\s\\S]*;)", "", mydata$old_column)

Here I use \\ rather than \ because of R's string escape sequences. sub replaces the parts I don't want, and the result is stored in the new column. I tested the regex several times on sites such as http://regex101.com/r/kS7fD8/1

However, the results in R are very strange: my new column is filled with the organism's domain, Bacteria, instead of the last taxon.

How do I solve this? Are there any good, easy-to-follow resources on R regex syntax?


regex r parsing

hdavidzhu Aug 15 '14 at 18:00


3 answers

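Use a greedy .* in the first group so that it matches up to the last semicolon; the second capture group then contains the part you want: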

       (.*);(.*)
             ^^^------- Marinilabiaceae

regex101 demo

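If you want the first part instead, make the first group lazy with .*? so it stops at the first semicolon, and take the first group: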

             (.*?);(.*)
Bacteria -----^^^

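In R, the two patterns above can be used with sub and a backreference in the replacement. This is a minimal sketch; perl = TRUE is an assumption added here so the patterns behave as they do on regex101 (PCRE), and the variable names are illustrative.

```r
string <- "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae"

# Greedy (.*) in group 1 runs to the LAST semicolon, so group 2 is the final taxon.
# The pattern matches the whole string, so replacing it with "\\2" keeps only group 2.
greedy_tail <- sub("(.*);(.*)", "\\2", string, perl = TRUE)
greedy_tail
# [1] "Marinilabiaceae"

# Lazy (.*?) in group 1 stops at the FIRST semicolon, so group 1 is the first taxon.
lazy_head <- sub("(.*?);(.*)", "\\1", string, perl = TRUE)
lazy_head
# [1] "Bacteria"
```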

Braj Aug 15 '14 at 18:02

To extract everything after the last ; up to the end of the line, you can use:

[^;]*?$
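Because this pattern matches the wanted text directly (rather than the part to delete), in R it is used with regexpr and regmatches instead of sub. A minimal sketch, assuming perl = TRUE for PCRE semantics; the variable names are illustrative.

```r
string <- "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae"

# [^;]*?$ : a run of non-semicolon characters that must end at the end of the string.
# regexpr() locates the first such match; regmatches() extracts the matched text.
m <- regexpr("[^;]*?$", string, perl = TRUE)
last_part <- regmatches(string, m)
last_part
# [1] "Marinilabiaceae"
```

Since $ already anchors the match, the lazy ? is not required here; "[^;]*$" returns the same result.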


Andie2302 Aug 15 '14 at 18:10


Rich Scriven · Accepted Answer · 2014-08-15T18:03:42+0000

Starting from your sample string,

string <- "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae"

You can delete everything up to the last semicolon by passing "^(.*);" to sub:

> sub("^(.*);", "", string)
# [1] "Marinilabiaceae"

You can also use strsplit with tail:

> tail(strsplit(string, ";")[[1]], 1)
# [1] "Marinilabiaceae"
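Applied to the data frame column from the question, both approaches work on the whole vector at once. In this sketch, the second row is hypothetical sample data, and mydata$old_column / mydata$new_column follow the names used in the question; via_split is an illustrative extra column.

```r
# Hypothetical sample data using the column names from the question.
mydata <- data.frame(
  old_column = c("Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae",
                 "Bacteria;Firmicutes;Bacilli"),
  stringsAsFactors = FALSE
)

# sub() is vectorised: delete everything up to and including the last semicolon.
mydata$new_column <- sub("^(.*);", "", mydata$old_column)

# strsplit() returns a list of pieces per element; keep the last piece of each.
mydata$via_split <- vapply(strsplit(mydata$old_column, ";"),
                           function(parts) tail(parts, 1),
                           character(1))

mydata$new_column
# [1] "Marinilabiaceae" "Bacilli"
```

The two differ only on edge cases: for a string ending in ";", sub returns "" while strsplit drops the trailing empty piece and tail returns the previous taxon.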

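As for your pattern ([\\s\\S]*;): \\s matches whitespace and \\S matches non-whitespace, so [\\s\\S] is PCRE's way of saying "any character", which a plain . already does here. regex101 tests against pcre (php), but R's regex functions use a different engine (TRE) unless you pass perl = TRUE, so a pattern that works on regex101 will not necessarily behave the same way in R. For R's regex syntax, see the ?regex help page in R, as well as the regex wiki.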
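One way to see the engine difference is to run the question's own pattern with perl = TRUE, which switches sub to PCRE, the flavour regex101 uses. A minimal sketch; only the PCRE result and the simpler default-engine pattern are shown, and the variable names are illustrative.

```r
string <- "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae"

# With perl = TRUE, [\\s\\S]* means "any character", and the greedy match
# backtracks to the LAST semicolon, so only the final taxon remains.
pcre_result <- sub("([\\s\\S]*;)", "", string, perl = TRUE)
pcre_result
# [1] "Marinilabiaceae"

# Same idea in R's default engine: "." matches any character in this string.
default_result <- sub(".*;", "", string)
default_result
# [1] "Marinilabiaceae"
```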
