How do I format this regex search in R? It works fine in an online tester.

In R, I have a column in a data frame where each element looks something like this:

Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae

What I want is the part after the last semicolon. I'm trying to use "sub" to copy an existing column into a new one that keeps only those endings. In essence, I want this (the family):

Marinilabiaceae

The code snippet is as follows:

mydata$new_column<- sub("([\\s\\S]*;)", "", mydata$old_column)

Here I use \\ rather than \ because of R's escape sequences. sub replaces the parts I don't want, and the result goes into the new column. I have tested the regex several times on sites such as: http://regex101.com/r/kS7fD8/1

However, I am still baffled, because the results are very strange. My new column is now filled with the organism's domain rather than the last entry: Bacteria.

How do I solve this? Are there any good, accessible resources on R's regex syntax?

+4
3 answers

Starting with your sample string,

string <- "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae"

you can remove everything up to and including the last semicolon by passing "^(.*);" to sub:

> sub("^(.*);", "", string)
# [1] "Marinilabiaceae"
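
Because sub is vectorized, the same call should carry over to the whole column from the question. A minimal sketch, assuming a data frame shaped like the one described (the second row is made up purely for illustration):

```r
# Hypothetical sample data; only the first row comes from the question.
mydata <- data.frame(
  old_column = c(
    "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae",
    "Bacteria;Firmicutes;Bacilli"
  ),
  stringsAsFactors = FALSE
)

# sub() works element-wise: the greedy ^(.*); consumes everything up to
# and including the last semicolon, leaving only the final field.
mydata$new_column <- sub("^(.*);", "", mydata$old_column)

print(mydata$new_column)
```

The original column is left untouched, so the result can be checked against it row by row.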

You can also use strsplit with tail:

> tail(strsplit(string, ";")[[1]], 1)
# [1] "Marinilabiaceae"
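
The [[1]] above handles a single string. For a whole vector or column, one possible approach is to take the last piece of each split with vapply. This is only a sketch; the input vector x is a hypothetical example, and it assumes no element is an empty string:

```r
# Hypothetical input; the first element is the string from the question.
x <- c(
  "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae",
  "Bacteria;Firmicutes;Bacilli"
)

# fixed = TRUE treats ";" as a literal separator rather than a regex.
parts <- strsplit(x, ";", fixed = TRUE)

# Take the last piece of every element; vapply guarantees a character vector.
last_field <- vapply(parts, function(p) tail(p, 1), character(1))

print(last_field)
```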

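As for your pattern ([\\s\\S]*;): it leans on \\s inside a character class where a plain . would do. It works on regex101 because that site tests with the PCRE (PHP) flavor (see the "flavor" setting), while R's default regex flavor is a little different. R's own regex help page (?regex) covers the details.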
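
Since regex101 tests patterns with PCRE, one way to get matching behaviour in R is to request the PCRE engine through the perl = TRUE argument of sub. A small sketch with the string from the question, keeping the original pattern unchanged:

```r
string <- "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae"

# perl = TRUE switches sub() to PCRE, where \s and \S are understood
# inside a character class, so [\s\S]* can match any character.
result <- sub("([\\s\\S]*;)", "", string, perl = TRUE)

print(result)
```

Whether to switch engines or simplify the pattern to .* is a matter of taste; the simpler pattern avoids depending on the engine at all.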

+1

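Quantifiers are greedy by default, so the first group takes as much as it can and the second group is left with the last field: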

       (.*);(.*)
             ^^^------- Marinilabiaceae

regex101 demo

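If, on the other hand, you make the first quantifier lazy, the first group stops at the first semicolon: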

             (.*?);(.*)
Bacteria -----^^^
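
The two diagrams can be reproduced in R by replacing the whole match with one capture group. This is an illustrative sketch; perl = TRUE is an assumption added here so that the lazy quantifier follows PCRE semantics, as on regex101:

```r
string <- "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae"

# Greedy: group 1 runs up to the last semicolon, so group 2 is the last field.
greedy_last <- sub("(.*);(.*)", "\\2", string, perl = TRUE)

# Lazy: group 1 stops at the first semicolon, so group 1 is the first field.
lazy_first <- sub("(.*?);(.*)", "\\1", string, perl = TRUE)

print(greedy_last)
print(lazy_first)
```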

+1

To extract everything after the last ; up to the end of the line, you can use:

[^;]*?$
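
This pattern extracts the wanted text rather than deleting the rest, so in R it pairs more naturally with regexpr and regmatches than with sub. A hedged sketch; the perl = TRUE argument is an assumption made here so that the lazy quantifier behaves as it does on regex101:

```r
string <- "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Marinilabiaceae"

# regexpr() locates the first match; the pattern can only succeed where a
# run of non-semicolon characters reaches the end of the string.
m <- regexpr("[^;]*?$", string, perl = TRUE)

# regmatches() pulls out the matched substring.
last_field <- regmatches(string, m)

print(last_field)
```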
+1