R regex gets text between single quotes

Question

I have text like this:

la<-c("case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' else '99: Error' end ")

I want to extract text between single quotes as a list:

 "01: ANTIG_CLIENTE <= 4","02: ANTIG_CLIENTE <= 8","99: Error"

I tried two approaches without success:

 > sub('[^\]+\"([^\']+).*', '\\1', la)
 Error: '\]' is an unrecognized escape in character string starting "'[^\]"
 > regmatches(x, gregexpr('"[^']*"', la))[[1]]
 Error: unexpected ']' in "regmatches(x, gregexpr('"[^']"

How can I get text between single quotes?

regex r

Oscar Benitez Aug 2 '15 at 23:32

1 answer

MichaelChirico · Accepted Answer · 2015-08-02T23:41:07+0000

This should get what you want. The only assumption is that every string you want to extract contains a colon (otherwise, how would we distinguish '01: ANTIG_CLIENTE <= 4' from ' when ANTIG_CLIENTE <= 8 then ', both of which are enclosed in single quotes?):

 > regmatches(la,gregexpr("'[^']*:[^']*'",la))
 [[1]]
 [1] "'01: ANTIG_CLIENTE <= 4'" "'02: ANTIG_CLIENTE <= 8'" "'99: Error'"

Basically, we return all matches (hence gregexpr instead of regexpr) of the form: a single quote, any run of characters other than a single quote, a colon, another run of characters other than a single quote, and a closing single quote.
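To make the difference between regexpr and gregexpr concrete, here is a minimal sketch using the la string from the question. The names pat, first_only and all_matches are only illustrative and do not appear in the original code.

```r
# The input string from the question.
la <- c("case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' else '99: Error' end ")

# Same pattern as in the answer: quote, non-quotes, colon, non-quotes, quote.
pat <- "'[^']*:[^']*'"

# regexpr() reports only the first match in each element of la.
first_only <- regmatches(la, regexpr(pat, la))

# gregexpr() reports every match; regmatches() then returns a list
# with one character vector per element of la, so take element 1.
all_matches <- regmatches(la, gregexpr(pat, la))[[1]]

print(first_only)
print(all_matches)
```

With this input, first_only should hold just the first quoted label, while all_matches should hold all three.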

If you want to exclude the single quotes from what is returned, you will need look-ahead and look-behind, which requires R to interpret your regular expression as a Perl-style one:

 > regmatches(la,gregexpr("(?<=')[^']*:[^']*(?=')",la,perl=TRUE))
 [[1]]
 [1] "01: ANTIG_CLIENTE <= 4" "02: ANTIG_CLIENTE <= 8" "99: Error"
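As a possible alternative that avoids look-around entirely, one could keep the quoted matches and strip the surrounding quotes afterwards with gsub. This is only a sketch under the same colon assumption; the names quoted and labels are illustrative, not part of the original answer.

```r
# The input string from the question.
la <- c("case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' else '99: Error' end ")

# Step 1: extract the quoted pieces, quotes included (default regex engine).
quoted <- regmatches(la, gregexpr("'[^']*:[^']*'", la))[[1]]

# Step 2: drop the single quote at the start and at the end of each piece.
labels <- gsub("^'|'$", "", quoted)

print(labels)

# The question asked for a list; as.list() converts the character vector.
labels_list <- as.list(labels)
```

The two-step form trades one slightly more complex pattern for two simple ones, which may be easier to read and debug.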
