R regex: grep excluding hyphen/dash as a word boundary

I am trying to match an exact word in a vector of strings with varying content. For this, I use word boundaries (\b). However, I would like the hyphen/dash not to be treated as a word boundary. Here is an example:

vector<-c(    
"ARNT",
"ACF, ASP, ACF64",
"BID",
"KTN1, KTN",
"NCRNA00181, A1BGAS, A1BG-AS",
"KTN1-AS1")

To match strings containing "KTN1", I use:

grep("(?i)(?=.*\\bKTN1\\b)", vector, perl=T) 

But this matches both "KTN1" and "KTN1-AS1".
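The reason is that \b sits between a word character (a letter, digit or underscore) and a non-word character, and a hyphen is a non-word character. Listing the runs that \b delimits in "KTN1-AS1" makes this visible (x and words are just illustrative names):

```r
x <- "KTN1-AS1"
# \w+ between two \b anchors: every run of word characters counts as a "word",
# so the hyphen splits the name in two
words <- regmatches(x, gregexpr("\\b\\w+\\b", x, perl = TRUE))[[1]]
print(words)
# [1] "KTN1" "AS1"
```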

Is there a way to treat the dash as a word character so that "KTN1-AS1" is considered one whole word?

2 answers

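To get the matched words themselves, use regmatches or str_extract_all (from stringr); grep only returns the indices of the matching elements (or the whole elements with value = TRUE).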

> vector<-c(    
+     "ARNT",
+     "ACF, ASP, ACF64",
+     "BID",
+     "KTN1, KTN",
+     "NCRNA00181, A1BGAS, A1BG-AS",
+     "KTN1-AS1")
> regmatches(vector, regexpr("(?i)\\bKTN1[-\\w]*\\b", vector, perl=T))
[1] "KTN1"     "KTN1-AS1"

> library(stringr)
> unlist(str_extract_all(vector, regex("\\bKTN1[-\\w]*\\b", ignore_case = TRUE)))
[1] "KTN1"     "KTN1-AS1"
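regexpr only reports the first match in each element. If a single element might hold several KTN1-prefixed names, gregexpr returns all of them. A short sketch on a made-up string x (not taken from the question's data):

```r
# Made-up input with two KTN1-prefixed names in one element
x <- "KTN1, KTN1-AS1"
pat <- "(?i)\\bKTN1[-\\w]*\\b"

# regexpr: at most one match per element
first_only <- regmatches(x, regexpr(pat, x, perl = TRUE))
# gregexpr: every non-overlapping match per element
all_matches <- regmatches(x, gregexpr(pat, x, perl = TRUE))[[1]]

print(first_only)   # "KTN1"
print(all_matches)  # "KTN1" "KTN1-AS1"
```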

Update:

> grep("\\bKTN1(?=$|,)", vector, perl=T, value=T)
[1] "KTN1, KTN"

This matches KTN1 only when it is followed by a comma or the end of the string.

> grep("\\bKTN1\\b(?!-)", vector, perl=T, value=T)
[1] "KTN1, KTN"

This matches KTN1 as long as it is not followed by a hyphen.
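(?!-) only looks to the right, so a name like "AS1-KTN1", where the hyphen comes before KTN1, would still match. Adding a negative lookbehind closes that gap. A sketch, assuming a hypothetical extra element "AS1-KTN1" appended to the question's data (stored as genes to avoid reusing the name vector):

```r
genes <- c("ARNT", "ACF, ASP, ACF64", "BID", "KTN1, KTN",
           "NCRNA00181, A1BGAS, A1BG-AS", "KTN1-AS1",
           "AS1-KTN1")  # last element is a made-up test case

# Lookahead only: still accepts a hyphen in front of KTN1
one_sided <- grep("\\bKTN1\\b(?!-)", genes, perl = TRUE, value = TRUE)

# (?<!-) rejects a hyphen just before KTN1, (?!-) one just after it
both_sides <- grep("(?<!-)\\bKTN1\\b(?!-)", genes, perl = TRUE, value = TRUE)

print(one_sided)   # "KTN1, KTN" "AS1-KTN1"
print(both_sides)  # "KTN1, KTN"
```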

Alternatively, spell out the boundaries yourself instead of relying on \b:

grep('(^|[^-\\w])KTN1([^-\\w]|$)', vector, ignore.case = TRUE, perl = TRUE)

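This requires KTN1 to be preceded by the start of the string or by a character that is neither a hyphen nor a word character, and to be followed by such a character or the end of the string. In effect, - is treated as part of the word, which \b cannot do.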
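The same idea can be written with PCRE lookarounds, which check the neighbouring characters without consuming them, and wrapped in a small function so that any word can be plugged in. grep_hyphen_word is a hypothetical helper, not part of base R or any package; it uses \Q...\E so that characters such as . or + in the searched word are taken literally, assuming the word does not itself contain \E:

```r
# Hypothetical helper: match `word` as a whole token, where a token is a run
# of word characters and hyphens. Extra arguments (value, ignore.case, ...)
# are passed on to grep().
grep_hyphen_word <- function(word, x, ...) {
  # (?<![-\w]) / (?![-\w]): no hyphen or word character on either side
  # \Q...\E: quote the word literally (assumes `word` contains no "\E")
  pattern <- paste0("(?<![-\\w])\\Q", word, "\\E(?![-\\w])")
  grep(pattern, x, perl = TRUE, ...)
}

genes <- c("ARNT", "ACF, ASP, ACF64", "BID", "KTN1, KTN",
           "NCRNA00181, A1BGAS, A1BG-AS", "KTN1-AS1")

grep_hyphen_word("KTN1", genes)                      # returns 4
grep_hyphen_word("KTN1-AS1", genes, value = TRUE)    # returns "KTN1-AS1"
grep_hyphen_word("ktn1", genes, ignore.case = TRUE)  # returns 4
```

Because the lookarounds are zero-width, nothing outside the word is consumed, so the same pattern also works with regmatches() and gregexpr() to extract just the matched names.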
