R: extract the first number from a string

I have a string in a variable that we can call v1. The string lists picture numbers and takes the form "Pic 27 + 28". I want to extract the first number and store it in a new variable called item.

Some code I tried:

item <- unique(na.omit(as.numeric(unlist(strsplit(unlist(v1),"[^0-9]+"))))) 

This worked fine until I came across a list that went:

    [1,] "Pic 26 + 25"
    [2,] "Pic 27 + 28"
    [3,] "Pic 28 + 27"
    [4,] "Pic 29 + 30"
    [5,] "Pic 30 + 29"
    [6,] "Pic 31 + 32"

At this point I get more numbers than I want, because it also picks up the other unique numbers (such as 25).

I really tried to do this with gsub, but nothing worked. Help would be greatly appreciated!

+7
regex r gsub strsplit
5 answers

I assume that you want to extract the first of the two numbers in each string.

You can use the stri_extract_first_regex function in the stringi package:

    library(stringi)
    stri_extract_first_regex(c("Pic 26+25", "Pic 1,2,3", "no pics"), "[0-9]+")
    ## [1] "26" "1"  NA
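If installing a package is not an option, roughly the same result can be obtained in base R with regexpr() and regmatches(). This is only a sketch, not part of stringi; the variable names x, m and first are chosen here for illustration. Note that regmatches() silently drops elements without a match, so the NA entries have to be filled in by hand:

```r
# Base R sketch: first run of digits in each string, NA when there is none
x <- c("Pic 26+25", "Pic 1,2,3", "no pics")

m <- regexpr("[0-9]+", x)                # start of the first match, -1 if none
first <- rep(NA_character_, length(x))   # start with NA everywhere
first[m != -1] <- regmatches(x, m)       # regmatches() returns only the matches found
first
## [1] "26" "1"  NA
```

As with the stringi call, the result is a character vector; wrap it in as.numeric() if numbers are needed.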
+9

The solutions below use this test data:

    # test data
    v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27",
            "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")

1) gsubfn

    library(gsubfn)
    strapply(v1, "(\\d+).*", as.numeric, simplify = c)
    ## [1] 26 27 28 29 30 31

2) sub This requires no packages but uses a slightly longer regular expression:

    as.numeric(sub("\\D*(\\d+).*", "\\1", v1))
    ## [1] 26 27 28 29 30 31

3) read.table This uses neither regular expressions nor packages:

    read.table(text = v1, fill = TRUE)[[2]]
    ## [1] 26 27 28 29 30 31

In this particular example fill = TRUE could be omitted, but it would be needed if the components of v1 had differing numbers of fields.
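To illustrate the point about fill = TRUE, here is a small sketch with a hypothetical vector v2 (not part of the question's data) whose elements have different numbers of whitespace-separated fields:

```r
# Hypothetical input: the second element has fewer fields than the first
v2 <- c("Pic 26 + 25", "Pic 27")

# fill = TRUE pads the short row, so the second column can still be read
res <- read.table(text = v2, fill = TRUE)[[2]]
res
## [1] 26 27

# Without fill = TRUE the same call stops with an error, caught here
ok <- tryCatch({
  read.table(text = v2)
  TRUE
}, error = function(e) FALSE)
ok
## [1] FALSE
```

This approach still assumes that the number is always the second whitespace-separated field, which holds for strings of the form "Pic 27 + 28".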

+3

To follow up on your strsplit attempt:

    # split the strings
    l <- strsplit(x = c("Pic 26 + 25", "Pic 27 + 28"), split = " ")
    l
    # [[1]]
    # [1] "Pic" "26"  "+"   "25"
    #
    # [[2]]
    # [1] "Pic" "27"  "+"   "28"

    # extract relevant part from each list element and convert to numeric
    as.numeric(lapply(l, `[`, 2))
    # [1] 26 27
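The code in the question returned extra numbers because unlist() pools the pieces of all strings before unique() is applied, so a second number such as 25 survives whenever it never appears as a first number. A possible variant that keeps the question's split on non-digits but picks the first number per string is sketched below; the names parts and item are just for illustration:

```r
v1 <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27")

# Split on runs of non-digits; this gives one character vector per string
parts <- strsplit(v1, "[^0-9]+")

# Drop empty pieces (the leading "Pic " produces one) and keep the first number
item <- vapply(parts, function(p) as.numeric(p[nzchar(p)][1]), numeric(1))
item
## [1] 26 27 28
```

Working on each list element separately also means a string without any digits yields NA instead of shifting the results.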
+1

You can do this neatly with the first_number() function from the filesstrings package or, for more general needs, with the nth_number() function. Install the package with install.packages("filesstrings").

    library(filesstrings)
    #> Loading required package: stringr
    strings <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27",
                 "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")
    first_number(strings)
    #> [1] 26 27 28 29 30 31
    nth_number(strings, n = 1)
    #> [1] 26 27 28 29 30 31
+1

With str_extract from stringr:

    library(stringr)
    vec <- c("Pic 26 + 25", "Pic 27 + 28", "Pic 28 + 27",
             "Pic 29 + 30", "Pic 30 + 29", "Pic 31 + 32")
    # extract the first run of digits from each element of vec
    str_extract(vec, "[0-9]+")
    # [1] "26" "27" "28" "29" "30" "31"
+1