Keep only numbers until the FIRST hyphen AND the hyphen itself

I am trying to get rid of all the numbers / characters appearing after the first hyphen. Here are some examples:

 15-103025-01
 800-40170-02
 68-4974-01

My desired result:

 15-
 800-
 68-

I read several related posts, but they are not what I am looking for: the methods mentioned in them also remove the hyphen, leaving only the first 2 or 3 digits.

Here is what I have tried so far:

 gsub(pattern = '[0-9]*-$', replacement = "", x = data$id)
 grep(pattern = '[0-9]*-', replacement = "", x = data$id)
 regexpr(pattern = '[0-9]*-', text = data$id)

but none of these work as I expected.

3 answers

There are several ways to achieve this; here is one of them:

 have <- c("15-103025-01", "800-40170-02", "68-4974-01")
 want <- sub(pattern = "(^\\d+\\-).*", replacement = "\\1", x = have)

In this regex, the parentheses () create one group. It matches the beginning of the string ( ^ ), followed by one or more digits ( \\d+ ) and a hyphen ( \\- ). Outside the group, .* matches whatever characters follow.

In the replacement, \\1 refers to the first (and only) regex group. Because nothing else is added, everything matched outside the group is dropped.
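A minimal sketch of the capture-group approach described above, run on the question's sample IDs. The value "12345" is a hypothetical extra input added here to show what happens when there is no hyphen; the variable names `ids` and `prefix` are also illustrative.

```r
# Sample IDs from the question, plus one hypothetical value without a hyphen.
ids <- c("15-103025-01", "800-40170-02", "68-4974-01", "12345")

# Group 1 captures the leading digits and the first hyphen; ".*" consumes the rest.
# "\\1" in the replacement writes back only what group 1 captured.
prefix <- sub(pattern = "^(\\d+-).*", replacement = "\\1", x = ids)

print(prefix)
# [1] "15-"   "800-"  "68-"   "12345"
```

A string that does not match the pattern is returned unchanged, so it may be worth checking for such values before relying on the result.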


Why not just

 sub('-.*', '-', x)
 #[1] "15-" "800-" "68-"

To do the same with the second hyphen (the last one in these examples),

 sub('-([^-]*)$', '-', x)
 #[1] "15-103025-" "800-40170-" "68-4974-"
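To make the difference between the two patterns concrete, here is a small sketch that applies both to one of the question's IDs and to a hypothetical four-part string, "1-2-3-4", which is not in the original data. It suggests that the second pattern anchors on the last hyphen rather than literally the second one.

```r
# One ID from the question and one hypothetical string with three hyphens.
x <- c("15-103025-01", "1-2-3-4")

# Drop everything after the first hyphen.
first_hyphen <- sub("-.*", "-", x)

# Drop everything after the last hyphen: "[^-]*$" only matches a hyphen-free tail.
last_hyphen <- sub("-[^-]*$", "-", x)

print(first_hyphen)
# [1] "15-" "1-"
print(last_hyphen)
# [1] "15-103025-" "1-2-3-"
```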

An alternative with stringr, assuming the vector is named x:

 library(stringr)
 str_sub(x, 1, str_locate(x, "-")[ , 1])

This part takes a vector of strings as its argument and returns the position of the first match of the pattern (in this case "-") in each string:

 str_locate(x,"-") 

This call returns a matrix of start and end positions. Here the two columns are identical because "-" is a single character, so it starts and ends at the same position:

      start end
 [1,]     3   3
 [2,]     4   4
 [3,]     3   3

When we subset it this way, keeping only the first column,

 str_locate(x,"-")[ ,1] 

we get

 [1] 3 4 3 

Finally, str_sub extracts a substring from each string, given a start and an end position. Applied to every element of the vector x, it returns the substring that starts at character 1 and ends at the position of the first hyphen, computed as shown above.

 str_sub(x,1,str_locate(x,"-")[ ,1]) 
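The same locate-then-cut idea can also be sketched in base R, without stringr, using regexpr and substr. This is not part of the original answer; the variable names `pos` and `result` are illustrative, and the sketch assumes every element contains at least one hyphen.

```r
# Sample IDs from the question.
x <- c("15-103025-01", "800-40170-02", "68-4974-01")

# regexpr() returns the position of the first match in each string;
# fixed = TRUE treats "-" as a literal character rather than a regex.
pos <- as.integer(regexpr("-", x, fixed = TRUE))

# substr() keeps characters 1 through the hyphen position.
result <- substr(x, 1, pos)

print(pos)
# [1] 3 4 3
print(result)
# [1] "15-"  "800-" "68-"
```

For a string with no hyphen, regexpr returns -1 and substr then yields an empty string, so such values would need separate handling.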