This data frame, df1, looks very similar to what I work with in real life (two columns):
df1 <- data.frame(provider = c("LeBron James, MD",
                               "Peyton Manning, DDS",
                               "Mike Trout, DO"),
                  cpt_codes = c("This provider because he bills CPT codes 99284, 99282 and 99285 65% more than his peer group",
                                "Overutilization of visits per patient for E0781-RR-59 and J1100!",
                                "High units per patient compared to the specialty for the following:29581: 146.88% 93990: 33.71%"))
print(df1)
I need to extract every block of characters from the cpt_codes field that is 5 alphanumeric characters long and ends in a digit (0-9). I then need to map each block back to the provider field, so that the end result has one row for each provider/cpt_code combination.
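To make the matching rule concrete, here is a minimal sketch in base R on a single string. The sample string is taken from the second row of df1, the names x, pattern and codes are only for illustration, and the sketch assumes that word boundaries are enough to separate the codes from the surrounding punctuation.

```r
# Sample string (second row of df1), used only for illustration
x <- "Overutilization of visits per patient for E0781-RR-59 and J1100!"

# Four alphanumeric characters followed by one digit, between word boundaries
pattern <- "\\b[A-Za-z0-9]{4}[0-9]\\b"

# gregexpr() locates every match; regmatches() pulls the matched text out
codes <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(codes)
```

Tokens such as "RR" and "59" are too short to match, and ordinary 5-letter words are rejected because their last character is not a digit.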
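library(stringr)
# Replace punctuation with spaces so that codes such as "E0781-RR-59" split into separate tokens
df1$cpt_codes <- str_replace_all(df1$cpt_codes, "[[:punct:]]", " ")
# Extract every 5-character alphanumeric token (one list element per row of df1)
tokens <- str_extract_all(df1$cpt_codes, "\\b[a-zA-Z0-9]{5}\\b")
# For row x, keep only the tokens whose last (5th) character is a digit
cpts <- function(x) {
  t1 <- tokens[[x]][grepl("[0-9]$", tokens[[x]])]
  data.frame(id = rep(x, length(t1)), cpt_codes = t1)
}
# Stack the per-row results, then join them back to the providers by row id
t2 <- do.call("rbind", lapply(seq_along(tokens), cpts))
df1$id <- seq_len(nrow(df1))
df3 <- df1[, -2]  # drop the free-text cpt_codes column, keep provider and id
final <- merge(df3, t2, by = "id")
final <- final[, -1]  # drop the helper id column
print(final)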
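For comparison, the same provider/code mapping can probably be written more compactly in base R, without the helper id column or the merge. This is only a sketch of one possible approach, not a definitive solution. The names pattern, matches and long are made up for the example, and df1 is rebuilt from the original strings because the code above overwrites df1$cpt_codes. It also assumes that word boundaries separate the codes well enough; unlike the punctuation replacement above, an underscore next to a code would not count as a separator here.

```r
# Rebuild the example data so the sketch runs on its own
df1 <- data.frame(
  provider = c("LeBron James, MD", "Peyton Manning, DDS", "Mike Trout, DO"),
  cpt_codes = c(
    "This provider because he bills CPT codes 99284, 99282 and 99285 65% more than his peer group",
    "Overutilization of visits per patient for E0781-RR-59 and J1100!",
    "High units per patient compared to the specialty for the following:29581: 146.88% 93990: 33.71%"
  ),
  stringsAsFactors = FALSE
)

# Four alphanumeric characters followed by one digit, between word boundaries
pattern <- "\\b[A-Za-z0-9]{4}[0-9]\\b"

# One character vector of matched codes per row of df1
matches <- regmatches(df1$cpt_codes, gregexpr(pattern, df1$cpt_codes, perl = TRUE))

# Repeat each provider once per matched code to get one row per combination
long <- data.frame(
  provider = rep(df1$provider, lengths(matches)),
  cpt_code = unlist(matches),
  stringsAsFactors = FALSE
)
print(long)
```

Because rep() repeats each provider by the number of codes found in its row, a provider with no matching codes simply contributes no rows.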