With R, combine text from a variable number of lines into one text element

What R code will combine the narrative lines for each person in the fake data frame below into one variable? The data comes from an Excel spreadsheet, where the narrative for a single record can span 1 to 8 rows. Each timekeeper record ends with an empty line.

Here is the data frame, followed by its dput() output:

> df
        timekeeper                                       narrative
1         Person A     Review and revise insert for audit response
2  Invoice=2858502 letter regarding separate investigation; review
3             <NA>            and exchange messages regarding same
4             <NA>                                            <NA>
5         Person B   Telephone conference with team; review e-mail
6  Invoice=2835951                 correspondence from X regarding
7             <NA>     credentialing issues; e-mail correspondence
8             <NA>               with Y regarding same; review and
9             <NA> approve transmittal letter for incident reports
10            <NA>                                            <NA>
11        Person C                  Telephone conference with X, Y
12 Invoice=2835951                     et al., regarding notice of
13            <NA>                                            <NA>
14        Person D                       Telephone conference with
15 Invoice=2835951    Brady, Gibson, et al., regarding DAB status;
16            <NA>            telephone conference with X, et al.,
17            <NA>    regarding physician investigation at 123 and
18            <NA>          medical liability insurance; telephone
19            <NA>                                            <NA>
20        Person B                   Conference with B regarding D
21 Invoice=2835951                                            <NA>

structure(list(timekeeper = c("Person A", "Invoice=2858502", 
NA, NA, "Person B", "Invoice=2835951", NA, NA, NA, NA, "Person C", 
"Invoice=2835951", NA, "Person D", "Invoice=2835951", NA, NA, 
NA, NA, "Person B", "Invoice=2835951"), narrative = c("Review and revise insert for audit response", 
"letter regarding separate investigation; review", "and exchange messages regarding same", 
NA, "Telephone conference with team; review e-mail", "correspondence from X regarding", 
"credentialing issues; e-mail correspondence", "with Y regarding same; review and", 
"approve transmittal letter for incident reports", NA, "Telephone conference with X, Y", 
"et al., regarding notice of", NA, "Telephone conference with", 
"Brady, Gibson, et al., regarding DAB status;", "telephone conference with X, et al.,", 
"regarding physician investigation at 123 and", "medical liability insurance; telephone", 
NA, "Conference with B regarding D", NA)), .Names = c("timekeeper", 
"narrative"), row.names = c(NA, -21L), class = "data.frame")

I would like this format:

timekeeper  combined narrative
Person A    Review and revise insert for audit response letter regarding separate investigation; review and exchange messages regarding same

A possible solution may be in this SO question ("multiple lines combined"), but my situation, with empty lines and variable-length narratives, stumps me.

2 answers

Base R Approach:

# Rows where a new timekeeper record starts
indx <- grep('Person', df$timekeeper)
vec <- logical(nrow(df))
vec[indx] <- TRUE
# cumsum(vec) gives every row of a record the same group number;
# drop the NA narratives (the empty lines) before pasting each group together
lst <- lapply(split(df$narrative, cumsum(vec)), function(x) paste(x[!is.na(x)], collapse = ' '))
names(lst) <- df$timekeeper[indx]
newdf <- as.data.frame(lst)
t(newdf)
#           [,1]
#Person.A   "Review and revise insert for audit response letter regarding separate investigation; review and exchange messages regarding same"
#Person.B   "Telephone conference with team; review e-mail correspondence from X regarding cred
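If a two-column data frame in the format requested in the question is preferred over a transposed matrix, the same grouping idea can be sketched as follows. This is only an illustration under stated assumptions: the data is an abbreviated copy of the question's rows, and `is_start`, `record_id`, `combined` and `result` are names made up for this example.

```r
# Abbreviated copy of the question's data: rows 1-4, 11-13 and 20-21
df <- data.frame(
  timekeeper = c("Person A", "Invoice=2858502", NA, NA,
                 "Person C", "Invoice=2835951", NA,
                 "Person B", "Invoice=2835951"),
  narrative = c("Review and revise insert for audit response",
                "letter regarding separate investigation; review",
                "and exchange messages regarding same", NA,
                "Telephone conference with X, Y",
                "et al., regarding notice of", NA,
                "Conference with B regarding D", NA),
  stringsAsFactors = FALSE
)

# Assumption: a record starts on every row whose timekeeper begins with "Person"
is_start <- grepl("^Person", df$timekeeper)
record_id <- cumsum(is_start)

# Paste the non-missing narrative lines of each record together
combined <- tapply(df$narrative, record_id,
                   function(x) paste(x[!is.na(x)], collapse = " "))

# One row per record: the timekeeper name and its combined narrative
result <- data.frame(
  timekeeper = df$timekeeper[is_start],
  narrative = as.vector(combined),
  stringsAsFactors = FALSE
)
print(result)
```

Because the grouping uses a running record number rather than the name, a person who appears in several records would get one row per record.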
data.table and zoo Approach:

library(data.table)
library(zoo)
# Step 1: set every timekeeper value matching the invoice pattern to NA
# Step 2: use `na.locf` from the zoo package to fill each NA in timekeeper with the most recent non-NA value
# Step 3: collapse the non-NA narrative lines by timekeeper
# Note: grouping by timekeeper merges separate records of the same person (see Person B in the output)

setDT(df)[, timekeeper := na.locf(sub("Invoice=\\d+", NA, timekeeper))][,
  .(narrative = paste(narrative[!is.na(narrative)], collapse = " ")),
  by = 'timekeeper']

   timekeeper narrative
1:   Person A Review and revise insert for audit response letter regarding separate investigation; review and exchange messages regarding same
2:   Person B Telephone conference with team; review e-mail correspondence from X regarding credentialing issues; e-mail correspondence with Y regarding same; review and approve transmittal letter for incident reports Conference with B regarding D
3:   Person C Telephone conference with X, Y et al., regarding notice of
4:   Person D Telephone conference with Brady, Gibson, et al., regarding DAB status; telephone conference with X, et al., regarding physician investigation at 123 and medical liability insurance; telephone
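In the output above, the two Person B records end up in a single row because the grouping key is the name. If each record should stay separate, one option is to group on a running record number instead. The sketch below is a variant, not the answer's original method: it uses only the two Person B records from the question, and the `record` column and the `dt` and `result` names are hypothetical.

```r
library(data.table)

# Rows 5-10 and 20-21 of the question's data: two separate Person B records
dt <- data.table(
  timekeeper = c("Person B", "Invoice=2835951", NA, NA, NA, NA,
                 "Person B", "Invoice=2835951"),
  narrative = c("Telephone conference with team; review e-mail",
                "correspondence from X regarding",
                "credentialing issues; e-mail correspondence",
                "with Y regarding same; review and",
                "approve transmittal letter for incident reports", NA,
                "Conference with B regarding D", NA)
)

# Hypothetical helper column: increments on each row that starts a record
# (assumes every record starts with a timekeeper beginning with "Person")
dt[, record := cumsum(grepl("^Person", timekeeper))]

# Within each record, keep the name from its first row and
# paste the non-missing narrative lines together
result <- dt[, .(timekeeper = timekeeper[1],
                 narrative = paste(narrative[!is.na(narrative)], collapse = " ")),
             by = record]
print(result)
```

This yields one row per record, so the short "Conference with B regarding D" entry is no longer appended to the earlier Person B narrative.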