Similar questions seem to exist for other languages, but I can't find one for R.
I have several text files in subdirectories of a directory; they all have the same extension (.log) and contain a mixture of text and data. I want to extract a couple of lines from each of these relatively large files.
For example, one file looks like this:
    blahblahblah
    NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS = 210
    blahblahblah
    ----------------------------------------
    CPU timing information for all processes
    ========================================
    0: 8853.469 + 133.948 = 8987.417
    1: 8850.817 + 126.587 = 8977.405
    2: 8851.925 + 128.576 = 8980.501
    3: 8847.992 + 125.871 = 8973.864
    ----------------------------------------
    ddikick.x: exited gracefully.
    blahblahblah
I want to collect the number of basis functions (210 in this example) and the total CPU times.
The line "NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS =" is unique within each file; i.e. if I open the file in a text editor and search for this string, I get only that one line. The same is true of "CPU timing information for all processes" and "exited gracefully."
I appreciate that it seems that I haven’t done much to help myself, but I just don’t know where to start. If someone can point me in the right direction, I hope I can fill in the rest.
With the help provided by @Ben (see below), here is the code I ended up using:
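    library(stringr)  # provides str_extract() and str_extract_all()

    filesearch <- function(x) {
      f <- readLines(x)
      # Basis-function count: the trailing integer on the unique marker line
      cline <- grep("NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS", f, value = TRUE)
      val <- as.numeric(str_extract(cline, "[0-9]+$"))
      # The four per-process rows start two lines below the CPU timing header
      coline <- grep("^ +CPU timing information", f)
      numstr <- sapply(str_extract_all(f[coline + 2:5], "[0-9.]+"), as.numeric)
      # Row 4 holds each process's total; sum the totals and divide by 60
      cline1 <- sum(numstr[4, ]) / 60
      output <- c(val, cline1)
      cat(output, "\n")   # print the two values
      invisible(output)   # also return them, since cat() itself returns NULL
    }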
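The indexing step is the least obvious part of the function. In R, `:` binds more tightly than `+`, so `coline + 2:5` means `coline + c(2, 3, 4, 5)`, which skips the header and the `====` rule and lands on the four per-process rows. The following is a minimal sketch of that step on the sample lines from the question. It uses base R's `regmatches()` and `gregexpr()` instead of stringr so that it runs without extra packages. The leading space on each line is an assumption taken from the `"^ +CPU timing information"` pattern, and the assumption of exactly four process rows comes from the sample file.

```r
# Sample lines copied from the example file in the question
# (leading spaces assumed, to match the "^ +CPU ..." pattern).
f <- c(
  "blahblahblah",
  " NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS = 210",
  " ----------------------------------------",
  " CPU timing information for all processes",
  " ========================================",
  " 0: 8853.469 + 133.948 = 8987.417",
  " 1: 8850.817 + 126.587 = 8977.405",
  " 2: 8851.925 + 128.576 = 8980.501",
  " 3: 8847.992 + 125.871 = 8973.864",
  " ----------------------------------------",
  " ddikick.x: exited gracefully."
)

coline <- grep("^ +CPU timing information", f)  # index of the header line
rows <- f[coline + 2:5]                         # same as f[coline + c(2, 3, 4, 5)]

# Base-R stand-in for str_extract_all(): every numeric token on each row.
tokens <- regmatches(rows, gregexpr("[0-9.]+", rows))
numstr <- sapply(tokens, as.numeric)            # 4 x 4 matrix, one column per process

totals <- numstr[4, ]                           # fourth token on each row is the total
total_over_60 <- sum(totals) / 60               # same division as in filesearch()
```

Each row yields four tokens (process index, the two addends, and the total), which is why the totals sit in row 4 of the matrix.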
I sourced this function, typed in the name of the file I needed each time, and then copied the two results into another file by hand. Not as elegant as I would like, but it still saved me a lot of time. Thanks again @Ben.
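The manual step could probably be avoided by looping over the files. What follows is only a sketch, not tested on real output files: `extract_log()` and `collect_logs()` are hypothetical names, the extraction is rewritten with base R's `regmatches()` rather than stringr, and it assumes every file has exactly one marker line of each kind and four process rows, as in the sample.

```r
# Hypothetical variant of filesearch() that returns a one-row data frame
# instead of printing. Assumes four per-process rows, as in the sample file.
extract_log <- function(path) {
  f <- readLines(path)
  cline <- grep("NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS", f, value = TRUE)
  coline <- grep("^ +CPU timing information", f)
  rows <- f[coline + 2:5]
  cpu <- sapply(regmatches(rows, gregexpr("[0-9.]+", rows)), as.numeric)
  data.frame(
    file = path,
    basis_functions = as.numeric(regmatches(cline, regexpr("[0-9]+$", cline))),
    cpu_total_over_60 = sum(cpu[4, ]) / 60,
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: find every .log file below `dir`, including
# subdirectories, and stack the per-file rows into one data frame.
collect_logs <- function(dir) {
  paths <- list.files(dir, pattern = "\\.log$", recursive = TRUE, full.names = TRUE)
  do.call(rbind, lapply(paths, extract_log))
}

# Example use (the directory and output file names are placeholders):
# results <- collect_logs("path/to/logs")
# write.csv(results, "summary.csv", row.names = FALSE)
```

A file that is missing either marker line, or that has trailing whitespace after the basis-function count, would make `extract_log()` fail, so some checking would be needed before relying on it.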