Why don't you just edit the header line and then read the rest in chunks? I don't know how big this file is, but you could read it in blocks of lines (I guessed 10,000). Depending on how much memory you have, you can make the blocks larger or smaller.
    ## setup
    tf <- tempfile(); tf2 <- tempfile()
    write.csv(mtcars, tf)

    fr <- file(tf, open = "rt")   # open file connection to read
    fw <- file(tf2, open = "wt")  # open file connection to write
    header <- readLines(fr, n = 1)            # read in header
    header <- gsub('disp', 'newvar', header)  # modify header
    writeLines(header, con = fw)              # write header to file
    while (length(body <- readLines(fr, n = 10000)) > 0) {
      writeLines(body, fw)  # pass rest of file in chunks of 10000
    }
    close(fr); close(fw)  # close connections
    # unlink(tf); unlink(tf2)  # delete temporary files
This should be faster, because the while loop runs once per 10,000 lines instead of once per line. In addition, gsub is called only on the one line that needs it rather than on every line, which saves time. R cannot edit a file "in place", so to speak, so there is no way around reading and copying the file. If you need to do this in R, make your chunks as large as memory allows and then copy the file across.
I saw a roughly threefold performance difference between the two methods:
    # test file creation ~3M lines
    tf <- tempfile(); tf2 <- tempfile()
    fw <- file(tf, open = "wt")
    sapply(1:1e6, function(x) write.csv(mtcars, fw))
    close(fw)

    # my way
    system.time({
      fr <- file(tf, open = "rt")   # open file connection to read
      fw <- file(tf2, open = "wt")  # open file connection to write
      header <- readLines(fr, n = 1)            # read in header
      header <- gsub('disp', 'newvar', header)  # modify header
      writeLines(header, con = fw)              # write header to file
      while (length(body <- readLines(fr, n = 10000)) > 0) {
        writeLines(body, fw)  # pass rest of file in chunks of 10000
      }
      close(fr); close(fw)  # close connections
    })
    #    user  system elapsed
    #   32.96    1.69   34.85

    # OP way
    system.time({
      incon <- file(tf, "r")
      outcon <- file(tf2, "w")
      while (length(one.line <- readLines(incon, 1)) > 0) {
        one.line <- gsub('disp', 'newvar', one.line)
        writeLines(one.line, outcon)
      }
      close(incon); close(outcon)
    })
    #    user  system elapsed
    #  104.36    1.92  107.03
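If you end up doing this more than once, the chunked copy could be wrapped in a small function so the chunk size is easy to tune. This is only a sketch: the name `replace_in_header` and its arguments are my own invention for illustration, and it assumes the text to change appears only on the first line of the file.

```r
# Hypothetical helper (name and arguments are illustrative only).
# Rewrites the first line of `infile` and copies the rest unchanged,
# `chunk_size` lines at a time. Returns the number of body lines copied.
replace_in_header <- function(infile, outfile, pattern, replacement,
                              chunk_size = 10000L) {
  fr <- file(infile, open = "rt")
  on.exit(close(fr), add = TRUE)   # close connections even if an error occurs
  fw <- file(outfile, open = "wt")
  on.exit(close(fw), add = TRUE)

  header <- readLines(fr, n = 1)
  if (length(header) == 0) return(invisible(0L))  # empty input: nothing to copy
  writeLines(gsub(pattern, replacement, header), fw)

  n_copied <- 0L
  while (length(chunk <- readLines(fr, n = chunk_size)) > 0) {
    writeLines(chunk, fw)
    n_copied <- n_copied + length(chunk)
  }
  invisible(n_copied)
}

# Usage on a small temporary file; a tiny chunk size forces several loop passes
src <- tempfile(); dst <- tempfile()
write.csv(mtcars, src)
n_body <- replace_in_header(src, dst, "disp", "newvar", chunk_size = 10L)
readLines(dst, n = 1)  # header line now contains newvar instead of disp
```

The `on.exit` calls are there so the connections get closed even if something fails partway through, which matters with large files and long-running copies.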