How to edit or modify one line in a large text file using R

I am reading some large text files into databases with R, but they contain field names that are invalid for the database software. The column names of these large text files appear only in the first line. Is it possible to edit just this first line without looping through every line in the file (which seems like a waste of resources)?

Here are two examples of what I'm trying to do, using some sample data. The first reads everything into RAM, so it will not work for my large data tables. The second works, but it is slow because it processes every line in the file.

It is important that the solution works across platforms and does not require installing external software (other than R packages), because I will be sharing this script with others and would rather not ask them to perform more steps than necessary. I am looking for the fastest way to do this entirely within R :)

    # create two temporary files
    tf <- tempfile() ; tf2 <- tempfile()

    # write the mtcars data table to a file on the disk
    write.csv( mtcars , tf )

    # look at the first three lines
    readLines( tf , n = 3 )

    # read in the entire table
    z <- readLines( tf )

    # make the only substitution i care about
    z[1] <- gsub( 'disp' , 'newvar' , z[1] )

    # write the entire table back out to the second file
    writeLines( z , tf2 )

    # confirm the replacement
    readLines( tf2 , 2 )

    # done!

    # # # # # # # OR

    # blank out the output file
    file.remove( tf2 )

    # create a file connection to the text file
    incon <- file( tf , "r" )

    # create a second file connection to the secondary temporary file
    outcon <- file( tf2 , "w" )

    # read in one line at a time
    while( length( one.line <- readLines( incon , 1 ) ) > 0 ){

        # make the substitution on every line
        one.line <- gsub( 'disp' , 'newvar' , one.line )

        # write each line to the second temporary file
        writeLines( one.line , outcon )
    }

    # close the connections
    close( incon ) ; close( outcon )

    # confirm the replacement
    readLines( tf2 , 2 )

    # done!
3 answers

Why don't you just edit the header and then read the rest in chunks? I don't know how big this file is, but you could read it in blocks of lines (I guessed 10,000). Depending on how much memory you have, you can adjust this up or down.

    ## setup
    tf <- tempfile(); tf2 <- tempfile()
    write.csv(mtcars, tf)

    fr <- file(tf, open = "rt")    # open file connection to read
    fw <- file(tf2, open = "wt")   # open file connection to write

    header <- readLines(fr, n = 1)                # read in header
    header <- gsub('disp', 'newvar', header)      # modify header
    writeLines(header, con = fw)                  # write header to file

    # pass rest of file in chunks of 10000
    while (length(body <- readLines(fr, n = 10000)) > 0) {
        writeLines(body, fw)
    }

    close(fr); close(fw)           # close connections
    # unlink(tf); unlink(tf2)      # delete temporary files

This should be faster because the while loop runs once per 10,000 lines instead of once per line. In addition, R calls gsub only on the one line that needs it rather than on every line, which saves time. R cannot edit the file "in place", so to speak, so there is no way around reading and copying the file. If you need to do this in R, make your chunks as large as memory allows and then pass the file through.

I saw a roughly threefold performance difference between the two methods:

    # test file creation ~3M lines
    tf <- tempfile(); tf2 <- tempfile()
    fw <- file(tf, open = "wt")
    sapply(1:1e6, function(x) write.csv(mtcars, fw))
    close(fw)

    # my way
    system.time({
        fr <- file(tf, open = "rt")    # open file connection to read
        fw <- file(tf2, open = "wt")   # open file connection to write
        header <- readLines(fr, n = 1)             # read in header
        header <- gsub('disp', 'newvar', header)   # modify header
        writeLines(header, con = fw)               # write header to file
        # pass rest of file in chunks of 10000
        while (length(body <- readLines(fr, n = 10000)) > 0) {
            writeLines(body, fw)
        }
        close(fr); close(fw)           # close connections
    })
    #    user  system elapsed
    #   32.96    1.69   34.85

    # OP way
    system.time({
        incon <- file( tf , "r" )
        outcon <- file( tf2 , "w" )
        while( length( one.line <- readLines( incon , 1 ) ) > 0 ){
            one.line <- gsub( 'disp' , 'newvar' , one.line )
            writeLines( one.line , outcon )
        }
        close( incon ) ; close( outcon )
    })
    #    user  system elapsed
    #  104.36    1.92  107.03
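The chunked copy described in this answer can be wrapped in a small reusable function. What follows is only a sketch: the name `replace_header` and its arguments are hypothetical, and it uses a literal (`fixed = TRUE`) match, which is an assumption rather than something the answer specifies.

```r
# Hypothetical helper (not from the original answer): copy `infile` to
# `outfile`, applying a literal substitution to the first line only.
replace_header <- function(infile, outfile, pattern, replacement,
                           chunk_size = 10000L) {
  fr <- file(infile, open = "rt")
  on.exit(close(fr), add = TRUE)   # close the reader even if an error occurs
  fw <- file(outfile, open = "wt")
  on.exit(close(fw), add = TRUE)   # likewise for the writer

  # only the header goes through gsub
  header <- readLines(fr, n = 1)
  writeLines(gsub(pattern, replacement, header, fixed = TRUE), fw)

  # stream the rest of the file unchanged, chunk_size lines at a time
  while (length(chunk <- readLines(fr, n = chunk_size)) > 0) {
    writeLines(chunk, fw)
  }
  invisible(outfile)
}

# usage on the sample data from the question
tf <- tempfile(); tf2 <- tempfile()
write.csv(mtcars, tf)
replace_header(tf, tf2, "disp", "newvar", chunk_size = 10L)
readLines(tf2, n = 2)
```

Using `on.exit` means the connections are released even if the copy fails part-way through; `chunk_size` can be raised as far as available memory allows.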

R is the wrong tool for this. Use a command-line tool instead. For example, with sed, something like sed -i '1 s/disp/newvar/' file should do it. And if you need to do this from within R, use

    filename = 'myfile'
    scan(pipe(paste("sed -i '1 s/disp/newvar/' ", filename, sep = "")))

Here's the Windows version:

    filename = 'myfile'
    tf1 = tempfile()
    tf2 = tempfile()

    # read header, modify and write to file
    header = readLines(filename, n = 1)
    header = gsub('disp', 'newvar', header)
    writeLines(header, tf1)

    # cut the rest of the file to a separate file
    scan(pipe(paste("more ", filename, " +1 > ", tf2)))

    # append the two bits together
    file.append(tf1, tf2)

    # tf1 now has what you want

Have you tried:

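    iocon <- file("originalFile", "r+")        # open for reading and writing
    header <- readLines(iocon, n = 1)          # read the header
    header <- gsub('disp', 'newvar', header)   # modify the header
    writeLines(header, con = iocon)            # write it back at the start of the file
    close(iocon)                               # close the connection so the write is flushed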

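This just overwrites the first line in place and, depending on how the system manages resources, it can be very efficient. Note that an in-place overwrite cannot change the length of the line: if the new header is longer or shorter than the old one (as 'newvar' is compared with 'disp'), the rows that follow will be damaged, so this is only safe for replacements of the same length. Make sure you have a backup.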
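A minimal sketch of the in-place approach, restricted to the case where it can work cleanly: the replacement header has exactly as many bytes as the original. The helper name `overwrite_header_inplace` and the same-length replacement 'dspl' are hypothetical, chosen only for illustration. The sketch relies on the behaviour documented in R's `?seek`, that the write position of an "r+" file connection starts at the beginning of the file.

```r
# Hypothetical helper (not from the original answer): overwrite the header in
# place, refusing to proceed unless its byte length is unchanged.
overwrite_header_inplace <- function(path, pattern, replacement) {
  con <- file(path, open = "r+")
  on.exit(close(con), add = TRUE)   # closing also flushes the write

  old_header <- readLines(con, n = 1)
  if (length(old_header) == 0) stop("file is empty")
  new_header <- gsub(pattern, replacement, old_header, fixed = TRUE)

  # a longer or shorter header would damage the rows that follow
  if (nchar(new_header, type = "bytes") != nchar(old_header, type = "bytes")) {
    stop("new header must have the same number of bytes as the old one")
  }

  # for "r+" connections the write position starts at the beginning of the file
  writeLines(new_header, con)
  invisible(new_header)
}

# usage on the sample data from the question, with a same-length replacement
tf <- tempfile()
write.csv(mtcars, tf)
original <- readLines(tf)
overwrite_header_inplace(tf, "disp", "dspl")
readLines(tf, n = 2)
```

Because the file is modified directly, try this on a copy first; the length check is what prevents a replacement such as 'newvar' from overwriting part of the first data row.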
