Write a gzip file from a data frame

I am trying to write a data frame to a gzip file, but I am running into problems.

Here is an example of my code:

    df1 <- data.frame(id = seq(1,10,1), var1 = runif(10), var2 = runif(10))
    gz1 <- gzfile("df1.gz","w" )
    writeLines(df1)

Error in writeLines(df1) : invalid text argument

Any suggestions?

EDIT: a line of an example character vector I'm trying to write:

 0 | var1:1.5 var2:.55 var7:1250 

The class label / y-variable is separated from the x-variables by the character "|". Each variable name is separated from its value by ":", and the variables are separated from each other by spaces.

EDIT2: Apologies for the wording / format of the question, but here are the results.

Old method:

    system.time(write(out1, file="out1.txt"))
    #    user  system elapsed
    #   9.772  17.205  86.860

New Method:

    writeGzFile <- function(){
      gz1 = gzfile("df1.gz","w")
      write(out1, gz1)
      close(gz1)
    }
    system.time(writeGzFile())
    #    user  system elapsed
    #   2.312   0.000   2.478

Thanks to everyone for helping me figure this out.

+9
4 answers

writeLines expects a character vector, not a data frame. The easiest way to write this to a gzip file would be

    df1 <- data.frame(id = seq(1,10,1), var1 = runif(10), var2 = runif(10))
    gz1 <- gzfile("df1.gz", "w")
    write.csv(df1, gz1)
    close(gz1)

This will write it as a gzipped CSV. Also see write.table and write.csv2 for alternative ways of writing a file.
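As a quick check that the gzipped CSV is readable again, here is a minimal round-trip sketch. The temporary file path and the use of row.names = FALSE are my own illustrative choices, not part of the answer above.

```r
# Sketch: write a data frame as gzipped CSV, then read it back.
# The path and row.names = FALSE are illustrative assumptions.
df1 <- data.frame(id = 1:10, var1 = runif(10), var2 = runif(10))

path <- file.path(tempdir(), "df1.csv.gz")
gz_out <- gzfile(path, "w")
write.csv(df1, gz_out, row.names = FALSE)
close(gz_out)

# read.csv opens and closes the unopened gzfile connection itself
df2 <- read.csv(gzfile(path))

# Values are written with limited precision, so compare with all.equal
print(all.equal(df1, df2))
```

Comparing with all.equal rather than identical allows for the rounding that happens when doubles are written as text.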

EDIT: Based on the update to the question about the desired format, I put together the following helper (thrown together quickly, so it can probably be simplified a lot):

    # Serialize each row as "<row index> | name : value name : value ..."
    myser <- function(df) {
      rowCount <- nrow(df)
      dfNames <- names(df)
      dfNamesIndex <- length(dfNames)
      sapply(1:rowCount, function(rowIndex) {
        paste(rowIndex, '|',
              paste(sapply(1:dfNamesIndex, function(element) {
                c(dfNames[element], ':', df[rowIndex, element])
              }), collapse=' ')
        )
      })
    }

So the result looks like

    a <- data.frame(x=1:10,y=rnorm(10))
    writeLines(myser(a))
    # 1 | x : 1 y : -0.231340933021948
    # 2 | x : 2 y : 0.896777389870928
    # 3 | x : 3 y : -0.434875004781075
    # 4 | x : 4 y : -0.0269824962632977
    # 5 | x : 5 y : 0.67654540494899
    # 6 | x : 6 y : -1.96965253674725
    # 7 | x : 7 y : 0.0863177759402661
    # 8 | x : 8 y : -0.130116466571162
    # 9 | x : 9 y : 0.418337557610229
    # 10 | x : 10 y : -1.22890714891874

All that remains is to pass a gzfile connection to writeLines to get the desired result.
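Passing a gzfile connection to writeLines could look like the sketch below. The helper format_rows is hypothetical (it is not the helper from this answer): it assumes the first column holds the class label and emits the compact "label | name:value ..." layout described in the question. The sample data and file path are also made up for illustration.

```r
# Hypothetical helper: one "label | name:value name:value ..." line per row.
# Assumes label_col is a numeric column index holding the class label.
format_rows <- function(df, label_col = 1) {
  features <- df[, -label_col, drop = FALSE]
  # One character vector of "name:value" strings per feature column
  pairs <- lapply(names(features), function(nm) paste0(nm, ":", features[[nm]]))
  # Join the per-column vectors element-wise, separated by spaces
  body <- do.call(paste, c(pairs, sep = " "))
  paste(df[[label_col]], "|", body)
}

# Illustrative data, not from the question
df <- data.frame(y = c(0, 1), var1 = c(1.5, 2), var2 = c(0.55, 3))
out_lines <- format_rows(df)

# Write the character vector through a gzfile connection
path <- file.path(tempdir(), "out1.gz")
gz1 <- gzfile(path, "w")
writeLines(out_lines, gz1)
close(gz1)

# Read the compressed file back to check the contents
back <- readLines(gzfile(path))
print(back)
```

Building the lines column by column with paste is vectorized, which tends to matter when there are many rows; the row-by-row sapply version is easier to follow but slower.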

+19

To write something to the gzip file, you need to “serialize” it to text. For R objects, you can do this using dput:

    gz1 = gzfile("df1.gz","w")
    dput(df1, gz1)
    close(gz1)

However, this just writes a textual representation of the data frame to a file. It will most likely be less efficient than using save(df1, file="df1.RData") to save it in R's native data format. Ask yourself: why am I saving it as a .gz file?

In a quick check with some random numbers, the gz file was 54k and the .RData file was 34k.
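A comparison along these lines can be reproduced with a short sketch such as the one below. The data size and file paths are assumptions of mine, so the resulting numbers will differ from the 54k / 34k figures quoted above.

```r
# Sketch: compare the size of a gzipped dput() dump with a save() file.
# The 1000-row data frame and temp paths are illustrative assumptions.
df1 <- data.frame(id = 1:1000, var1 = runif(1000), var2 = runif(1000))

gz_path <- file.path(tempdir(), "df1.gz")
rdata_path <- file.path(tempdir(), "df1.RData")

# Textual representation, compressed with gzip
gz1 <- gzfile(gz_path, "w")
dput(df1, gz1)
close(gz1)

# Native R serialization (compressed by default)
save(df1, file = rdata_path)

sizes <- file.size(c(gz_path, rdata_path))
names(sizes) <- c("dput_gz", "RData")
print(sizes)

# Load the .RData file into a fresh environment to confirm it round-trips
env <- new.env()
load(rdata_path, envir = env)
print(identical(env$df1, df1))
```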

+4

Another very easy way to do this:

    # We create the .csv file
    write.csv(df1, "df1.csv")
    # We compress it, deleting the .csv
    system("gzip df1.csv")

Got the idea from: http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html

+3

You can use the gzip function in R.utils:

    library(R.utils)
    library(data.table)

    # Write gzip file
    df <- data.table(var1='Compress me',var2=', please!')
    fwrite(df,'filename.csv',sep=',')
    gzip('filename.csv',destname='filename.csv.gz')

    # Read gzip file
    fread('gzip -dc filename.csv.gz')
    #           var1      var2
    # 1: Compress me , please!

+1