Is there a way to read and write files in memory in R?

I'm trying to use R to analyze large DNA sequence files (fastq files, several gigabytes each), but the standard R interface for these files (ShortRead) wants to read the entire file at once. The file does not fit into memory, so this fails with an error. Is there a way I can read a few (thousand) lines at a time, write them to an in-memory file, and then have ShortRead read from that in-memory file?

I am looking for something like Perl's IO::Scalar, but for R.
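Roughly, the kind of chunked reading I have in mind (a sketch only; the file name and chunk size are made up):

    con <- file("reads.fastq", open = "r")
    repeat {
        chunk <- readLines(con, n = 4000)   # 4000 lines = 1000 fastq records
        if (length(chunk) == 0) break       # end of file
        ## ...would like to hand `chunk` to ShortRead as if it were a file...
    }
    close(con)

The missing piece is the in-memory "file" that ShortRead could then read from.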

+7
memory-management file-io r large-files in-memory
4 answers

It looks like ShortRead will soon add a "FastqStreamer" class that does what I want.
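From the streaming interface, I expect the usage to look roughly like this (a sketch; the file name and chunk size are placeholders):

    library(ShortRead)
    strm <- FastqStreamer("reads.fastq", n = 100000)  # 100000 records per chunk
    repeat {
        fq <- yield(strm)              # next chunk as a ShortReadQ object
        if (length(fq) == 0) break     # no records left
        ## ...process the chunk, e.g. sread(fq), quality(fq), width(fq)...
    }
    close(strm)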

+2

I don’t know much about R, but have you looked at the mmap package?
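A rough sketch, assuming the CRAN mmap package's mmap()/munmap() interface (the file name is a placeholder and decoding the bytes is left out):

    library(mmap)
    m <- mmap("reads.fastq", mode = char())  # map the file; nothing is read into RAM yet
    length(m)                                # size of the mapping in bytes
    m[1:80]                                  # indexing pulls in just those bytes
    munmap(m)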

+2

Well, I don't know whether readFastq accepts anything other than a file...

But if it can, then for other functions you can use R's pipe() function to open a Unix connection, and pull out chunks with a combination of the Unix head and tail commands and some pipes.

For example, to get lines 91 through 100, you would use this:

head -n 100 file.txt | tail -n 10

So you can just read the file in chunks.
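On the R side this might look like the following sketch (the file name is a placeholder):

    con <- pipe("head -n 100 file.txt | tail -n 10", open = "r")
    chunk <- readLines(con)   # lines 91-100 as a character vector
    close(con)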

If you need to, you can always use these Unix utilities to create a temporary file and then read that file with ShortRead. It's a pain, but if ShortRead will only accept a file, at least it works.
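For example, something like this sketch (the chunk size and file names are made up; it assumes 4 lines per fastq record):

    library(ShortRead)
    tmp <- tempfile(fileext = ".fastq")
    system(paste("head -n 400000 reads.fastq >", tmp))  # first 100000 records
    fq <- readFastq(tmp)   # this chunk is small enough to load in one go
    unlink(tmp)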

+1

By the way, the answer to the question of how to make an in-memory file in R (like Perl's IO::Scalar) is the textConnection() function. Unfortunately, the ShortRead package cannot handle textConnection objects as input. So although the idea I raised in the question (reading a file in small chunks into in-memory files that are then parsed) would certainly work for many applications, it does not work here, because ShortRead does not accept textConnections. The solution is therefore the FastqStreamer class described above.
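For reference, the textConnection() approach looks roughly like this (the lines are made up for illustration):

    lines <- c("@read1", "ACGT", "+", "IIII")
    con <- textConnection(lines)    # an in-memory, file-like connection
    readLines(con)                  # reads the vector as if it were a file
    close(con)
    ## readFastq() will not accept such a connection, hence FastqStreamer instead.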

+1
