Reading a large file in functional Scala

I am trying to process a large binary file with Scala. If possible, I would like to use a functional approach. My main method for doing this is as follows:

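    import java.io.BufferedInputStream

    def getFromBis(buffer: List[Byte], bis: BufferedInputStream): (Byte, List[Byte], Boolean) =
      buffer match {
        case Nil =>
          // Buffer exhausted: pull the next chunk from the stream.
          val buffer2 = new Array[Byte](100000)
          bis.read(buffer2) match {
            case -1 => (-1, Nil, false) // end of stream
            case n =>
              // Keep only the n bytes actually read; the rest of the array is unused.
              val buffer3 = buffer2.take(n).toList
              (buffer3.head, buffer3.tail, true)
          }
        case b :: tail => (b, tail, true)
      }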

The function takes a List[Byte] buffer and a BufferedInputStream. If the buffer is not empty, it simply returns the head and tail; if it is empty, it reads the next chunk from the file and uses that as the buffer instead.

As you can see, this is not very functional. I am trying to make as few low-level IO calls as possible, so I read the file in large chunks. The problem is the new array: each time the function has to refill the buffer, it allocates a new array, and judging by the steadily increasing memory usage while the program runs, I do not think these arrays are being freed.

My question is this: is there a better way to read a large file using Scala? I would like to take a fully functional approach, but at the very least I need a function that can act as a black box for the rest of my functional program.

1 answer

You almost certainly don't want to store bytes in a List. Every byte needs its own list cell object, which is really inefficient and will probably use around 20x more memory than you need.

The easiest way to do this is to create an iterator that stores the internal state:

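    import java.io.{BufferedInputStream, IOException}
    import scala.language.implicitConversions

    class BisReader(bis: BufferedInputStream) {
      val buffer = new Array[Byte](100000)
      var n = 0 // number of valid bytes in buffer; -1 once the stream is exhausted
      var i = 0 // index of the next unread byte

      // True if a byte is buffered, or if refilling the buffer yields more data.
      def hasNext: Boolean = (i < n) || (n >= 0 && {
        n = bis.read(buffer)
        i = 0
        hasNext
      })

      def next: Byte = {
        if (i < n) {
          val b = buffer(i)
          i += 1
          b
        }
        else if (hasNext) next
        else throw new IOException("Input stream empty")
      }
    }

    // Lets a BisReader be used wherever an Iterator[Byte] is expected.
    implicit def reader_as_iterator(br: BisReader): Iterator[Byte] = new Iterator[Byte] {
      def hasNext: Boolean = br.hasNext
      def next(): Byte = br.next
    }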

BisReader could extend Iterator[Byte] directly, but because Iterator is not specialized, every byte returned through it would be boxed. Keeping the two separate lets you call next / hasNext on the reader at full speed when you need to, and use the convenient Iterator methods through the implicit conversion otherwise.

Now you have isolated your ugly non-functional Java IO stuff in one class with a clean interface and you can return to functionality.
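As a rough illustration of the two access styles, here is a minimal sketch that repeats the reader from above and drives it from an in-memory stream. The names BisReaderDemo, readerOver, countZeros and sumUnsigned, the size parameter, and the use of ByteArrayInputStream in place of a real file are all assumptions made for the demo, not part of the original answer.

```scala
import java.io.{BufferedInputStream, ByteArrayInputStream, IOException}
import scala.language.implicitConversions

object BisReaderDemo {
  // The reader from the answer; `size` is a hypothetical parameter added
  // so the demo can force several buffer refills on a tiny input.
  class BisReader(bis: BufferedInputStream, size: Int = 100000) {
    private val buffer = new Array[Byte](size)
    private var n = 0
    private var i = 0
    def hasNext: Boolean = (i < n) || (n >= 0 && {
      n = bis.read(buffer)
      i = 0
      hasNext
    })
    def next: Byte =
      if (i < n) { val b = buffer(i); i += 1; b }
      else if (hasNext) next
      else throw new IOException("Input stream empty")
  }

  implicit def reader_as_iterator(br: BisReader): Iterator[Byte] = new Iterator[Byte] {
    def hasNext: Boolean = br.hasNext
    def next(): Byte = br.next
  }

  // Hypothetical helper: wraps an in-memory array instead of opening a file.
  def readerOver(bytes: Array[Byte]): BisReader =
    new BisReader(new BufferedInputStream(new ByteArrayInputStream(bytes)), size = 4)

  // Low-level style: calls hasNext / next on the reader directly.
  def countZeros(br: BisReader): Long = {
    var count = 0L
    while (br.hasNext) if (br.next == 0) count += 1
    count
  }

  // Functional style: foldLeft is not defined on BisReader, so the
  // implicit conversion to Iterator[Byte] is applied here.
  def sumUnsigned(br: BisReader): Long =
    br.foldLeft(0L)((acc, b) => acc + (b & 0xff))

  def main(args: Array[String]): Unit = {
    val data = Array[Byte](0, 1, 0, -1, 2, 0, 3, 4, 5, 0)
    println(countZeros(readerOver(data)))
    println(sumUnsigned(readerOver(data)))
  }
}
```

The rest of the program only ever sees functions like countZeros or sumUnsigned, so the mutable buffer stays hidden behind them. Note that a reader is consumed as it is read, so each call above gets a fresh one.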


Edit: of course, IO is still order-dependent and has side effects, but the same is true of the previous method.
