How to merge an iterator parser

Question

How to merge an iterator parser

I have Iterator[(A1,B1)] and two functions

fA: (Iterator[A1]) => Iterator[A2] and
fB: (Iterator[B1]) => Iterator[B2] .

Is it possible to make fAB: (Iterator[(A1,B1)]) => Iterator[(A2,B2)] without converting iterators to Seq?

Edit

Both answers below are good. I chose @Aivean answer because the code is simpler and uses a specialized scala data structure (stream).

The only drawback is the stack limitation, but this should not be a problem for most use cases. If your iterator can be very (very) long, @Alexey is recommended.

+7

iterator scala

Juh_ Apr 14 '16 at 12:35

source share

2 answers

Not. Your hypothetical function should first call one of fA and fB . Let them say that he calls fA , and he asks for all A1 before doing anything. Then you do not have B1 left to go to fB unless you save them somewhere, potentially leaking memory. If acceptable, you can do:

 def unzip[A, B](iter: Iterator[(A, B)]) = { var qA = Queue.empty[A] var qB = Queue.empty[B] val iterA = new Iterator[A] { override def hasNext = qA.nonEmpty || iter.hasNext override def next() = qA.dequeueOption match { case Some((a, qA1)) => qA = qA1 a case None => val (a, b) = iter.next() qB = qB.enqueue(b) a } } // similar iterB (iterA, iterB) }

and then

 val (iterA, iterB) = unzip(iterator) fA(iterAfA).zip(fB(iterB))

(Well, you can also write iterator => fA(iterator.map(_._1)).zip(fB(iterator.map(_._2)) : it is of the correct type, but probably not the one you if you want, namely, it will use only one field of each tuple created by the original iterator and discard the other.)

+3

Alexey romanov Apr 14 '16 at 13:18

source share

Aivean · Accepted Answer · 2016-04-14T22:23:44+0000

I came up with a much simpler implementation:

 def iterUnzip[A1, B1, A2, B2](it: Iterator[(A1, B1)], fA: (Iterator[A1]) => Iterator[A2], fB: (Iterator[B1]) => Iterator[B2]) = it.toStream match { case s => fA(s.map(_._1).toIterator).zip(fB(s.map(_._2).toIterator)) }

The idea is to convert an iterator to a stream. Stream in Scala is lazy, but also provides memoization . This effectively provides the same buffering mechanism as @AlexeyRomanov’s solution, but more compressed. The only drawback is that Stream stores memoized elements on the stack as opposed to an explicit queue, so if fA and fB create elements at an uneven speed, you can get a StackOverflow exception.

Test that the score is lazy:

 val iter = Stream.from(0).map(x => (x, x + 1)) .map(x => {println("fetched: " + x); x}).take(5).toIterator iterUnzip( iter, (_:Iterator[Int]).flatMap(x => List(x, x)), (_:Iterator[Int]).map(_ + 1) ).toList

Result:

 fetched: (0,1) iter: Iterator[(Int, Int)] = non-empty iterator fetched: (1,2) fetched: (2,3) fetched: (3,4) fetched: (4,5) res0: List[(Int, Int)] = List((0,2), (0,3), (1,4), (1,5), (2,6))

I also tried hard enough to get a StackOverflow exception by creating uneven iterators, but couldn't.

 val iter = Stream.from(0).map(x => (x, x + 1)).take(10000000).toIterator iterUnzip( iter, (_:Iterator[Int]).flatMap(x => List.fill(1000000)(x)), (_:Iterator[Int]).map(_ + 1) ).size

Works great on -Xss5m and produces:

 res10: Int = 10000000

Thus, in general, this is a reasonably good and concise solution if you do not have any extreme cases.

How to merge an iterator parser

Edit

More articles: