Here is what happens: a Stream is always evaluated lazily, but elements that have already been computed are memoized ("cached"). This lazy evaluation is crucial. Take a look at this piece of code:
a = a.flatMap(v => Some(v))
Although it looks like you converted one Stream into another and discarded the old one, that is not what happens. The new Stream still holds a reference to the old one, because a Stream does not have to eagerly compute the elements of its underlying stream; it does so on demand. Take this as an example:
io.Source.fromFile("very-large.file").getLines().toStream
  .map(_.trim)
  .filter(_.contains("X"))
  .map(_.substring(0, 10))
  .map(_.toUpperCase)
You can chain as many operations as you want, but the file is barely touched: only enough is read to produce the first line. Each subsequent operation simply wraps the previous Stream, keeping a reference to it. Only the moment you ask for size or call foreach does the evaluation begin.
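You don't need a file to see this. Here is a minimal, self-contained sketch (the `computed` counter is made up for illustration) that counts how many elements are actually computed. Note that in classic Scala 2.x the head of a Stream is strict, so building the chain computes at most one element:

```scala
// Counts how many times the mapping function actually runs.
var computed = 0

val s = Stream.from(1)
  .map { v => computed += 1; v * 10 } // not applied to the whole stream yet
  .take(1000)

// Only the (strict) head has been computed while building the chain:
println(s"after building the chain: $computed elements computed")

val sum = s.sum // forcing the Stream evaluates every element
println(s"after forcing with sum: $computed elements computed")
```

Building the chain leaves the counter at 1; forcing the Stream with sum drives it to 1000.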
Back to your code. In the second iteration you create a third Stream that holds a reference to the second one, which in turn references the one you originally defined. Basically you have a stack of fairly large objects.
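Concretely, the loop from the question builds something like this (a sketch; the range size and iteration count are made up):

```scala
// Each flatMap wraps the previous Stream in a new one that retains it,
// so after N iterations there is a chain of N+1 Streams, none of them
// eligible for GC while the newest one is reachable.
var a: Stream[Int] = Stream.range(0, 1000)

for (i <- 1 to 3) {
  a = a.flatMap(v => Some(v)) // new Stream -> old Stream -> ... -> original
}

// The chain is still mostly unevaluated; only the heads have been computed.
println(a.take(3).toList)
```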
But this alone does not explain why memory runs out so fast. The most important part is println(), or a.size, to be precise. Without printing (and thus evaluating the whole Stream), the Stream remains unevaluated. An unevaluated Stream does not cache any values, so it is very thin. Memory would still leak because of the growing chain of Streams referencing one another, but much, much more slowly.
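Once a Stream is forced, the computed elements are memoized, which is why an evaluated Stream is no longer "thin". A small sketch (again, the counter is made up for illustration):

```scala
var calls = 0

val s = Stream.from(1)
  .map { v => calls += 1; v }
  .take(5)

val first  = s.toList // forces and memoizes all five elements
val second = s.toList // traverses the cached cells, no recomputation

println(s"mapping function ran $calls times") // 5, not 10: results were cached
```

The memoized values live in the Stream's cons cells, so a fully evaluated Stream holds every element in memory, just like a List.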
Which raises the question: why does it work with toList? It's pretty simple: List.map() eagerly creates a new List. Period. The previous one is no longer referenced and becomes eligible for GC.
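For contrast, a quick sketch of List's eager behaviour:

```scala
var calls = 0

var xs: List[Int] = List.range(0, 5)
xs = xs.map { v => calls += 1; v + 1 } // all elements computed immediately;
                                       // the old List is now unreferenced
                                       // and eligible for GC

println(s"map ran $calls times right away") // 5
```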