Java 8 Stream vs Collection Storage

I was looking at Java 8 streams and the way they pull data from the data source as it is needed, rather than collecting the entire collection up front to retrieve the data.

I read this quote in an article on streams in Java 8:

"There is no storage. Streams do not have storage for values, they carry values from a source (which can be a data structure, a generation function, an I/O channel, etc.) through a pipeline of computational steps." Source: http://www.drdobbs.com/jvm/lambdas-and-streams-in-java-8-libraries/240166818?pgno=1

I understand the concept of streaming data from a source piece by piece. What I don't understand is this: when streaming from a collection, how is there no storage? The collection already exists on the heap; you are simply streaming data out of it, so the data is already in "storage".

What difference in memory footprint could I expect compared with simply iterating over the collection with a standard for loop?

+8
java collections memory java-8 java-stream
Apr 22 '15 at 4:18
5 answers

The statement about streams and storage means that the stream does not have its own storage. If the source of the stream is a collection, then obviously the collection has storage to hold its elements.

Take one example from this article:

int sum = shapes.stream()
                .filter(s -> s.getColor() == BLUE)
                .mapToInt(s -> s.getWeight())
                .sum();

Suppose shapes is a Collection with millions of elements. One might imagine that the filter operation iterates over the elements from the source and creates a temporary result collection, which might also have millions of elements, and that the mapToInt operation then iterates over this temporary collection and generates its results for summation.

That is not how it works. There is no temporary, intermediate collection. The stream operations are pipelined, so each element that passes the filter goes on through mapToInt and from there to sum without being stored in or read back from a collection.
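The pipelining described above can be observed with a small trace. This is only a sketch: the Shape class, the string colors, and the PipelineTrace name are hypothetical stand-ins for the article's example, and the logging side effects are there purely for illustration on a sequential stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineTrace {
    // Hypothetical stand-in for the article's Shape type.
    static final class Shape {
        final String color;
        final int weight;

        Shape(String color, int weight) {
            this.color = color;
            this.weight = weight;
        }
    }

    // Records the order in which each stage sees each element.
    static List<String> trace(List<Shape> shapes) {
        List<String> log = new ArrayList<>();
        int sum = shapes.stream()
                .filter(s -> {
                    log.add("filter " + s.weight);
                    return s.color.equals("BLUE");
                })
                .mapToInt(s -> {
                    log.add("map " + s.weight);
                    return s.weight;
                })
                .sum();
        log.add("sum " + sum);
        return log;
    }

    public static void main(String[] args) {
        List<Shape> shapes = Arrays.asList(
                new Shape("BLUE", 1), new Shape("RED", 2), new Shape("BLUE", 3));
        // Prints: filter 1, map 1, filter 2, filter 3, map 3, sum 4 (one per line)
        trace(shapes).forEach(System.out::println);
    }
}
```

The log interleaves "filter" and "map" entries element by element, which is what you would expect if no intermediate collection is built between the stages.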

If the stream source were not a collection (say, the elements were read from a network connection), there need not be any storage at all. A pipeline such as the following:

 int sum = streamShapesFromNetwork()
                .filter(s -> s.getColor() == BLUE)
                .mapToInt(s -> s.getWeight())
                .sum();

can handle millions of items, but it doesn't need to store millions of items anywhere.
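Since streamShapesFromNetwork() is not something that can be run here, the following sketch swaps in a generating function as the source, which is another of the non-collection sources the quote mentions. The class and method names are made up for illustration.

```java
import java.util.stream.LongStream;

class GeneratedSourceDemo {
    // Sums the even numbers in 1..limit. The source is a generating function,
    // so no collection holding `limit` elements is ever built.
    static long sumOfEvens(long limit) {
        return LongStream.iterate(1, n -> n + 1) // unbounded generator: 1, 2, 3, ...
                .limit(limit)                    // stop after `limit` elements
                .filter(n -> n % 2 == 0)
                .sum();
    }

    public static void main(String[] args) {
        // Processes five million elements one at a time.
        System.out.println(sumOfEvens(5_000_000L));
    }
}
```

Each value is produced, filtered, added to the running sum, and then discarded, so memory use should not grow with the number of elements processed.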

+23
Apr 22 '15 at 17:36

Think of the stream as a nozzle connected to a water tank, which is your data structure. The nozzle does not have its own storage. Of course, the water (data) that the stream provides comes from a source that has storage, but the stream itself does not have storage. Connecting another nozzle (stream) to your tank (data structure) will not require storage for an entire new copy of the data.

+3
Apr 22 '15 at 4:23

A stream is just a view of the data. It does not have its own storage, and you cannot modify the underlying collection (assuming the stream was built on top of a collection) through the stream. It is like read-only access.

If you have experience with an RDBMS, it is the same idea as a "view".
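A small sketch of the "view" idea, with made-up names (StreamViewDemo, shortNames): filtering through a stream yields a new result and leaves the source list as it was. This only illustrates the stream operations themselves; a lambda with side effects could of course still touch the source directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamViewDemo {
    // Returns the names of at most four characters as a new list.
    static List<String> shortNames(List<String> names) {
        return names.stream()
                .filter(n -> n.length() <= 4)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("John", "Peter", "Sachin"));
        System.out.println(shortNames(names)); // [John]
        System.out.println(names);             // [John, Peter, Sachin], source unchanged
    }
}
```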

+3
Apr 22 '15 at 4:27
  • A collection is a data structure. Depending on the problem, you decide which collection to use, such as ArrayList or LinkedList (weighing time and space complexity), whereas a Stream is just a processing tool that makes your life easier.

  • Another difference: you can think of a Collection as an in-memory data structure to which you can add elements and from which you can remove them, whereas on a Stream you can perform two kinds of operations:

    a. Intermediate operations: filter, map, sorted, limit.
    b. Terminal operations: forEach, or collecting the result set into a collection.

    Notice that with a stream you cannot add or remove elements.

  • A Stream is like an iterator: you can traverse it. Please note that you can traverse a stream only once. Let me give an example to make this clearer:

Example 1:

 List<String> employeeNameList = Arrays.asList("John", "Peter", "Sachin");
 Stream<String> s = employeeNameList.stream();
 // iterate through the list
 s.forEach(System.out::println); // this works perfectly fine
 s.forEach(System.out::println); // throws IllegalStateException: the stream has already been operated upon

So you can conclude that you can iterate over a collection as many times as you want. A stream, however, is used up once you have traversed it; it does not remember what it was supposed to do, so you need to create it and set it up again.
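One common way to "instruct it again" is to ask the source for a fresh stream each time, for example through a Supplier. The sketch below uses hypothetical names (SingleUseDemo and its methods) and also shows that a second traversal of the same stream is rejected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class SingleUseDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails(List<String> names) {
        Stream<String> s = names.stream();
        s.forEach(n -> { });        // first traversal consumes the stream
        try {
            s.forEach(n -> { });    // second traversal of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Traverses the data twice by requesting a new stream for each pass.
    static long countTwice(List<String> names) {
        Supplier<Stream<String>> fresh = names::stream;
        return fresh.get().count() + fresh.get().count();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John", "Peter", "Sachin");
        System.out.println(secondTraversalFails(names)); // true
        System.out.println(countTwice(names));           // 6
    }
}
```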

I hope this is clear.

0
Jan 26 '16 at 12:50

The previous answer is mostly correct. However, here is a still more intuitive answer (for visitors landing here from Google):

Think of streams as text pipelines on UNIX: cat input.file | sed ... | grep ... > output.file

Typically, these UNIX text utilities consume a small amount of RAM compared to the size of the input they process.

This is not always the case, though. Think about sorting: the algorithm must hold intermediate data in memory. The same is true for streams. Sometimes temporary data is needed; in most cases it is not.
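The sorting case can be made visible with a trace around sorted(), which is a stateful intermediate operation. This is a sketch with made-up names (SortedBufferTrace, trace) on a sequential stream; the logging side effects are only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedBufferTrace {
    // Logs when elements enter sorted() and when they leave it.
    static List<String> trace(List<Integer> input) {
        List<String> log = new ArrayList<>();
        input.stream()
                .peek(n -> log.add("in " + n))   // before the sort
                .sorted()                        // must see every element first
                .forEachOrdered(n -> log.add("out " + n));
        return log;
    }

    public static void main(String[] args) {
        // Prints: in 3, in 1, in 2, out 1, out 2, out 3 (one per line)
        trace(Arrays.asList(3, 1, 2)).forEach(System.out::println);
    }
}
```

All the "in" entries appear before the first "out" entry, because nothing can leave the sort until the last element has arrived. That buffer is the temporary storage this answer refers to.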

As a further comparison, "serverless" cloud APIs follow, to some extent, the same design as UNIX pipelines or Java streams. They do not exist in memory until there is some input to process. The cloud platform launches them and feeds them the input. The output is sent on gradually to somewhere else, so a serverless cloud API does not consume many resources (in most cases).

There are no absolute "truths" in this case.

0
Jul 22 '19 at 17:50