Why doesn't Collection<T> implement Stream<T>?
This is a question about API design. When extension methods were added to C#, IEnumerable got all the methods that enabled the use of lambda expressions directly on all collections.
With the advent of lambdas and default methods in Java, I would have expected Collection to implement Stream and provide default implementations for all of its methods. That way, we would not need to call stream() to use the power it provides.
Why did the library architects choose the less convenient approach?
From Maurice Naftalin's Lambda FAQ:
Why are Stream operations not defined directly on Collection?
Early drafts of the API exposed methods like filter, map and reduce on Collection or Iterable. However, user experience with this design led to a more formal separation of the "stream" methods into their own abstraction. Reasons included:
- Methods on Collection, such as removeAll, make in-place modifications, in contrast to the new methods, which are more functional in nature. Mixing two different kinds of methods on the same abstraction forces the user to keep track of which are which. For example, given the declaration
      Collection strings;
  the two very similar-looking method calls
      strings.removeAll(s -> s.length() == 0);
      strings.filter(s -> s.length() == 0); // not supported in the current API
  would have surprisingly different results; the first would remove all empty String objects from the collection, whereas the second would return a stream containing all the non-empty Strings, while having no effect on the collection.
  Instead, the current design ensures that only an explicitly obtained stream can be filtered:
      strings.stream().filter(s -> s.length() == 0)...;
  where the ellipsis represents further stream operations, ending with a terminal operation. This gives the reader a much clearer intuition about the action of filter;
- With lazy methods added to Collection, users were confused by a perceived, but erroneous, need to reason about whether the collection was in “lazy mode” or “eager mode”. Rather than burdening Collection with new and different functionality, it is cleaner to provide a Stream view with the new functionality;
- The more methods added to Collection, the greater the chance of name collisions with existing third-party implementations. By adding only a few methods (stream, parallel), the chance of conflict is greatly reduced;
- A view transformation is still needed to access a parallel view; the asymmetry between the sequential and the parallel stream views was unnatural. Compare, for example,
      coll.filter(...).map(...).reduce(...);
  with
      coll.parallel().filter(...).map(...).reduce(...);
  This asymmetry would be particularly obvious in the API documentation, where Collection would have many new methods to produce sequential streams, but only one to produce parallel streams, which would then have all the same methods as Collection. Factoring these into a separate interface, StreamOps say, would not help; that would still, counterintuitively, need to be implemented by both Stream and Collection;
- A uniform treatment of views also leaves room for other additional views in the future.
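The first and fourth reasons above can be tried out against the API as it actually shipped. This is only a minimal sketch: the class and method names (InPlaceVersusStream, removeEmptyInPlace, nonEmptyCopy, countNonEmpty) are made up for illustration. In the released API the in-place operation that takes a predicate is removeIf, and filter keeps the elements that match its predicate, so the predicate is negated here to obtain the non-empty strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InPlaceVersusStream {

    // In-place: removeIf is a Collection method and mutates the list it is given.
    static List<String> removeEmptyInPlace(List<String> strings) {
        strings.removeIf(s -> s.isEmpty());
        return strings;
    }

    // Functional: the explicitly obtained stream is filtered into a new list,
    // and the source collection is left untouched.
    static List<String> nonEmptyCopy(Collection<String> strings) {
        return strings.stream()
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // The same pipeline runs on either view; only the call that obtains
    // the view differs (stream() versus parallelStream()).
    static long countNonEmpty(Collection<String> strings, boolean parallel) {
        Stream<String> view = parallel ? strings.parallelStream() : strings.stream();
        return view.filter(s -> !s.isEmpty()).count();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(Arrays.asList("a", "", "bc", ""));

        List<String> copy = nonEmptyCopy(strings);
        System.out.println(copy + " copied; source still has " + strings.size() + " elements");

        System.out.println(countNonEmpty(strings, false) + " sequential, "
                + countNonEmpty(strings, true) + " parallel");

        removeEmptyInPlace(strings);
        System.out.println(strings + " left in the source after removeIf");
    }
}
```

Because filtering is only reachable through stream() or parallelStream(), the call site itself tells the reader that the source collection will not be modified.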
- A Collection is an object model.
- A Stream is a subject model.
Definition of Collection in the documentation:
A collection represents a group of objects, known as its elements.
Definition of Stream in the documentation:
A sequence of elements supporting sequential and parallel aggregate operations.
Seen this way, a stream is a specific kind of collection, not the other way around. Therefore, Collection should not implement Stream, regardless of backward compatibility.
So why doesn't Stream<T> implement Collection<T>? Because it is another way of looking at a bunch of objects: not as a group of elements, but in terms of the operations you can perform on it. This is why I say that a Collection is an object model, while a Stream is a subject model.
First, from the Stream documentation:
Collections and streams, while bearing some superficial similarities, have different goals. Collections are primarily concerned with the efficient management of, and access to, their elements. By contrast, streams do not provide a means to directly access or manipulate their elements, and are instead concerned with declaratively describing their source and the computational operations which will be performed in aggregate on that source.
So you want to keep the concepts of stream and collection apart. If Collection implemented Stream, every collection would be a stream, which conceptually it is not. The way it is done now, every collection can give you a stream that works on that collection, which is something different if you think about it.
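One concrete difference between "being a stream" and "giving you a stream" is that a stream can be traversed only once, while a collection can hand out a fresh stream whenever it is asked. The sketch below illustrates this; the class and method names (StreamIsAView, secondTraversalFails, countTwice) are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamIsAView {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails(Stream<String> stream) {
        stream.forEach(s -> { });          // first terminal operation consumes the stream
        try {
            stream.forEach(s -> { });      // reusing a consumed stream is not allowed
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // The collection itself is not consumed: each stream() call yields a new view.
    static long countTwice(List<String> source) {
        return source.stream().count() + source.stream().count();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b");
        System.out.println("second traversal rejected: " + secondTraversalFails(names.stream()));
        System.out.println("elements seen over two fresh streams: " + countTwice(names));
    }
}
```

If Collection itself were a Stream, it would be unclear which of these two behaviours a collection should have.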
Another factor that comes to mind is cohesion and coupling, as well as encapsulation. If every class implementing Collection also had to implement the Stream operations, it would serve two (somewhat different) purposes and might become too long.
My guess is that it was done this way to avoid breaking existing code that implements Collection. It would be hard to provide default implementations that work correctly with every existing implementation.
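For comparison, here is how little the chosen design asks of existing implementations. In the JDK, Collection.stream() is a default method built on spliterator(), which is itself a default method built on the collection's iterator and size. The sketch below assumes a hypothetical legacy class, IntRange, written without any knowledge of streams; it gains stream() without any change.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;

// Hypothetical pre-Java-8 style collection: an immutable range of ints [from, to).
class IntRange extends AbstractCollection<Integer> {
    private final int from;
    private final int to; // exclusive

    IntRange(int from, int to) {
        this.from = from;
        this.to = Math.max(from, to); // an inverted range is treated as empty
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = from;

            @Override
            public boolean hasNext() {
                return next < to;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    @Override
    public int size() {
        return to - from;
    }

    public static void main(String[] args) {
        // stream() is inherited from Collection; IntRange never mentions it.
        System.out.println(new IntRange(1, 5).stream()
                .map(i -> i * i)
                .collect(Collectors.toList()));
    }
}
```

Only one inherited method is involved here, so there is little room for a clash; giving every existing implementation correct defaults for filter, map, reduce and the rest would be a much larger surface.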