Java Streams - Filtering Previously Filtered Values

Question

I am experimenting with Java streams and trying to figure out what is possible with them, as well as their strengths and weaknesses. I am currently trying to implement the Sieve of Eratosthenes using a stream, but I cannot find a good way to iterate over previously filtered values without storing them in a separate collection.

I want to do something like this:

    IntStream myStream = IntStream.range(0,3);
    myStream.filter(s -> {
        System.out.print("[filtering "+s+"] ");
        myStream.forEach(q -> System.out.print(q+", "));
        System.out.println();
        return true; //eventually respond to values observed on the line above
    });

With the desired output:

    [filtering 0]
    [filtering 1] 0,
    [filtering 2] 0, 1,
    [filtering 3] 0, 1, 2,

Note that when filtering each new value, all previously filtered values are observed. This would make it easy to implement the Sieve of Eratosthenes, because I could filter out all non-prime values and, for each new value, check for divisibility by all numbers that have previously passed the prime filter.

However, the above example gives me an error in NetBeans:

 local variables referenced from a lambda expression must be final or effectively final

This appears to be because I am referring to myStream inside a filter that already acts on myStream. Is there a good way to get around this error (i.e., make a final copy of the stream containing only those values that have been filtered so far), or is there a better approach to this problem that does not use a separate collection to store the values?

java lambda filter java-8 java-stream

Porthos3 Aug 19 '15 at 17:31

4 answers

Helder pereira · Answer 1 · 2015-08-19T23:57:02+0000

I managed to create an infinite Stream of primes using the Sieve of Eratosthenes, but it does not actually use past values. Instead, it removes the multiples of each prime from the tail (lazily, because the tail is infinite), like the original Sieve of Eratosthenes algorithm. For this, I used an Iterator as a helper (since a Stream can be used only once) and implemented a lazyConcat for streams.

    import java.util.PrimitiveIterator;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.function.IntConsumer;
    import java.util.function.Supplier;
    import java.util.stream.IntStream;
    import java.util.stream.StreamSupport;

    class StreamUtils {

        // Wraps an iterator in a sequential, ordered IntStream.
        public static IntStream fromIterator(PrimitiveIterator.OfInt it) {
            return StreamSupport.intStream(
                    Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
        }

        // Concatenates two streams, requesting each one only when it is first needed.
        public static IntStream lazyConcat(Supplier<IntStream> a, Supplier<IntStream> b) {
            return StreamSupport.intStream(new Spliterator.OfInt() {
                boolean beforeSplit = true;
                Spliterator.OfInt spliterator;

                @Override
                public OfInt trySplit() {
                    return null;
                }

                @Override
                public long estimateSize() {
                    return Long.MAX_VALUE;
                }

                @Override
                public int characteristics() {
                    return Spliterator.ORDERED;
                }

                @Override
                public boolean tryAdvance(IntConsumer action) {
                    boolean hasNext;
                    if (spliterator == null) {
                        spliterator = a.get().spliterator();
                    }
                    hasNext = spliterator.tryAdvance(action);
                    // Once the first stream is exhausted, switch to the second one.
                    if (!hasNext && beforeSplit) {
                        beforeSplit = false;
                        spliterator = b.get().spliterator();
                        hasNext = spliterator.tryAdvance(action);
                    }
                    return hasNext;
                }
            }, false);
        }
    }

My Sieve of Eratosthenes stream looks like this:

    class Primes {

        public static IntStream stream() {
            return sieve(IntStream.iterate(2, n -> n + 1));
        }

        private static IntStream sieve(IntStream s) {
            PrimitiveIterator.OfInt it = s.iterator();
            int head = it.nextInt();
            IntStream tail = StreamUtils.fromIterator(it);
            return StreamUtils.lazyConcat(
                    () -> IntStream.of(head),
                    () -> sieve(tail.filter(n -> n % head != 0)));
        }
    }

Then we can use it as follows:

 System.out.println(Primes.stream().limit(20).boxed().collect(Collectors.toList()));

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

I think it was a good exercise, but it seems to be quite inefficient and not stack-friendly at all.

Eran · Answer 2 · 2015-08-19T17:38:25+0000

You cannot process a Stream more than once, so calling myStream.forEach inside the filter method is not possible.
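As a minimal sketch of that single-use restriction (the class and method names here are hypothetical, chosen only for illustration), a second terminal operation on the same stream is rejected with an IllegalStateException:

```java
import java.util.stream.IntStream;

public class StreamReuseDemo {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails() {
        IntStream stream = IntStream.range(0, 3);
        stream.forEach(i -> { });      // first terminal operation consumes the stream
        try {
            stream.forEach(i -> { });  // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;               // the stream has already been operated upon
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTraversalFails()); // prints "true"
    }
}
```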

You can create a new IntStream inside the filter.

Note that you will need to add a terminal operation to the outer Stream pipeline so that it is actually processed:

    IntStream myStream = IntStream.range(0,4);
    myStream.filter(s -> {
        System.out.print("[filtering "+s+"] ");
        IntStream.range(0,s).forEach(q -> System.out.print(q+", "));
        System.out.println();
        return true; //eventually respond to values observed on the line above
    }).forEach(i->{});

This gives:

    [filtering 0]
    [filtering 1] 0,
    [filtering 2] 0, 1,
    [filtering 3] 0, 1, 2,

zeroflagL · Answer 3 · 2015-08-20T21:28:46+0000

It is debatable whether a stream is the right tool here, but .filter() definitely is not. Filters are supposed to be stateless, so the idea should not come up in the first place. Based on the example in your answer, a collector might be an acceptable solution.

    List<Integer> primes = IntStream.range(2, UPPER_BOUND)
        .collect(ArrayList::new,
                 (list, number) -> {
                     for(int j=0; j < list.size(); j++) {
                         int prime = list.get(j);

                         if(prime > Math.sqrt(number)) {
                             break;
                         }

                         if(number % prime == 0) {
                             return;
                         }
                     }

                     list.add(number);
                 },
                 List::addAll);

ArrayList::new creates a new list, which the consumer then receives as list. The consumer is called for each element in the stream, with number being that element.

List::addAll would only be relevant for parallel streams, which cannot be used for this algorithm anyway.
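To illustrate the role of the third argument, here is a small sketch (CombinerDemo and combinerCalls are hypothetical names, not part of the answer) that counts how often the combiner runs. It rests on the assumption, which matches the behavior of current OpenJDK implementations as far as I know, that a sequential stream fills a single container and therefore never needs to merge partial results:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.ObjIntConsumer;
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class CombinerDemo {

    // Collects 0..999 into a list and reports how many times the combiner was invoked.
    static int combinerCalls(boolean parallel) {
        AtomicInteger calls = new AtomicInteger();

        Supplier<List<Integer>> supplier = ArrayList::new;                       // creates a container
        ObjIntConsumer<List<Integer>> accumulator = (list, n) -> list.add(n);    // adds one element
        BiConsumer<List<Integer>, List<Integer>> combiner = (left, right) -> {   // merges two containers
            calls.incrementAndGet();
            left.addAll(right);
        };

        IntStream stream = IntStream.range(0, 1000);
        if (parallel) {
            stream = stream.parallel();
        }
        stream.collect(supplier, accumulator, combiner);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("sequential combiner calls: " + combinerCalls(false));
        System.out.println("parallel combiner calls:   " + combinerCalls(true));
    }
}
```

The parallel count depends on how the runtime splits the work, so it is not a fixed number.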

Porthos3 · Answer 4 · 2015-08-19T20:59:52+0000

Other answers suggested that the approach I tried was not possible and that a separate collection needs to be used.

To give a more complete answer, I would like to present a valid approach to this problem using streams and compare it with a more traditional approach.

Listing primes using streams (using the Sieve of Eratosthenes):

    List<Integer> primes = new ArrayList<Integer>();

    IntStream.iterate(2, i -> i + 1)
        .limit(UPPER_BOUND)
        .filter(i -> {
            for(int j=0; j<primes.size(); j++) {
                int prime = primes.get(j);

                if(prime > Math.sqrt(i)) {
                    break;
                }

                if(i % prime == 0) {
                    return false;
                }
            }
            return true;
        })
        .forEach(primes::add);

The traditional, equivalent approach without streams:

    List<Integer> primes = new ArrayList<Integer>();

    for(int i=2; i < UPPER_BOUND; i++) {
        boolean isPrime = true;

        for(int j=0; j<primes.size(); j++) {
            int prime = primes.get(j);

            if(prime > Math.sqrt(i)) {
                break;
            }

            if(i % prime == 0) {
                isPrime = false;
                break;
            }
        }

        if(isPrime) {
            primes.add(i);
        }
    }

Performance Comparison:

Some experiments with each function consistently showed that the traditional approach is actually faster than using streams in this case. The stream approach consistently took about 1.5 times as long as the traditional approach to find all primes below one million (on average 106 ms versus 70 ms on my machine).

This performance difference could probably be made up easily if the stream's .parallel() function could easily parallelize the problem. In this case, however, parallelization is not easy, since ArrayList is not thread-safe, which quickly leads to errors and/or inaccurate results.
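For comparison, here is a sketch of a variant that does parallelize safely. Note that it swaps in a different technique: each number is tested independently by plain trial division, so the filter is stateless and never touches a shared list. It is not the approach above, it was not part of the timings above, and the names ParallelPrimes, isPrime and primesBelow are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelPrimes {

    // Stateless trial division: the result depends only on n, not on earlier results.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Collects all primes in [2, upperBound) using a parallel stream.
    static List<Integer> primesBelow(int upperBound) {
        return IntStream.range(2, upperBound)
                .parallel()
                .filter(ParallelPrimes::isPrime)
                .boxed()
                .collect(Collectors.toList()); // encounter order is preserved
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(30)); // prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    }
}
```

The trade-off is that each candidate is divided by every integer up to its square root rather than only by previously found primes, so whether this ends up faster than the sequential versions would have to be measured.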

Conclusion:

Assuming the other answers are correct, filtering already filtered data inside the filter in the same stream is not possible in Java.

Listing primes can be done using streams. However, pending a better solution than my own, it is better to stick with the traditional approach without streams for now.
