How to eliminate duplicate entries in a stream based on a custom equality class

I have a problem similar to the one described here. But with two differences: first, I use the Stream API, and second, I already have equals() and hashCode() methods. However, in this context the equality of blogs is not the one defined in the Blog class.

 Collection<Blog> elements = x.stream()
     ... // a lot of filter and map stuff
     .peek(p -> sysout(p)) // a stream of Blog
     .? // how to remove duplicates - .distinct() doesn't work

I have a class ContextBlogEqual with an equality method of the form

 public boolean equal(Blog a, Blog b); 

Is there a way to remove all duplicate entries with my current stream approach, based on the ContextBlogEqual#equal method?

I already thought about grouping, but that doesn't work either, because whether blogA and blogB are equal depends on more than one parameter. I also have no idea how I could use .reduce(..), since more than one element is involved on each side of the reduction.

+5
3 answers

In essence, you either need to define a hashCode so that your data works with a hash table, or a total order so that it works with a binary search tree.

For hash tables, you need to declare a wrapper class that overrides equals and hashCode.
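A minimal sketch of that wrapper idea, assuming a Blog class with name/id/time fields as in the other answer; ContextKey and the name+id criterion are hypothetical stand-ins for whatever ContextBlogEqual#equal actually compares:

```java
import java.util.*;
import java.util.stream.*;

public class WrapperDistinct {
    static class Blog {
        final String name; final int id; final long time;
        Blog(String name, int id, long time) { this.name = name; this.id = id; this.time = time; }
        @Override public String toString() { return name + ":" + id + ":" + time; }
    }

    // Wrapper that redefines equality in the "context" sense (here: name + id only;
    // the real criteria would mirror ContextBlogEqual#equal).
    static final class ContextKey {
        final Blog blog;
        ContextKey(Blog blog) { this.blog = blog; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof ContextKey)) return false;
            ContextKey other = (ContextKey) o;
            return blog.id == other.blog.id && Objects.equals(blog.name, other.blog.name);
        }
        @Override public int hashCode() { return Objects.hash(blog.name, blog.id); }
    }

    public static void main(String[] args) {
        List<Blog> blogs = Arrays.asList(
                new Blog("foo", 1, 1234),
                new Blog("foo", 1, 1345),   // duplicate of the first by name+id
                new Blog("bar", 2, 1345));

        List<Blog> distinct = blogs.stream()
                .map(ContextKey::new)
                .distinct()               // now uses the wrapper's equals/hashCode
                .map(k -> k.blog)         // unwrap again
                .collect(Collectors.toList());

        System.out.println(distinct);     // prints [foo:1:1234, bar:2:1345]
    }
}
```

Since the stream is ordered and sequential here, distinct() is stable and keeps the first occurrence of each wrapped key.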

For binary trees, you can define a Comparator<Blog> that takes your definition of equality into account and adds an arbitrary but consistent ordering criterion. Then you can collect into new TreeSet<Blog>(yourComparator).
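A sketch of the TreeSet route, again assuming the name/id/time Blog shape and using name+id as a hypothetical stand-in for the context equality; the comparator must treat two blogs as equal exactly when they are duplicates in your sense:

```java
import java.util.*;
import java.util.stream.*;

public class TreeSetDistinct {
    static class Blog {
        final String name; final int id; final long time;
        Blog(String name, int id, long time) { this.name = name; this.id = id; this.time = time; }
        @Override public String toString() { return name + ":" + id + ":" + time; }
    }

    public static void main(String[] args) {
        // Order by the fields that define "context equality" (here: name, then id).
        Comparator<Blog> byNameId = Comparator.comparing((Blog b) -> b.name)
                                              .thenComparingInt(b -> b.id);

        List<Blog> blogs = Arrays.asList(
                new Blog("foo", 1, 1234),
                new Blog("foo", 1, 1345),   // duplicate of the first by name+id
                new Blog("bar", 2, 1345));

        // TreeSet.add rejects elements that compare equal, so the first entry wins.
        Set<Blog> distinct = blogs.stream()
                .collect(Collectors.toCollection(() -> new TreeSet<>(byNameId)));

        System.out.println(distinct);      // prints [bar:2:1345, foo:1:1234]
    }
}
```

Note that the result is iterated in comparator order, not encounter order; if you need the original order, the hash-based approach is the better fit.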

+4

First, note that for most scenarios a method like equal(Blog, Blog) is not enough, because you would have to compare every entry pairwise, which is inefficient. It is better to define a function that extracts a key from a blog entry. For example, consider the following Blog class:

 static class Blog {
     final String name;
     final int id;
     final long time;

     public Blog(String name, int id, long time) {
         this.name = name;
         this.id = id;
         this.time = time;
     }

     @Override
     public int hashCode() {
         return Objects.hash(name, id, time);
     }

     @Override
     public boolean equals(Object obj) {
         if (this == obj)
             return true;
         if (obj == null || getClass() != obj.getClass())
             return false;
         Blog other = (Blog) obj;
         return id == other.id && time == other.time && Objects.equals(name, other.name);
     }

     public String toString() {
         return name + ":" + id + ":" + time;
     }
 }

And some test data:

 List<Blog> blogs = Arrays.asList(
         new Blog("foo", 1, 1234),
         new Blog("bar", 2, 1345),
         new Blog("foo", 1, 1345),
         new Blog("bar", 2, 1345));
 List<Blog> distinctBlogs = blogs.stream().distinct().collect(Collectors.toList());
 System.out.println(distinctBlogs);

Here distinctBlogs contains three entries: [foo:1:1234, bar:2:1345, foo:1:1345]. Suppose this is undesirable because we do not want to take the time field into account. The easiest way to create a new key is to use Arrays.asList:

 Function<Blog, Object> keyExtractor = b -> Arrays.asList(b.name, b.id); 

The resulting keys already have the correct equals and hashCode .

Now, if a terminal operation is acceptable, you can create a custom collector like this:

 List<Blog> distinctByNameId = blogs.stream().collect(
         Collectors.collectingAndThen(
                 Collectors.toMap(keyExtractor, Function.identity(),
                                  (a, b) -> a, LinkedHashMap::new),
                 map -> new ArrayList<>(map.values())));
 System.out.println(distinctByNameId);

Here we use keyExtractor to generate the keys, and the merge function (a, b) -> a keeps the previously added record when a duplicate key appears. We use LinkedHashMap to maintain encounter order (omit it if you don't need the order). Finally, we copy the map values into a new ArrayList. You can extract this collector creation into a separate method and generalize it:

 public static <T> Collector<T, ?, List<T>> distinctBy(
         Function<? super T, ?> keyExtractor) {
     return Collectors.collectingAndThen(
             Collectors.toMap(keyExtractor, Function.identity(),
                              (a, b) -> a, LinkedHashMap::new),
             map -> new ArrayList<>(map.values()));
 }

This makes the usage simpler:

 List<Blog> distinctByNameId = blogs.stream()
         .collect(distinctBy(b -> Arrays.asList(b.name, b.id)));
+1

Essentially, you'll need a helper method like this:

 static <T, U> Stream<T> distinct(
         Stream<T> stream,
         Function<? super T, ? extends U> keyExtractor) {
     final Map<U, String> seen = new ConcurrentHashMap<>();
     return stream.filter(t -> seen.put(keyExtractor.apply(t), "") == null);
 }

It takes a Stream and returns a new Stream that contains only values that are distinct with respect to the given keyExtractor. Example:

 class O {
     final int i;

     O(int i) {
         this.i = i;
     }

     @Override
     public String toString() {
         return "O(" + i + ")";
     }
 }

 distinct(Stream.of(new O(1), new O(1), new O(2)), o -> o.i)
     .forEach(System.out::println);

This gives

 O(1)
 O(2)

Disclaimer

As Tagir Valeev commented here and in this similar answer by Stuart Marks, this approach has flaws. The operation implemented here ...

  • is unstable for ordered parallel streams
  • is not optimal for sequential streams
  • violates the requirement that predicates passed to Stream.filter() be stateless

Wrapping the above in your own library

You can, of course, extend Stream with your own functionality and implement this new distinct() function there, as e.g. jOOλ or Javaslang do:

 Seq.of(new O(1), new O(1), new O(2))
    .distinct(o -> o.i)
    .forEach(System.out::println);
0
