What kind of magic does Flink use in different ()? How are surrogate keys generated?

Question

What kind of magic does Flink use in different ()? How are surrogate keys generated?

As for the generation of a surrogate key, the first step is to obtain a separate key, and then build an incremental key for each tuple.

Therefore, I use the Java Set to get individual elements and from the heap. Then I use Flink distinct () and it fully works.

May I ask what is the difference?

Another related question: can Flink generate a surrogate key in mapper?

+4

apache-flink

Akira Sendoh May 29 '15 at 9:38

source share

1 answer

Fabian Hueske · Accepted Answer · 2015-05-29T10:27:33+0000

Flink executes distinct()internally as GroupByfollowed by an operator ReduceGroup, where the reduction operator returns only the first element of the group.

GroupBy . , , , , . . GroupBy Sort Flink OutOfMemoryError.

, DataSet.distinct(KeySelector ks). MapFunction, .

What kind of magic does Flink use in different ()? How are surrogate keys generated?

More articles: