What magic does Flink use in distinct()? How are surrogate keys generated?

To generate surrogate keys, the first step is to get the distinct values, and then build an incremental key for each tuple.

So I used a Java Set to collect the distinct elements, and it ran out of heap space. Then I used Flink's distinct() and it worked without problems.

What accounts for this difference?

Another related question: can Flink generate a surrogate key in a mapper?

1 answer

Flink executes distinct() internally as a GroupBy followed by a ReduceGroup operator, where the reduce function returns only the first element of each group.
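To make the "group, then keep the first element" idea concrete, here is a minimal plain-Java sketch. It is not Flink code and not Flink's implementation: the class `SortDistinctSketch` and the helper `distinctBySort` are hypothetical names, and the sketch assumes the grouping is done by sorting so that equal elements end up next to each other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortDistinctSketch {

    // Hypothetical helper (not part of Flink): deduplicate by sorting first,
    // so equal elements form adjacent groups, then keep the first of each group.
    static <T extends Comparable<T>> List<T> distinctBySort(List<T> input) {
        List<T> sorted = new ArrayList<>(input);
        Collections.sort(sorted); // grouping step: equal values become neighbors

        List<T> result = new ArrayList<>();
        T previous = null;
        for (T current : sorted) {
            // "reduce" step: emit only the first element of each group
            if (previous == null || current.compareTo(previous) != 0) {
                result.add(current);
            }
            previous = current;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3]
        System.out.println(distinctBySort(List.of(3, 1, 3, 2, 1)));
    }
}
```

The deduplication loop only ever compares an element with its predecessor, so it needs no lookup structure holding every distinct value at once. That is the property that lets a sort-based approach work on data larger than the heap, provided the sort itself can use external storage.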

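The GroupBy is carried out by sorting the data. Where possible, the sort runs in memory on a binary representation of the records, and it spills to disk when memory runs short. This is why GroupBy and Sort in Flink do not fail with an OutOfMemoryError, unlike a Java Set that has to hold every element on the heap.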

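To deduplicate on a custom key, you can use DataSet.distinct(KeySelector ks). The key selector works much like a MapFunction that derives the key from each element.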
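As an illustration of deduplicating on an extracted key and then numbering the survivors, here is a plain-Java sketch that runs in a single JVM. It does not use the Flink API: `SurrogateKeySketch`, `distinctWithSurrogateKeys`, and the `"name:value"` record format are all hypothetical, and a `java.util.function.Function` stands in for the key selector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class SurrogateKeySketch {

    // Hypothetical helper (not part of Flink): keep the first record per
    // extracted key and pair it with an incremental surrogate key from 1.
    static <T, K extends Comparable<K>> List<Map.Entry<Long, T>> distinctWithSurrogateKeys(
            List<T> records, Function<T, K> keySelector) {
        List<T> sorted = new ArrayList<>(records);
        // Stable sort by the extracted key: records with equal keys become
        // neighbors and keep their original relative order.
        sorted.sort(Comparator.comparing(keySelector));

        List<Map.Entry<Long, T>> result = new ArrayList<>();
        K previousKey = null;
        long nextId = 1;
        for (T record : sorted) {
            K key = keySelector.apply(record);
            if (previousKey == null || key.compareTo(previousKey) != 0) {
                result.add(Map.entry(nextId++, record)); // first record of a new key group
            }
            previousKey = key;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = List.of("apple:1", "banana:2", "apple:3");
        // The key is the part before the colon (an assumed record format).
        // prints [1=apple:1, 2=banana:2]
        System.out.println(distinctWithSurrogateKeys(records, s -> s.substring(0, s.indexOf(':'))));
    }
}
```

The counter here is only safe because everything runs sequentially. In a parallel Flink job, a plain counter inside a map function would be unique per parallel instance rather than globally, so a utility such as `DataSetUtils.zipWithIndex` may be worth checking for that case.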
