How to synchronize multiple logs in Kafka?

Suppose I have two types of logs that share a common 'uid' field, and I want to output a log only when logs of both types containing the same uid have arrived, like a join. Is this possible with Kafka?

+1
apache-kafka apache-kafka-streams
1 answer

Yes, absolutely. Check out Kafka Streams, in particular the DSL API. It looks something like this:

    StreamsBuilder builder = new StreamsBuilder();

    KStream<byte[], Foo> fooStream = builder.stream("foo");
    KStream<byte[], Bar> barStream = builder.stream("bar");

    fooStream.join(barStream,
                   (foo, bar) -> {
                       foo.baz = bar.baz;
                       return foo;
                   },
                   JoinWindows.of(1000))
             .to("buzz");

This simple application consumes two input topics ("foo" and "bar"), joins them, and writes the result to the "buzz" topic. Because streams are unbounded, joining two streams requires a join window (1000 milliseconds above): the maximum relative time difference between two messages in the respective streams for them to be eligible for joining.
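To make the join-window idea concrete, here is a minimal sketch in plain Java that models the matching rule only. It is not Kafka Streams code: the class `WindowedJoinSketch`, the `Event` type, and the output format are hypothetical names invented for illustration, and the window bounds are assumed to be inclusive. It also assumes that records are matched by key, so in the question's case the uid would need to be the record key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a windowed inner join.
// This is NOT the Kafka Streams implementation, only the matching rule.
class WindowedJoinSketch {

    // A record with a join key (the uid), an event timestamp, and a payload.
    static final class Event {
        final String uid;
        final long timestampMs;
        final String payload;

        Event(String uid, long timestampMs, String payload) {
            this.uid = uid;
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    // Two records are eligible for joining when their keys match and their
    // timestamps differ by at most windowMs (bounds assumed inclusive).
    static boolean joinable(Event a, Event b, long windowMs) {
        return a.uid.equals(b.uid)
                && Math.abs(a.timestampMs - b.timestampMs) <= windowMs;
    }

    // Inner join: emit one combined value per eligible (left, right) pair.
    static List<String> innerJoin(List<Event> left, List<Event> right, long windowMs) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                if (joinable(l, r, windowMs)) {
                    out.add(l.uid + ":" + l.payload + "+" + r.payload);
                }
            }
        }
        return out;
    }
}
```

With a 1000 ms window, a "foo" record for uid `u1` at time 0 would pair with a "bar" record for `u1` at time 800, while a pair of records for the same uid that are 5000 ms apart would produce nothing.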

Here is a more complete example: https://github.com/confluentinc/kafka-streams-examples/blob/4.0.0-post/src/main/java/io/confluent/examples/streams/PageViewRegionLambdaExample.java

And here is the documentation: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html . You will see that there are many different types of joins you can perform.

It is important to note that although the above example will deterministically synchronize streams (if you reset and reprocess the topology, you will get the same result every time), not all join operations in Kafka Streams are deterministic. As of version 1.0.0 and earlier, roughly half are not deterministic and can depend on the order in which data is consumed from the underlying topic partitions. Specifically, inner KStream-KStream joins and all KTable-KTable joins are deterministic. Other joins, such as all KStream-KTable joins and left/outer KStream-KStream joins, are non-deterministic and depend on the order in which the consumers consume the data. Keep this in mind if you are designing your topology to be reprocessable. If you use these non-deterministic operations, the order of events as they arrive while your topology runs live will produce one result, but reprocessing the topology may produce a different one. Note that operations such as KStream#merge() do not produce deterministic results either. For more on this issue, see "Why does my Kafka Streams topology not replay/reprocess properly?" and this mailing list post.
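The order dependence can be illustrated with a small plain-Java model of a stream-table inner join. This is only a sketch under stated assumptions, not Kafka Streams code: `StreamTableOrderSketch`, `Input`, and `process` are hypothetical names, and the table is modeled as a simple map that each stream record looks up at the moment it is processed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of why a stream-table join depends on processing order.
// This is NOT the Kafka Streams implementation.
class StreamTableOrderSketch {

    // One input record: either a table update or a stream event, keyed by uid.
    static final class Input {
        final boolean tableUpdate;
        final String uid;
        final String value;

        Input(boolean tableUpdate, String uid, String value) {
            this.tableUpdate = tableUpdate;
            this.uid = uid;
            this.value = value;
        }
    }

    // Processes records in the given order. A stream event is joined against
    // whatever the table holds for its key at that moment; with inner-join
    // semantics, nothing is emitted when the key is absent.
    static List<String> process(List<Input> inputs) {
        Map<String, String> table = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Input in : inputs) {
            if (in.tableUpdate) {
                table.put(in.uid, in.value);
            } else {
                String current = table.get(in.uid);
                if (current != null) {
                    out.add(in.uid + ":" + in.value + "+" + current);
                }
            }
        }
        return out;
    }
}
```

Feeding the same two records in opposite orders gives different output: if the table update is processed first, the stream event finds a match, and if the stream event is processed first, it finds none. This mirrors how a live run and a reprocessing run can diverge when the consumption order differs.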

+5