Depending on your use case, you may need to configure your event topics so that records never expire. There are several considerations here:
1) You can reprocess the source stream from a given Kafka topic only within the retention period of that topic. For example, if an event is published to the source topic at time t0 and the topic's retention period is t1, then this event can only be reprocessed until t0 + t1.
2) If your input messages are not independent, i.e. it is not an append-only stream of self-contained facts but the messages logically depend on each other (see https://kafka.apache.org/0110/documentation/streams/developer-guide#streams_kstream_ktable ), you cannot reprocess it from ANY given point in its history. For example, if the input stream is actually changelog-like (table-like), so that a user-creation event occurs at t0 and a user-update event occurs at t2, then starting processing at time t1, where t0 < t1 < t2, means starting from an "inconsistent" view of the original stream. So the correct approach is to configure the input source topic not with time-based retention, but with a key-based log compaction policy.
In other words, if your events are independent, reprocessing is quite simple; you can even let old segments expire. But if your messages are interdependent, you cannot let old records expire. For more information about retention, see this mailing list thread.
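As a minimal configuration sketch (assuming Kafka's Java AdminClient; the topic name user-events, partition count, and replication factor are placeholders), a compacted input topic could be created like this:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Keep the latest record per key instead of expiring records by time
            // ("user-events" is a placeholder topic name).
            NewTopic topic = new NewTopic("user-events", 3, (short) 1)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

With cleanup.policy=compact, Kafka retains at least the latest record per key indefinitely, so a full replay always reconstructs a consistent view of each key.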
How you write your processing logic, and the engine you use to reprocess with, also matter. Suppose you use Kafka Streams to process events into state that you query. If you want to be able to reprocess events to rebuild this state, in addition to processing events live as they arrive, your stream processing engine should be able to coordinate the replay of historical events so that they arrive in the same relative order as during live processing, and your processing logic therefore handles them the same way. Kafka Streams currently has limited capabilities in this department, so you may need to write a topology that does this coordination yourself. Apache Beam doesn't seem to offer much in this department either. The blog post mentions:
A common characteristic of archived data is that it can arrive out of order. Sharding of archived files often leads to a completely different processing order than events arriving in near real time. The data will also be available all at once, and therefore delivered instantaneously from the perspective of your pipeline. Whether running experiments on past data or reprocessing past results to fix a data processing bug, it is critical that your processing logic be applicable to archived events just as easily as to near real-time data.
... but the actual Beam documentation doesn't mention replay or historical processing at all. Therefore, I believe the emphasis in that paragraph should be on "it is critical that your processing logic be as easy to apply to archived events as to near real-time incoming data."
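Returning to the Kafka Streams scenario above, here is a minimal sketch of processing events into state that you query; the topic name user-events, the store name event-counts-store, and the simple per-key count are assumptions for illustration:

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class QueryableStateSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-event-counts");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // For a full historical run, start from the earliest offsets
        // (for an existing application, the application reset tool rewinds it).
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG),
                  "earliest");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("user-events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .count(Materialized.as("event-counts-store"));   // queryable state store

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Wait until the instance is RUNNING before querying the local store.
        while (streams.state() != KafkaStreams.State.RUNNING) {
            Thread.sleep(100);
        }

        // The same store is queried the same way whether the state was built
        // from live traffic or from a replay of historical events.
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "event-counts-store", QueryableStoreTypes.keyValueStore()));
        System.out.println("count = " + store.get("some-user-id"));
    }
}
```

Whether the state was built from live traffic or from a replay, the store and the query code are identical; what differs between the two modes is how the input offsets are positioned before the run.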
Thus, it makes sense to treat historical processing as the general case and real-time processing as a special case. To write logic that works in both processing modes:
- You should know which operations in your stream processor give deterministic results in both historical and live processing modes, and which do not. For example, with Kafka Streams, not all operations give the same results in real-time and historical processing.
- You may want your processing logic not to depend on wall-clock factors such as System.currentTimeMillis(), nor to generate random identifiers that will not be the same on each run (unless, of course, you want them to differ). A sketch of the event-time alternative follows after this list.
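For instance, here is a sketch using the Kafka Streams Processor API (the TimestampTaggingProcessor class and its tagging logic are hypothetical) showing how to rely on the record's own event timestamp rather than the wall clock:

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

// Hypothetical processor that tags each value with the time it "happened".
public class TimestampTaggingProcessor implements Processor<String, String, String, String> {

    private ProcessorContext<String, String> context;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
    }

    @Override
    public void process(Record<String, String> record) {
        // record.timestamp() is the event time carried with the record itself,
        // not the wall-clock time at which this code happens to run, so a
        // historical replay sees the same value as the original live run.
        long eventTime = record.timestamp();
        context.forward(record.withValue(record.value() + "@" + eventTime));
    }
}
```

Because the timestamp travels with the event, reprocessing produces the same tagged values as the original live run, which System.currentTimeMillis() would not.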