You mentioned that you have only one consumer, but then you worry about data loss. I assume you are just worried about the extreme case where one of your servers goes down and you lose data?
I don't think there is a way to consume one message at a time. Looking through the consumer configurations, it seems you can only set the maximum number of bytes a consumer fetches from Kafka, not the number of messages:
fetch.message.max.bytes
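As a sketch of where that setting lives (this uses the old high-level consumer API that fetch.message.max.bytes belongs to; the ZooKeeper address and group id are placeholders):

```java
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class FetchSizeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder connection settings for the old high-level consumer.
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "example-group");
        // The only knob is a byte limit per fetch request, not a message count.
        props.put("fetch.message.max.bytes", "1048576"); // 1 MiB

        ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        // ... create message streams and consume ...
        consumer.shutdown();
    }
}
```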
But if you are worried about losing data completely: as long as you never commit an offset, Kafka will not mark the message as consumed, and it will not be lost. Reading through the Kafka documentation on delivery semantics:
So effectively Kafka guarantees at-least-once delivery by default and allows the user to implement at-most-once delivery by disabling retries on the producer and committing its offset prior to processing a batch of messages. Exactly-once delivery requires co-operation with the destination storage system but Kafka provides the offset which makes implementing this straight-forward.
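As a concrete sketch of that default at-least-once pattern: disable auto-commit and commit only after a batch has been processed. This uses the newer Java consumer API; the broker address, topic name, and process() step are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "example-group");
        props.put("enable.auto.commit", "false"); // commit manually, after processing
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // placeholder for real processing
                }
                // Commit only after the whole batch succeeded: a crash before
                // this line means reprocessing on restart, never data loss.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}
```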
So exactly-once processing is not something Kafka gives you by default. It requires you to store the offset yourself whenever you write the output of your processing to storage.
But it can be handled more simply and generally by just having the consumer store its offset in the same place as its output ... As an example, the Kafka documentation describes a Hadoop ETL that populates data in HDFS and stores its offsets in HDFS with the data it reads, so that it is guaranteed that either both the data and the offsets are updated, or neither is.
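A minimal sketch of that idea with the Java consumer: assign partitions manually, recover the offset from the output store rather than from Kafka, and seek to it. The readOffsetFromOutputStore helper is hypothetical; in the HDFS example above it would read the offset saved next to the data.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetWithOutputExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        TopicPartition partition = new TopicPartition("example-topic", 0);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign the partition manually so Kafka's own offset storage is bypassed.
            consumer.assign(Collections.singletonList(partition));
            // Resume from the offset recorded alongside the output, not from Kafka.
            long storedOffset = readOffsetFromOutputStore(partition);
            consumer.seek(partition, storedOffset);
            // From here on, every write to the output store would include the new
            // offset, so data and offset are committed together or not at all.
        }
    }

    // Hypothetical helper: reads the offset saved in the same place as the output.
    private static long readOffsetFromOutputStore(TopicPartition tp) {
        return 0L;
    }
}
```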
I hope this helps.