What determines a Kafka consumer's offset?

I am relatively new to Kafka. I have worked with it a bit, but some things are unclear to me regarding the consumer offset. From what I understand, when a consumer starts, the offset it begins reading from is determined by the auto.offset.reset configuration (correct me if I am wrong).

Now say, for example, that there are 10 messages in this topic (offsets 0 to 9), and the consumer managed to consume 5 of them before it crashed (or before I killed it). Then say I restart this consumer process. My questions:

If auto.offset.reset is set to smallest, will it always start consuming at offset 0?

If auto.offset.reset is set to largest, will it start consuming at offset 5?

Is the behavior in such a scenario always deterministic? Please feel free to comment if anything in my question is unclear. Thanks in advance.

+72
java distributed-computing apache-kafka
04 Sep '15 at 4:46
3 answers

It is a little more complicated than you described. The auto.offset.reset config kicks in ONLY if your consumer group does not have a valid offset committed somewhere (the two supported offset storages are currently Kafka and Zookeeper). It also depends on which kind of consumer you use.

If you use the high-level Java consumer, imagine the following scenarios:

  • You have a consumer in consumer group group1 that consumed 5 messages and died. The next time you start this consumer, it will not even use its auto.offset.reset config and will continue from the place where it died, because it will simply fetch the stored offset from the offset storage (Kafka or ZK, as I mentioned).

  • You have messages in the topic (as you described) and you start a consumer in a new consumer group group2. There is no offset stored anywhere, and this time the auto.offset.reset config will decide whether to start from the beginning of the topic ( smallest ) or from the end of the topic ( largest ).
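The two scenarios above can be sketched as a small decision function. This is logic only, not client code (the real decision happens inside Kafka); the group names and return strings are illustrative, while group.id and auto.offset.reset are real Kafka properties:

```java
// Sketch of how a high-level consumer picks its starting position, per the
// two scenarios above. Purely illustrative: the actual decision is made by
// Kafka, not by application code.
public class OffsetResetSketch {
    /** Where consumption starts for a group. */
    static String startingPosition(boolean hasCommittedOffset, String autoOffsetReset) {
        if (hasCommittedOffset) {
            return "committed offset";           // auto.offset.reset is ignored (group1 case)
        }
        switch (autoOffsetReset) {               // no stored offset: config decides (group2 case)
            case "smallest": return "beginning of topic";
            case "largest":  return "end of topic";
            default:         throw new IllegalArgumentException(autoOffsetReset);
        }
    }

    public static void main(String[] args) {
        System.out.println(startingPosition(true, "smallest"));  // group1: resumes where it died
        System.out.println(startingPosition(false, "smallest")); // group2: beginning of topic
        System.out.println(startingPosition(false, "largest"));  // group2: end of topic
    }
}
```

Note that in the first call the second argument is irrelevant: a committed offset always wins over auto.offset.reset.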

Another thing that affects what offset value corresponds to the smallest and largest configs is the log retention policy. Imagine you have a topic with retention configured to 1 hour. You produce 5 messages, and then an hour later you post 5 more messages. The largest offset will still remain the same as in the previous example, but the smallest one won't be able to be 0, because Kafka will already have removed these messages, and thus the smallest available offset will be 5.
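The arithmetic above can be sketched with two numbers: the log start offset (which moves forward as retention deletes old segments) and the log end offset. The function and variable names here are illustrative, not Kafka API:

```java
// Sketch of how retention shifts the smallest available offset. The numbers
// mirror the example above: 10 messages ever produced, first 5 aged out.
public class RetentionSketch {
    // smallest resolves to the log start offset (first message still on disk);
    // largest resolves to the log end offset (next offset to be assigned).
    static long resolve(String reset, long logStart, long logEnd) {
        return reset.equals("smallest") ? logStart : logEnd;
    }

    public static void main(String[] args) {
        long logStart = 5;  // offsets 0-4 already deleted by retention
        long logEnd = 10;   // 10 messages were ever produced
        System.out.println(resolve("smallest", logStart, logEnd)); // 5, not 0
        System.out.println(resolve("largest", logStart, logEnd));  // 10
    }
}
```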

None of the above applies to SimpleConsumer, which, every time it runs, decides where to start from using the auto.offset.reset configuration.

+126
04 Sep '15 at 7:12

Just an update: from Kafka 0.9 onwards, Kafka uses a new version of the Java consumer, and the auto.offset.reset parameter values have changed; from the manual:

What to do when there is no initial offset in Kafka, or if the current offset no longer exists on the server (e.g. because that data has been deleted):

earliest : automatically reset the offset to the earliest offset

latest : automatically reset the offset to the latest offset

none : throw an exception to the consumer if no previous offset is found for the consumer's group

anything else: throw an exception to the consumer.
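A minimal sketch of the new-consumer (0.9+) configuration described above. The broker address and group name are placeholders; auto.offset.reset, bootstrap.servers, group.id, and the deserializer properties are real Kafka consumer settings:

```java
import java.util.Properties;

// Sketch: building the properties for a 0.9+ KafkaConsumer. Constructing the
// consumer itself is omitted so the sketch runs without a broker.
public class NewConsumerConfigSketch {
    static Properties build(String offsetReset) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "my-group");                // placeholder group name
        props.put("auto.offset.reset", offsetReset);      // earliest | latest | none
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // With a broker available, you would pass this to new KafkaConsumer<>(...).
        System.out.println(build("earliest").getProperty("auto.offset.reset"));
    }
}
```

As with the old consumer, this setting only matters when the group has no committed offset (or the committed offset points at deleted data).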

It took me some time to find this after checking the answer above, so I thought it might be useful for the community to post it.

+29
Mar 09 '17 at 2:54

Further to this, offsets.retention.minutes also matters. If the time since the last commit is greater than offsets.retention.minutes, then the committed offset expires and auto.offset.reset kicks in as well.

0
Nov 02 '17 at 15:00
