Topics, partitions, and keys

I am looking for some clarification on this. In Kafka docs, I found the following:

Kafka only provides a total order over messages within a partition, not between different partitions in a topic. Per-partition ordering, combined with the ability to partition data by key, is sufficient for most applications. However, if you require a total order over messages, this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.

So here are my questions:

  • Does this mean that if I want to have more than one consumer (from the same group) reading from the same topic, I need to have more than one partition?

  • Does this mean that I need as many partitions as there are consumers in the same group?

  • How many consumers can read from one partition?

I also have some questions about the relationship between keys and partitions with regard to the API. I have only looked at the .NET APIs (especially the one from MS), but they look similar to the Java API. I see that when using a producer to send a message to a topic, there is a key parameter. But when a consumer reads from a topic, there is a partition number.

  • How are the partitions numbered? Starting at 0 or 1?
  • What exactly is the relationship between the key and the partition? As I understand it, some function of the key determines the partition. Is that right?
  • If I have 2 partitions in a topic and I want some specific messages to go to one partition and other messages to the other, should I use one specific key for the first partition and other keys for the second?
  • What if I have 3 partitions and want one type of message on one specific partition, and the rest on the other 2?
  • In general, how do I send messages to a specific partition so that a consumer knows where to read from? Or am I better off with multiple topics?

Thanks in advance.

+19
apache-kafka kafka-consumer-api kafka-producer-api
2 answers

Igor

Partitions increase the parallelism of a Kafka topic. Any number of consumers/producers can use the same partition. It is up to the application layer to define the protocol; Kafka guarantees delivery. As for the API, you may want to look at the Java docs, as they may be more complete. Based on my experience:

  1. Partitions start at 0.
  2. Keys can be used to send messages to the same partition, for example via hash(key) % num_partitions. The logic is pluggable on the producer side. https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/producer/Partitioner.html
  3. Yes, but be careful not to end up with some other key that also maps to the "dedicated" partition. You may want a dedicated topic for this instead, for example a control topic and a data topic.
  4. This seems to be the same question as 3.
  5. I believe that consumers should not make assumptions about the data based on the partition. The typical approach is to have a consumer group that reads from multiple partitions of a topic. If you want dedicated channels, it is better (safer and more maintainable) to use separate topics.
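For points 3 and 4, the idea of reserving one partition for one kind of message and spreading the rest over the remaining partitions might be sketched as follows. This is plain Java with no Kafka dependency: the class name, the `control` key, and the use of `Arrays.hashCode` are assumptions made for illustration (Kafka's built-in partitioner uses a different hash). In a real producer, logic like this would live in a class implementing the `Partitioner` interface linked above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DedicatedPartitionSketch {
    // Hypothetical key reserved for the message type that gets its own partition.
    public static final String CONTROL_KEY = "control";

    /**
     * Returns partition 0 for the reserved key and spreads every other key
     * over partitions 1 .. numPartitions - 1.
     */
    public static int choosePartition(String key, int numPartitions) {
        if (numPartitions < 2) {
            throw new IllegalArgumentException("need at least 2 partitions");
        }
        if (CONTROL_KEY.equals(key)) {
            return 0; // dedicated partition
        }
        // Stand-in hash for illustration only; not the hash Kafka itself uses.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even for negative hashes.
        return 1 + Math.floorMod(hash, numPartitions - 1);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"control", "order-1", "order-2", "order-3"}) {
            System.out.println(key + " -> partition " + choosePartition(key, 3));
        }
    }
}
```

Because no other key can ever map to partition 0 here, this avoids the collision mentioned in point 3, but separate topics remain the simpler option.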
+14

Does this mean that if I want to have more than one consumer (from the same group) reading from one topic, I need to have more than one partition?

Let's look at the following properties of Kafka:

  • each partition is consumed by exactly one consumer in the group
  • one consumer in a group can consume more than one partition
  • the number of consumer processes in the group should be <= the number of partitions

Thanks to these properties, Kafka can provide both ordering guarantees and load balancing over a pool of consumer processes.

To answer your question: yes, within a single group, if you want N active consumers, you must have at least N partitions.
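The properties above can be illustrated with a small simulation. This is only a sketch: `GroupAssignmentSketch` and `assign` are hypothetical names, and the simple modulo distribution is an assumption standing in for Kafka's real assignment strategies, which run as part of the consumer group protocol.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupAssignmentSketch {
    /**
     * Distributes partitions 0 .. numPartitions - 1 over the consumers of one
     * group so that every partition has exactly one owner. Entry i of the
     * result lists the partitions owned by consumer i.
     */
    public static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        if (numConsumers < 1) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(p % numConsumers).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 3 partitions, 2 consumers: one consumer owns two partitions.
        System.out.println(assign(3, 2)); // prints [[0, 2], [1]]
        // 3 partitions, 5 consumers: two consumers get nothing and sit idle.
        System.out.println(assign(3, 5)); // prints [[0], [1], [2], [], []]
    }
}
```

The second call shows why adding consumers beyond the partition count does not add parallelism: the extra group members have nothing to read.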

Does this mean that I need the same number of partitions as the number of consumers in the same group?

I think this was explained in the first answer.

How many consumers can read from one partition?

Within a group, only one consumer reads a given partition, so the number of consumers that can read from one partition is always equal to the number of consumer groups subscribed to the topic.

The relationship between keys and partitions with regard to the API

First, we must understand that the producer is responsible for choosing which partition within the topic a record is assigned to.

Now let's see how the producer does this. First, let's look at the definition of the ProducerRecord.java class:

    public class ProducerRecord<K, V> {
        private final String topic;
        private final Integer partition;
        private final Headers headers;
        private final K key;
        private final V value;
        private final Long timestamp;
    }

The field we need to understand in this class is partition.

From the docs:

  • If a valid partition number is specified, that partition will be used when sending the record.
  • If no partition is specified but a key is present, a partition will be chosen using a hash of the key.
  • If neither key nor partition is present, a partition will be assigned in a round-robin fashion.
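The three rules quoted above could be sketched roughly as follows. This is a minimal illustration in plain Java, not the producer's actual code: the class and method names are hypothetical, and `Arrays.hashCode` is an assumed stand-in for whatever hash the real client applies to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class PartitionSelectionSketch {
    private final int numPartitions;
    // Counter that drives the round-robin case.
    private final AtomicInteger counter = new AtomicInteger();

    public PartitionSelectionSketch(int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("need at least one partition");
        }
        this.numPartitions = numPartitions;
    }

    /** Either argument may be null, mirroring the optional fields of a record. */
    public int select(Integer partition, byte[] key) {
        if (partition != null) {
            // Rule 1: an explicit, valid partition number wins.
            if (partition < 0 || partition >= numPartitions) {
                throw new IllegalArgumentException("invalid partition: " + partition);
            }
            return partition;
        }
        if (key != null) {
            // Rule 2: the same key always hashes to the same partition.
            return Math.floorMod(Arrays.hashCode(key), numPartitions);
        }
        // Rule 3: no key and no partition, so cycle through the partitions.
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionSelectionSketch selector = new PartitionSelectionSketch(3);
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        System.out.println("explicit: " + selector.select(2, key));
        System.out.println("by key:   " + selector.select(null, key));
        System.out.println("no key:   " + selector.select(null, null));
    }
}
```

Note that the key-based rule depends on the partition count, so changing the number of partitions of a topic changes which partition a given key maps to.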
+21
