I have a Kafka environment with 2 brokers and 1 ZooKeeper node.
While I am producing messages to Kafka, if I stop broker 1 (which is the leader), the client stops sending and fails with the error below, even though broker 2 is elected as the new leader for the topic and its partitions.
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60,000 ms.
Even after 10 minutes, with broker 2 now the leader, I expected the producer to send data to broker 2, but it kept failing with the same exception. lastRefreshMs and lastSuccessfulRefreshMs are still unchanged, even though metadataExpireMs is 300,000 ms for the producer.
I am using the new Kafka producer implementation (org.apache.kafka.clients.producer.KafkaProducer) on the client side.
It seems that when the producer starts, it latches on to one broker, and when that broker goes down it does not even try to connect to the other brokers in the cluster.
My expectation is that when the leader goes down, the producer should refresh its metadata, discover the other available brokers, and send the data to them.
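For reference, here is a stripped-down sketch of how the producer is configured. The class name ProducerSetup and the helper method are just for illustration; the values are copied from the full config dump below.

```java
import java.util.Properties;

// Illustrative sketch only: mirrors the settings from the config dump below.
public class ProducerSetup {

    static Properties producerProps() {
        Properties props = new Properties();
        // Both brokers are listed, so bootstrapping should not depend on one host.
        props.put("bootstrap.servers", "10.201.83.166:9500,10.201.83.167:9500");
        props.put("client.id", "rest-interface");
        props.put("acks", "1");
        props.put("retries", "0"); // no retries after a failed send
        props.put("metadata.max.age.ms", "300000");      // forced metadata refresh interval
        props.put("metadata.fetch.timeout.ms", "60000"); // matches the 60,000 ms in the exception
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("bootstrap.servers"));
    }
}
```

The resulting Properties object is what gets passed to new KafkaProducer<byte[], byte[]>(props), which is then used to send byte[] records.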
By the way, my topic has 4 partitions and a replication factor of 2. I am mentioning this in case it matters.
Configuration options:
{request.timeout.ms=30000, retry.backoff.ms=100, buffer.memory=33554432, ssl.truststore.password=null, batch.size=16384, ssl.keymanager.algorithm=SunX509, receive.buffer.bytes=32768, ssl.cipher.suites=null, ssl.key.password=null, sasl.kerberos.ticket.renew.jitter=0.05, ssl.provider=null, sasl.kerberos.service.name=null, max.in.flight.requests.per.connection=5, sasl.kerberos.ticket.renew.window.factor=0.8, bootstrap.servers=[10.201.83.166:9500, 10.201.83.167:9500], client.id=rest-interface, max.request.size=1048576, acks=1, linger.ms=0, sasl.kerberos.kinit.cmd=/usr/bin/kinit, ssl.enabled.protocols=[TLSv1.2, TLSv1.1, TLSv1], metadata.fetch.timeout.ms=60000, ssl.endpoint.identification.algorithm=null, ssl.keystore.location=null, value.serializer=class org.apache.kafka.common.serialization.ByteArraySerializer, ssl.truststore.location=null, ssl.keystore.password=null, key.serializer=class org.apache.kafka.common.serialization.ByteArraySerializer, block.on.buffer.full=false, metrics.sample.window.ms=30000, metadata.max.age.ms=300000, security.protocol=PLAINTEXT, ssl.protocol=TLS, sasl.kerberos.min.time.before.relogin=60000, timeout.ms=30000, connections.max.idle.ms=540000, ssl.trustmanager.algorithm=PKIX, metric.reporters=[], compression.type=none, ssl.truststore.type=JKS, max.block.ms=60000, retries=0, send.buffer.bytes=131072, partitioner.class=class org.apache.kafka.clients.producer.internals.DefaultPartitioner, reconnect.backoff.ms=50, metrics.num.samples=2, ssl.keystore.type=JKS}
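One thing I notice in the dump is retries=0 together with metadata.fetch.timeout.ms=60000, which matches the 60,000 ms in the exception. If the problem is simply that the producer gives up instead of retrying, then I would guess (purely an assumption on my part, not something I have verified) that a change along these lines is what is needed:

```
# let the producer retry a failed send instead of giving up immediately
retries=3
# back-off between retries (already 100 in my config)
retry.backoff.ms=100
```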
Use Case:
1- Start BR1 and BR2, produce data (leader: BR1)
2- Stop BR2, produce data (works fine; leader is still BR1)
3- Stop BR1 (so there is no live broker in the cluster at this point), then start BR2 and produce data (fails, even though the leader is BR2)
4- Start BR1 (leader: BR2, …)
5- Stop BR2 (leader: BR1)
6- Stop BR1 (BR1 …)
7- Start BR1 (…)
In short: after BR1 goes down, the producer keeps trying BR1 and never fails over to BR2, even though BR2 has become the leader. Why?