Internal Working of Kafka Template.

How does Kafka Template work internally?

Do you have an understanding of how Kafka Template works internally?

The KafkaTemplate.send() is going to send the message to Kafka. But in reality, it goes through different layers before the message is sent to Kafka.

— The very first layer is the serializer. Any records sent to Kafka need to be serialized to bytes.

There are two serialization techniques that need to be applied to new records.

  1. key.serializer
  2. value.serializer

— This configuration is mandatory for any producer. The clients need to provide the key.serializer value and value.serializer value.

The Kafka client java libraries come with some predefined serializers.

— The second layer is partitioner. This layer determines which partition the message is going to into the topic.

The Kafka producer API comes with default partitioner logic and in most cases, that’s enough to handle partitioning logic and there are options to override this default partitioner logic too.

— The third layer is RecordAccumulator. Any record sent from the Kafka template won’t get sent to the topic immediately. The RecordAccumulator buffers the records and the records are sent to the Kafka topics once the buffer is full.

The reason for this approach is to limit the number of trips from the application to the Kafka cluster and this eventually avoids the overhead of bombarding the Kafka cluster with numerous requests which also helps in improving the overall performance of the system.

The RecordBatch is a representation of the topic partition combination. When we have a topic with 3 partitions then we will have 3 record batch.

Each and every RecordBatch has a batch size which is represented by batch.size property and the value is represented as a number of bytes.

It also has overall buffer memory which is represented by the property buffer.memory and this value is represented as a number of bytes.

→ Once the batch is full then the message will be sent to the Kafka topic. There may be many scenarios where the RecordBatch won’t fill up, the producer API is not going to wait for so long to send the message to the Kafka topic.

→ There is also another property called linger.ms which will be used in this case to publish the records into Kafka topic, as the name suggests this value is represented in a millisecond.

If the batch is not full then the batch and the records accumulated meets the linger.ms value then the records will be sent to Kafka.

Software Engineer at HCL | Technical Content Writing | Follow me on LinkedIn https://www.linkedin.com/in/sagarkudu/