Commit log & Retention

In this lesson, we will learn how Kafka's commit log works and how the retention policy determines how long records are kept.

Commit Log

One of the key qualities of Kafka is that it retains records for a certain period of time. But how does that work?

Let us take the example of a Kafka broker, a topic, a producer, and a consumer.

[Figure: the consumer continuously polls for new messages, which are appended to the log file on the broker's file system.]

When the producer sends a message, it first reaches the topic, and the very next thing that happens is that the record gets written to the file system on the machine.

The file system in question is on the machine where the Kafka broker is installed. In this example, it is a local machine.

The record is always written into the file system as bytes. The directory that the log files are written to is configured via the log.dirs property in the server.properties file, e.g.

log.dirs=c:/kafka/kafka-logs

  • Kafka creates a file with the extension .log. Each partition has its own log file, i.e. if we have 4 partitions then we will have 4 log files. That is why it is called a partitioned commit log.
  • A record is considered committed only after it has been written into the log file. A consumer, which is continuously polling for new records, can only see records that have been committed to the file system.
  • As new records are published to the topic, they get appended to the log file, and the process continues. That is the commit log in a nutshell.
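The append-only behavior described above can be sketched in a few lines of Python. This is a simplified, hypothetical model (real Kafka writes length-prefixed byte batches to segment files on disk), but it captures the key idea: records are appended to a per-partition log, and a consumer only sees records up to the committed offset.

```python
# Minimal sketch of an append-only partition log (illustrative only;
# PartitionLog is a hypothetical class, not part of any Kafka API).

class PartitionLog:
    def __init__(self):
        self.records = []   # append-only list standing in for the .log file
        self.committed = 0  # records below this offset are visible to consumers

    def append(self, value: bytes) -> int:
        """Append a record and return its offset."""
        self.records.append(value)
        offset = len(self.records) - 1
        # Once written to the "file system", the record counts as committed.
        self.committed = len(self.records)
        return offset

    def read_committed(self, from_offset: int = 0):
        """A consumer can only see records that have been committed."""
        return self.records[from_offset:self.committed]

log = PartitionLog()
log.append(b"hello")
log.append(b"kafka")
print(log.read_committed())  # [b'hello', b'kafka']
```

Note that offsets simply increase with each append; this mirrors why the offsets we will see in the partition log below start at 0 and grow continuously.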

The explanation for the commit log:

  1. Configuring the log directory in the properties file

  2. Checking the log files

  • The test-topic we created has four folders ending in 0, 1, 2 and 3; each folder represents a partition, i.e. test-topic-0, test-topic-1, test-topic-2 and test-topic-3.
  • You can also see __consumer_offsets-0 through __consumer_offsets-49. This is the topic that gets created automatically for maintaining consumer offsets, and since it goes up to 49, this topic has 50 partitions.
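How does Kafka decide which of those 50 __consumer_offsets partitions stores a given group's offsets? It hashes the consumer group id, the same way Java's String.hashCode does, and takes it modulo the partition count. The sketch below mirrors that scheme in Python (the group id "my-consumer-group" is just an example):

```python
def java_string_hashcode(s: str) -> int:
    """Replicates Java's String.hashCode (signed 32-bit arithmetic)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x1_0000_0000 if h >= 0x8000_0000 else h

def offsets_partition(group_id: str, num_partitions: int = 50) -> int:
    # Kafka maps a group to a partition of __consumer_offsets via
    # abs(groupId.hashCode) % offsets.topic.num.partitions; the broker's
    # abs is implemented as (hash & 0x7fffffff), mirrored here.
    return (java_string_hashcode(group_id) & 0x7FFFFFFF) % num_partitions

print(offsets_partition("my-consumer-group"))
```

Because the mapping is deterministic, all offset commits for one group always land in the same partition, which keeps them ordered.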

Let’s navigate into the test-topic-0 directory. We are going to focus on the ".log" file.

  • All data written into this topic is converted to bytes and then written into these files.
  • Similarly, you can navigate to the test-topic-1, test-topic-2 and test-topic-3 directories. This is the content that is available inside a Kafka partition log, but there is much more information available as part of the log.
  • To view it, we have a command:
.\bin\windows\kafka-run-class.bat kafka.tools.DumpLogSegments --deep-iteration --files c:/kafka/kafka-logs/test-topic-0/00000000000000000000.log

Using the above command, we perform a deep iteration over the partition log at the path we provide: c:/kafka/kafka-logs/test-topic-0/00000000000000000000.log

Here we can see that the offset value starts from 0 and keeps increasing, which means this particular partition has 3 records. The command doesn't display the actual values present inside the partition log, but it shows a number of different attributes, such as:

1. CreateTime — the time the record was created, e.g. 1618124627557

2. keySize — the size of the record key in bytes; it is -1 when records are produced without a key

3. valueSize — the size of the record value in bytes, e.g. 1
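For illustration, a record line in the dump output looks roughly like `offset: 0 CreateTime: 1618124627557 keysize: -1 valuesize: 1` (exact field names vary across Kafka versions). A small, hypothetical Python helper can pull those attributes out of such a line:

```python
import re

# Hypothetical helper: extract the numeric attributes from one line of
# DumpLogSegments output (exact field names vary by Kafka version).
def parse_dump_line(line: str) -> dict:
    return {k: int(v) for k, v in re.findall(r"(\w+):\s*(-?\d+)", line)}

sample = "offset: 0 CreateTime: 1618124627557 keysize: -1 valuesize: 1"
fields = parse_dump_line(sample)
print(fields["offset"], fields["keysize"], fields["valuesize"])  # 0 -1 1
```

This kind of parsing is handy if you want to script checks over segment files, e.g. counting records per partition or spotting keyless records (keysize of -1).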

Retention Policy

  • The retention policy is the key policy that determines how long a message is retained.
  • It is configured using the log.retention.hours property in the server.properties file.
  • The default retention period is 168 hours (7 days). This is the retention period that comes with Kafka out of the box; the unit is hours.
  • If the log retention period is exceeded, Kafka deletes that data from the log.
  • The log is also split into segments by size: when a segment reaches log.segment.bytes (default 1073741824, i.e. 1 GiB), a new log segment is created.
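Putting the settings from this lesson together, the relevant fragment of server.properties might look like the following (the log.dirs path is the example used above; the other values are Kafka's defaults):

```properties
# Directory where partition log files are stored
log.dirs=c:/kafka/kafka-logs

# Delete log segments older than 168 hours (7 days)
log.retention.hours=168

# Roll a new log segment once the current one reaches 1 GiB
log.segment.bytes=1073741824
```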

Kafka Tutorials

Software Engineer at HCL | Technical Content Writing | Follow me on LinkedIn
