Commit log & Retention
Learn the concepts behind the commit log and the retention policy.
One of Kafka's key qualities is that it retains records for a certain period of time. But how does this work?
Let us take the example of a Kafka broker, a topic, a producer, and a consumer.
When the producer sends a message, it first reaches the topic, and the very next thing that happens is that the record gets written to the file system of the machine where the Kafka broker is installed. In this example, that is the local machine.
- The record is always written into the file system as bytes.
- The directory where the log files are written is configured using the "log.dirs" property in the server.properties file.
- Kafka creates a file with the extension ".log". Each partition has its own log file, i.e. if we have 4 partitions, then we will have 4 ".log" files. That is why it is called a partition commit log.
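As a sketch, the relevant line in server.properties looks like the following (the path shown here is an example for a local Windows setup, not a required value):

```properties
# Directory (or comma-separated list of directories) where Kafka
# stores the partition log files
log.dirs=c:/kafka/kafka-logs
```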
- Once the message is written into the log file, the record is considered committed. A consumer that is continuously polling for new records can only see records that have been committed to the file system.
- As new records are published to the topic, they get appended to the log file and the process continues. So this is all about how the commit log works.
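The append-only behaviour described above can be sketched in a few lines of Python. This is a simplified toy model to illustrate the idea of offsets and committed records, not Kafka's actual implementation (the `PartitionLog` class is hypothetical):

```python
class PartitionLog:
    """Toy model of a Kafka partition commit log: an append-only list
    of records, each identified by a monotonically increasing offset."""

    def __init__(self):
        self.records = []  # committed records, in append order

    def append(self, value: bytes) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset: int):
        """A consumer reads every committed record at or after `offset`."""
        return self.records[offset:]


log = PartitionLog()
log.append(b"first")   # offset 0
log.append(b"second")  # offset 1
log.append(b"third")   # offset 2

print(log.read_from(1))  # [b'second', b'third']
```

In real Kafka, the records are serialized to bytes and appended to the ".log" segment file on disk, but the offset semantics are the same: offsets only ever grow, and consumers read forward from an offset.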
1. Configuration of log.dirs
2. Checking the log files
- The "test-topic" we created has four folders suffixed with 0, 1, 2 and 3; each folder represents a partition, i.e. "test-topic-3" represents one partition.
- You can also see folders such as "__consumer_offsets-49". This topic is created automatically to maintain consumer offsets, and its partition folders are numbered up to 49. That means the consumer offsets topic has 50 partitions.
- Let's navigate into the "test-topic-0" directory. We are going to focus on the ".log" file.
- All data published to this partition is converted to bytes and then written into this file.
- Similarly, you can navigate into the other directories, such as "test-topic-3". This is the content available inside the Kafka partition log, but there is much more information stored as part of the log.
- To view this additional information, Kafka provides a command:
.\bin\windows\kafka-run-class.bat kafka.tools.DumpLogSegments --deep-iteration --files /c:/kafka/kafka-logs/test-topic-0/00000000000000000000.log
Using the above command, we perform a deep iteration over the partition log at the provided path.
Here we can see that the offset starts from 0 and increases by one per record; in this case, the partition has 3 records. The command does not display the actual values stored in the partition log, but it shows several different attributes, such as:
1. CreateTime — the time at which the record was created.
2. keySize — the size of the record key; it is -1 here because these records were not produced with a key.
3. valueSize — the size of the record value in bytes.
- The retention policy determines how long a message is going to be retained.
- It is configured using the "log.retention.hours" property in the server.properties file.
- The default retention period is 168 hours (7 days); this is what ships with Kafka out of the box.
- If the log retention period is exceeded, Kafka deletes the expired data from the log.
- The maximum size of a log segment is also configurable; when this size is reached, a new log segment file is created.
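Putting the retention settings together, a server.properties fragment might look like this (the values shown are, to the best of my knowledge, Kafka's defaults):

```properties
# How long to retain records before they become eligible for deletion (7 days)
log.retention.hours=168
# Maximum size of a single log segment file before a new one is rolled (1 GB)
log.segment.bytes=1073741824
```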
How does Kafka behave like a Distributed Streaming System?
As per the documentation, Apache Kafka is a distributed streaming platform, but what does that actually mean?