Distributed Streaming System

How does Kafka behave like a Distributed Streaming System?

According to its documentation, Apache Kafka is a distributed streaming platform, but what does that actually mean?

A distributed system is a collection of systems working together to deliver functionality.

In this example, we have 3 systems that will interact with each other.

  • Clients interact with these systems: the systems receive requests from the clients and respond to them.

Characteristics of a Distributed System

1. Availability and Fault Tolerance

→ Let's say one of the systems goes down. This won’t impact the overall availability of the system: even in this case, client requests will be handled gracefully.

2. Reliable work distribution

→ Requests received from the clients are, in general, distributed equally among the available systems.

3. Easily Scalable

→ Easily scalable means that adding new systems to the existing setup is straightforward.

4. Handling Concurrency is fairly easy

→ Concurrency is another quality that distributed systems handle fairly easily, because different systems can serve different client requests at the same time.
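The first three characteristics can be sketched in a few lines of Python. This is only a toy model, not how any real system (or Kafka) is implemented: the `Cluster` class, its `dispatch` and `add_node` methods, and the node names are hypothetical names made up for illustration. It hands out requests round-robin, skips systems that are down, and lets a new system join.

```python
from itertools import cycle


class Cluster:
    """Toy model: a set of named systems behind a round-robin dispatcher."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        self.down = set()               # systems currently unavailable
        self._ring = cycle(self.nodes)  # round-robin order over all systems

    def add_node(self, name):
        # Scaling out: rebuild the ring so the new system starts receiving work.
        self.nodes.append(name)
        self._ring = cycle(self.nodes)

    def dispatch(self, request):
        # Returns the name of the system chosen for this request.
        # Each system is tried at most once; systems that are down are skipped.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node not in self.down:
                return node
        raise RuntimeError("no system available")


cluster = Cluster(["system-1", "system-2", "system-3"])
print([cluster.dispatch(i) for i in range(6)])  # work is spread over all three
cluster.down.add("system-2")                    # one system fails
print([cluster.dispatch(i) for i in range(4)])  # requests are still served
```

While at least one system is up, `dispatch` keeps returning a target, which is the availability property described above. Only when every system is down does the client see an error.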

How does Apache Kafka work?

So far, we have a ZooKeeper instance and a single Kafka broker running on our machine.

→ In any enterprise, it is pretty common to have a bunch of producers and consumers.

→ With one broker in place, let’s walk through some of the key behaviors. All producer and consumer requests will go to the same broker.

There is a real possibility that this setup will be overwhelmed by the volume of requests, which might crash the broker or make it perform poorly.

The next big problem is that if the broker goes down for some reason, there is no way to serve client requests. The broker is a single point of failure.
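To put a rough number on the single point of failure, here is a small back-of-the-envelope calculation. It rests on assumptions that are not from the setup above: each broker is up with the same probability, brokers fail independently, and a request succeeds as long as any one broker is reachable. The `availability` function and the 99% uptime figure are purely illustrative.

```python
def availability(per_broker_uptime: float, brokers: int) -> float:
    """Probability that at least one broker is up, assuming independent failures."""
    if not 0.0 <= per_broker_uptime <= 1.0:
        raise ValueError("per_broker_uptime must be between 0 and 1")
    if brokers < 1:
        raise ValueError("brokers must be at least 1")
    # All brokers are down at once with probability (1 - uptime) ** brokers.
    return 1.0 - (1.0 - per_broker_uptime) ** brokers


for n in (1, 2, 3):
    print(n, "broker(s):", round(availability(0.99, n), 6))
```

Under these simplified assumptions, a single broker can never be more available than the broker itself, while every additional broker shrinks the chance that all of them are down at the same time.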

Kafka Cluster

To solve the above problem of broker failure, we have the Kafka cluster.

A cluster can have one or more brokers. It is pretty common to have more than one broker.

In this example, we have 3 brokers, and the cluster is managed by ZooKeeper.

→ All the brokers send heartbeats to ZooKeeper at regular intervals, so that it knows each Kafka broker is healthy and active to serve client requests.

→ With three brokers in place, client requests are evenly distributed between them, and the cluster handles the load well.

  • If one of the brokers goes down, the cluster manager (ZooKeeper here) gets notified, and all client requests are then routed to another available broker. The clients have no idea that anything went wrong.
  • It is easy to scale the number of brokers in the cluster without affecting the clients.
  • In the event of failure or data loss, Kafka handles it using replication.
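The heartbeat and rerouting behavior described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the way Kafka or ZooKeeper implement it: `HeartbeatMonitor`, `route`, and the `session_timeout` parameter are hypothetical names, and timestamps are passed in explicitly so the example stays deterministic.

```python
import zlib


class HeartbeatMonitor:
    """Toy stand-in for the cluster manager: remembers the last heartbeat per broker."""

    def __init__(self, session_timeout: float):
        self.session_timeout = session_timeout  # seconds of silence before a broker counts as down
        self.last_seen = {}                     # broker id -> time of its last heartbeat

    def heartbeat(self, broker_id: int, now: float) -> None:
        self.last_seen[broker_id] = now

    def live_brokers(self, now: float) -> list:
        # A broker is live if its last heartbeat is recent enough.
        return sorted(
            broker for broker, seen in self.last_seen.items()
            if now - seen <= self.session_timeout
        )


def route(monitor: HeartbeatMonitor, request_key: str, now: float) -> int:
    """Pick a live broker for a request; brokers that stopped heartbeating are never chosen."""
    live = monitor.live_brokers(now)
    if not live:
        raise RuntimeError("no live broker")
    # crc32 gives a stable hash, so the same key maps to the same broker
    # for as long as the set of live brokers does not change.
    return live[zlib.crc32(request_key.encode("utf-8")) % len(live)]


monitor = HeartbeatMonitor(session_timeout=10.0)
for broker in (1, 2, 3):
    monitor.heartbeat(broker, now=0.0)
monitor.heartbeat(1, now=8.0)
monitor.heartbeat(3, now=8.0)           # broker 2 has gone silent
print(monitor.live_brokers(now=12.0))   # broker 2 is no longer listed
print(route(monitor, "order-42", now=12.0))
```

The caller of `route` never has to know which broker failed. It simply gets back a broker that is still alive, which mirrors the point that clients are unaware of the failure.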

Software Engineer at HCL | Technical Content Writing | Follow me on LinkedIn https://www.linkedin.com/in/sagarkudu/