1. Apache Kafka Fundamentals

What is Kafka?

Kafka is a distributed message queue, which means you can scale it horizontally. On one side there is a producer, which produces messages. On the other side there is a consumer, which processes those messages.

Why buffer messages? Without a buffer in between, during traffic spikes the rate at which data is ingested might exceed the rate at which it can be stored in databases. This would jam the pipeline, and data can be lost. With a buffer, the processing framework (either batch or stream processing) can take out only as much data as it needs at a time before the data is stored.
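
The effect of a buffer during a spike can be sketched with a small simulation. This is only an illustrative model, not Kafka itself: the function name simulate, the per-tick arrival counts, and the capacities are all assumptions made up for the example.

```python
def simulate(arrivals, store_rate, buffer_capacity):
    """Toy model: each tick, `arrivals[i]` messages come in and the
    database can store at most `store_rate` of them.

    buffer_capacity=0 models a pipeline with no buffer in between.
    Returns (stored, dropped, still_buffered).
    """
    backlog = 0
    stored = 0
    dropped = 0
    for arriving in arrivals:
        backlog += arriving
        # The sink takes at most store_rate messages this tick.
        taken = min(backlog, store_rate)
        backlog -= taken
        stored += taken
        # Whatever is left must fit in the buffer; the overflow is lost.
        overflow = max(0, backlog - buffer_capacity)
        backlog -= overflow
        dropped += overflow
    return stored, dropped, backlog


# Hypothetical traffic: steady load with a two-tick spike.
traffic = [5, 5, 40, 40, 5, 5, 5, 5]

print(simulate(traffic, store_rate=10, buffer_capacity=0))    # (50, 60, 0)
print(simulate(traffic, store_rate=10, buffer_capacity=100))  # (70, 0, 40)
```

Without a buffer, the 60 messages that exceed the storage rate during the spike are lost. With a buffer, nothing is dropped; the 40 messages still waiting at the end are simply stored later.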

Messages in a message queue are stored only temporarily. You can set a time to live for messages, after which they are deleted. The right value depends on your needs: it should cover the longest period during which consumers might fall behind, such as a weekend or an unexpected emergency. Ask yourself how long, and how much data, you need to be able to buffer.
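
The "how long and how much" question can be turned into rough arithmetic. The sketch below is a back-of-the-envelope estimate under stated assumptions: the function name retention_plan, the traffic figures, and the 1.5x safety factor are hypothetical. In Kafka, the resulting duration would typically go into a retention setting such as the topic-level retention.ms.

```python
def retention_plan(messages_per_sec, avg_message_bytes,
                   max_outage_hours, safety_factor=1.5):
    """Estimate a retention period and the disk space it implies.

    Returns (retention_ms, bytes_needed) for ONE copy of the data;
    replicas multiply the storage requirement.
    """
    retention_hours = max_outage_hours * safety_factor
    retention_ms = int(retention_hours * 3600 * 1000)
    bytes_needed = int(messages_per_sec * avg_message_bytes
                       * retention_hours * 3600)
    return retention_ms, bytes_needed


# Assumed scenario: consumers may be down from Friday 18:00 to
# Monday 10:00 (64 hours), with 1,000 messages/s of 1,000 bytes each.
retention_ms, bytes_needed = retention_plan(1000, 1000, max_outage_hours=64)

print(retention_ms)        # 345600000 (96 hours)
print(bytes_needed / 1e9)  # 345.6 (GB per copy)
```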

Basic Kafka Parts

Kafka is just a framework. Its basic parts are topics, messages, brokers, producers, and consumers (organized into consumer groups).

Topics

A topic has partitions. Each partition also has replicas on multiple servers in case of failure. If one server goes down, another working server still holds a copy. This ensures data availability.
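
To make the replica idea concrete, here is a minimal sketch that spreads partition replicas across servers and checks which partitions survive a failure. It uses a simplified round-robin placement chosen for illustration; Kafka's real placement logic is more involved, and the names assign_replicas, available_partitions, and the broker labels are assumptions.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def available_partitions(assignment, failed_brokers):
    """A partition stays available while at least one replica's broker is up."""
    return {
        p for p, replicas in assignment.items()
        if any(b not in failed_brokers for b in replicas)
    }


brokers = ["broker-1", "broker-2", "broker-3"]  # hypothetical server names
assignment = assign_replicas(num_partitions=3, brokers=brokers,
                             replication_factor=2)
# {0: ['broker-1', 'broker-2'], 1: ['broker-2', 'broker-3'],
#  2: ['broker-3', 'broker-1']}

print(available_partitions(assignment, {"broker-2"}))              # {0, 1, 2}
print(available_partitions(assignment, {"broker-1", "broker-2"}))  # {1, 2}
```

With two replicas per partition, losing any single server leaves every partition readable; losing two servers at once can take out the partition whose replicas both lived there.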

Messages

Each message has a key and a value, similar to a key-value pair. You can also set a maximum message size. With message queues, you will mostly be working with small messages that pass through very quickly.
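
A key-value message and a size limit can be sketched as follows. This is a toy model, not the Kafka client API: the Message class, choose_partition, check_size, and the 1,000,000-byte limit are assumptions for illustration. The sketch also shows one common use of the key, which is to route messages with the same key to the same partition; CRC32 stands in here for whatever hash a real client uses.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    key: bytes
    value: bytes


def choose_partition(key, num_partitions):
    """Map a key to a partition; the same key always maps to the same one.

    CRC32 is a stand-in hash for this sketch, not Kafka's own partitioner.
    """
    return zlib.crc32(key) % num_partitions


def check_size(message, max_bytes):
    """Reject messages larger than the configured maximum; return the size."""
    size = len(message.key) + len(message.value)
    if size > max_bytes:
        raise ValueError(f"message is {size} bytes, limit is {max_bytes}")
    return size


MAX_MESSAGE_BYTES = 1_000_000  # assumed limit for the example

msg = Message(key=b"user-42", value=b'{"event": "click"}')
check_size(msg, MAX_MESSAGE_BYTES)
partition = choose_partition(msg.key, num_partitions=3)
print(f"key {msg.key!r} goes to partition {partition}")
```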