Kafka
Why Kafka Exists
You have 50 microservices. Orders need to reach inventory, billing, shipping, and analytics. With point-to-point HTTP, every service that talks to another needs its own connection; a full mesh of 50 services means up to 50 × 49 / 2 = 1,225 connections. When one service is slow, failures cascade. You cannot replay messages after a crash. The more services you add, the more tangled the graph becomes.
Kafka decouples everything through a distributed commit log. Producers write messages to a topic. Consumers read from the topic independently, at their own pace. A consumer crash doesn’t block the producer. Adding a new consumer doesn’t require changing any existing service. The message is the contract, not the service.
Built for high throughput at scale: LinkedIn processes 7 trillion messages per day through Kafka.
Cluster & Broker
A Kafka cluster is a group of brokers — physical servers or containers — providing high availability and scalability. A cluster has 1 or more brokers.
For every partition, one broker is elected as the Partition Leader. It handles all writes for that partition and, by default, all reads as well. Other brokers act as Partition Followers, replicating the leader’s data. If the leader fails, an in-sync follower is automatically promoted to become the new leader.
Topic & Partition
A topic is a logical category where you send and categorize data (e.g., orders, payments, user-events). Each topic must have at least 1 partition (configurable). Partitions are distributed across different brokers to allow multiple producers and consumers to work simultaneously.
The partition is the unit of parallelism in Kafka. More partitions = more throughput, but also more overhead.
Log Segments
On disk, a partition is a directory. Each segment is a set of files:
- .log: raw message data, appended sequentially
- .index: sparse offset-to-byte-position mapping (one entry per ~4096 bytes of log data)
- .timeindex: timestamp-to-offset mapping
Segments are:
- Rolled when they hit segment.bytes (default 1 GB) or segment.ms (default 7 days)
- Active segment: the one currently being written to
- Older segments: immutable, read-only, candidates for compaction/deletion
Sequential I/O on the .log file makes Kafka fast: writes are pure appends.
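The sparse index lookup described above can be sketched as follows. This is a minimal illustration, not Kafka's on-disk format: the `indexEntry` type, the `lookup` function, and the sample offsets are all hypothetical. The idea is to binary-search for the last indexed offset at or below the target, then scan the .log file forward from that byte position.

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry is a hypothetical in-memory view of one sparse index entry.
type indexEntry struct {
	offset   int64 // message offset
	position int64 // byte position of that message in the .log file
}

// lookup returns the byte position from which to start scanning the .log
// file in order to find target. entries must be sorted by offset.
func lookup(entries []indexEntry, target int64) int64 {
	// Index of the first entry whose offset is greater than target.
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].offset > target
	})
	if i == 0 {
		return 0 // nothing indexed at or below target: scan from the segment start
	}
	return entries[i-1].position
}

func main() {
	// Roughly one entry per 4096 bytes of log data (illustrative values).
	idx := []indexEntry{{0, 0}, {40, 4096}, {81, 8192}, {125, 12288}}
	fmt.Println("start scanning at byte", lookup(idx, 100))
}
```

Because the index is sparse, the lookup only narrows the search; the reader still scans a few kilobytes of the log sequentially to reach the exact record.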
Replication
ISR (In-Sync Replicas)
A partition leader maintains a set of replicas that are “caught up”: the ISR. A follower stays in the ISR as long as it has caught up to the leader’s log end offset within the last replica.lag.time.max.ms (default 30s). A follower that falls behind for longer than that is removed from the ISR.
- min.insync.replicas: minimum number of in-sync replicas required for the leader to accept writes sent with acks=all
- If ISR size drops below this threshold, the broker rejects those writes with NotEnoughReplicasException
- Tradeoff: higher value = stronger durability, lower availability (writes start failing as soon as fewer replicas are in sync)
High Watermark (HW)
The offset up to which all ISR replicas have committed. Consumers can only read up to HW — not the Log End Offset (LEO).
- When a leader receives a write, it advances LEO immediately but HW only advances once all in-sync followers have replicated the message.
- After a leader failover, followers truncate any records the new leader does not have (originally back to their high watermark; since KIP-101, by comparing leader epochs) to guarantee consistency across replicas.
- Messages between HW and LEO are not visible to consumers — they exist on the leader but aren’t fully replicated yet.
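As a rough sketch of the relationship between LEO and HW, the high watermark can be modeled as the smallest log end offset among the in-sync replicas. The function name and the sample offsets below are illustrative assumptions, not broker code.

```go
package main

import "fmt"

// highWatermark returns the smallest log end offset (LEO) among the in-sync
// replicas, leader included. Offsets below this value are on every ISR member.
func highWatermark(isrLEOs []int64) int64 {
	if len(isrLEOs) == 0 {
		return 0
	}
	hw := isrLEOs[0]
	for _, leo := range isrLEOs[1:] {
		if leo < hw {
			hw = leo
		}
	}
	return hw
}

func main() {
	// Leader has written up to LEO 10; two followers have fetched up to 8 and 9.
	leos := []int64{10, 8, 9}
	hw := highWatermark(leos)
	fmt.Printf("HW=%d, offsets %d..%d are not yet visible to consumers\n", hw, hw, leos[0]-1)
}
```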
Controller Broker
Every cluster has one Controller broker. It manages partition leadership across the cluster:
- Monitors broker heartbeats (via ZK ephemeral nodes or KRaft quorum)
- Assigns partition leaders when a broker fails or a new partition is created
- Manages partition reassignments (e.g., adding replicas, migrating to new brokers)
If the Controller fails, a new one is elected automatically via ZooKeeper or the KRaft quorum.
ZooKeeper / KRaft
Kafka uses a metadata store to track cluster state:
- ZooKeeper (legacy): Stores broker membership, topic configs, ACLs, quotas, and Controller election. Kafka versions before 2.8 required ZK. Deprecated in Kafka 3.5 (see KIP-833) and removed in Kafka 4.0.
- KRaft (Kafka Raft): Self-managed metadata quorum introduced in 2.8 (GA in 3.3+). No external dependency — Kafka runs as a single binary with internal Raft-based consensus for metadata. Replaces ZK entirely. The Raft consensus mechanism is covered in detail in etcd-raft.md.
Producer
Each message contains a Key (optional, partition routing), Value (payload), Headers (optional metadata), Timestamp, and an Offset (assigned by the broker on write).
Partition Routing
When a topic has multiple partitions, the producer decides where each message lands:
- Key-based hash — same key always goes to the same partition. Enables ordering per key.
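- Round-robin: spreads keyless messages evenly across partitions. Clients since Kafka 2.4 default to a sticky variant that fills a batch for one partition before moving to the next.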
- Custom partitioner — application-defined placement logic.
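Key-based routing boils down to hashing the key and taking the result modulo the partition count. The sketch below uses FNV-1a from the Go standard library purely as a stand-in (Kafka's default partitioner uses murmur2), and `partitionFor` is a hypothetical helper, so the partition numbers it produces will not match a real client.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to a partition by hashing it.
// FNV-1a is an assumption for illustration; Kafka itself uses murmur2.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash.Write never returns an error
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	for _, key := range []string{"user-1", "user-2", "user-1"} {
		fmt.Printf("%s -> partition %d\n", key, partitionFor(key, 6))
	}
}
```

The same key always lands on the same partition only while the partition count stays fixed; adding partitions changes the modulo and therefore the mapping.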
Acks Levels
Controls how many replica acknowledgments the leader requires before responding:
import "github.com/twmb/franz-go/pkg/kgo"
// acks=0: fire-and-forget. Highest throughput, lowest durability.
// Messages can be silently lost on leader crash.
// franz-go enables idempotent writes by default, and those require acks=all,
// so any other acks level must also disable idempotency.
client, _ := kgo.NewClient(
kgo.SeedBrokers(brokers...),
kgo.RequiredAcks(kgo.NoLeaderAck()), // acks=0
kgo.DisableIdempotentWrite(),
)
// acks=1: leader writes to its log and responds (the Java client default before Kafka 3.0).
// Good durability, but a leader crash before replication can lose data.
client, _ = kgo.NewClient(
kgo.SeedBrokers(brokers...),
kgo.RequiredAcks(kgo.LeaderAck()), // acks=1
kgo.DisableIdempotentWrite(),
)
// acks=all: leader waits for all in-sync replicas to acknowledge.
// Strongest durability, highest latency. Default in franz-go and in the Java client since Kafka 3.0.
client, _ = kgo.NewClient(
kgo.SeedBrokers(brokers...),
kgo.RequiredAcks(kgo.AllISRAcks()), // acks=-1 / all
)
| Acks | Durability | Throughput (1 KB, 3 brokers) | Risk |
|---|---|---|---|
| 0 | None | ~1.5M msg/s | Silent data loss on leader crash |
| 1 | Single node | ~1M msg/s | Data loss if leader crashes before followers replicate |
| all | Strong | ~300K msg/s | Highest safety, 5× slower than acks=0 |
Idempotent Producer
Prevents duplicates caused by retries, giving exactly-once writes per partition within a producer session. Each producer gets a unique producer ID, and the broker deduplicates by (producer ID, sequence number), tracked per partition.
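// Idempotent: retries won't create duplicates.
// franz-go enables idempotent writes by default (opt out with
// kgo.DisableIdempotentWrite()), so no extra option is needed.
client, _ = kgo.NewClient(
kgo.SeedBrokers(brokers...),
)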
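On the broker side, deduplication by (producer ID, sequence number) can be pictured roughly as below. This is a simplified model of a single partition, with hypothetical names (`partitionLog`, `write`); a real broker also rejects out-of-order sequence gaps and tracks producer epochs, which this sketch leaves out.

```go
package main

import "fmt"

// partitionLog is a toy model of one partition's log.
type partitionLog struct {
	lastSeq map[int64]int32 // producer ID -> highest sequence number stored
	records []string
}

func newPartitionLog() *partitionLog {
	return &partitionLog{lastSeq: make(map[int64]int32)}
}

// write stores value unless this producer already stored that sequence
// number (or a later one). It reports whether the record was appended.
func (l *partitionLog) write(producerID int64, seq int32, value string) bool {
	if last, seen := l.lastSeq[producerID]; seen && seq <= last {
		return false // duplicate from a retry: acknowledge without appending
	}
	l.lastSeq[producerID] = seq
	l.records = append(l.records, value)
	return true
}

func main() {
	log := newPartitionLog()
	fmt.Println(log.write(7, 0, "order-created"))
	fmt.Println(log.write(7, 1, "order-paid"))
	fmt.Println(log.write(7, 1, "order-paid")) // retry of the same batch
	fmt.Println(len(log.records), "records stored")
}
```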
Message Ordering
Within a partition, Kafka orders messages strictly by offset (FIFO). Across partitions, no ordering guarantee.
Total ordering across a whole topic requires a single partition, which limits parallelism. If you only need per-key ordering (e.g., user events), use key-based routing to land all messages for the same key in the same partition.
Consumer
Consumer Group
A consumer group identifies a set of consumers that share the partitions of a topic. If 3 consumers run on different servers with the same group.id, Kafka distributes the topic’s partitions among them.
Without consumer group (each instance gets ALL messages):
import "github.com/twmb/franz-go/pkg/kgo"
// Service Instance 1
client1, _ := kgo.NewClient(
kgo.SeedBrokers(brokers...),
kgo.ConsumeTopics("orders"),
// NO consumer group
)
// Service Instance 2
client2, _ := kgo.NewClient(
kgo.SeedBrokers(brokers...),
kgo.ConsumeTopics("orders"),
// NO consumer group
)
// Result: BOTH instances receive ALL messages from "orders" topic
// Message 1 → Instance 1 ✅
// Message 1 → Instance 2 ✅ (duplicate!)
// Message 2 → Instance 1 ✅
// Message 2 → Instance 2 ✅ (duplicate!)
With consumer group (partitions are distributed):
// Service Instance 1
client1, _ := kgo.NewClient(
kgo.SeedBrokers(brokers...),
kgo.ConsumeTopics("orders"),
kgo.ConsumerGroup("order-processor-group"), // ← Same group
)
// Service Instance 2
client2, _ := kgo.NewClient(
kgo.SeedBrokers(brokers...),
kgo.ConsumeTopics("orders"),
kgo.ConsumerGroup("order-processor-group"), // ← Same group
)
// Result: Kafka splits partitions between instances
// Instance 1 gets: Partition 0, Partition 1
// Instance 2 gets: Partition 2, Partition 3
// Each message goes to ONLY ONE instance ✅
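How partitions end up split between group members can be sketched with a simple contiguous-range assignment. The `assignRanges` function and the instance names are hypothetical; Kafka's real assignors (range, round-robin, sticky, cooperative-sticky) are more involved, but the outcome for small cases looks like this.

```go
package main

import "fmt"

// assignRanges splits partitions 0..numPartitions-1 into contiguous ranges,
// one per consumer. Consumers beyond the partition count receive nothing.
func assignRanges(numPartitions int, consumers []string) map[string][]int {
	out := make(map[string][]int, len(consumers))
	if len(consumers) == 0 {
		return out
	}
	per, extra := numPartitions/len(consumers), numPartitions%len(consumers)
	next := 0
	for i, c := range consumers {
		n := per
		if i < extra {
			n++ // the first `extra` consumers take one more partition
		}
		out[c] = []int{}
		for j := 0; j < n; j++ {
			out[c] = append(out[c], next)
			next++
		}
	}
	return out
}

func main() {
	members := []string{"instance-1", "instance-2"}
	fmt.Println(assignRanges(4, members)) // two partitions each
	fmt.Println(assignRanges(1, members)) // second instance stays idle
}
```

A partition is never shared inside one group, so running more consumers than partitions only adds idle members.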
Consumer Offsets
Consumer offsets act as a bookmark. For example, if a topic has 10 messages and the consumer has committed up to offset 5:
Messages in Kafka topic: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Consumer processes: 0, 1, 2, 3, 4
Consumer commits: offset 5 (meaning "the next message I need is 5")
Consumer crashes 💥
On restart:
- WITH commits: Starts from offset 5 → [5, 6, 7, 8, 9]
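- WITHOUT commits: Falls back to the offset reset policy (auto.offset.reset). If that points to the earliest offset, it starts from the beginning → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (duplicates!)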
Consumer Lag
The difference between the latest partition offset and the consumer’s committed offset — a measure of how “behind” a consumer is.
- If you have 1 partition and 2 consumers in the same group, Kafka gives the partition to Consumer A and leaves Consumer B idle. A single partition is only ever assigned to one consumer at a time.
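Lag is computed per partition and usually summed per group. A minimal sketch, assuming hypothetical `lag` and `totalLag` helpers and hand-written offsets rather than values fetched from a cluster:

```go
package main

import "fmt"

// lag returns how many messages remain between the committed offset and the
// log end offset of one partition.
func lag(logEndOffset, committedOffset int64) int64 {
	if committedOffset > logEndOffset {
		return 0 // defensive: a stale end offset should not yield negative lag
	}
	return logEndOffset - committedOffset
}

// totalLag sums lag across partitions; both slices are indexed by partition.
func totalLag(logEndOffsets, committedOffsets []int64) int64 {
	var sum int64
	for p := range logEndOffsets {
		sum += lag(logEndOffsets[p], committedOffsets[p])
	}
	return sum
}

func main() {
	// A partition holding offsets 0..9 has log end offset 10; committed offset is 5.
	fmt.Println("partition lag:", lag(10, 5))
	fmt.Println("group lag:", totalLag([]int64{10, 20}, []int64{5, 20}))
}
```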
Group Coordinator
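A specific broker responsible for a consumer group. It is determined by hashing the group.id onto a partition of the internal __consumer_offsets topic; the broker that leads that partition is the group’s coordinator. The coordinator: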
- Tracks heartbeats from consumers to make sure they are still alive.
- Triggers a rebalance if a consumer joins or leaves.
- Stores the committed offsets for that group.
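The coordinator lookup can be sketched as a hash of the group.id modulo the partition count of the internal __consumer_offsets topic (50 by default via offsets.topic.num.partitions). The code below assumes the broker uses Java's String.hashCode with a non-negative absolute value, which is my understanding of the implementation rather than something this document states; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf16"
)

// javaStringHash reproduces Java's String.hashCode: h = 31*h + c over the
// UTF-16 code units, with 32-bit wraparound.
func javaStringHash(s string) int32 {
	var h int32
	for _, u := range utf16.Encode([]rune(s)) {
		h = 31*h + int32(u)
	}
	return h
}

// coordinatorPartition returns the __consumer_offsets partition whose leader
// would coordinate the group (assumed rule: abs(hash) % partition count).
func coordinatorPartition(groupID string, offsetsPartitions int32) int32 {
	h := javaStringHash(groupID)
	switch {
	case h == math.MinInt32:
		h = 0 // abs(MinInt32) overflows, so it is mapped to 0
	case h < 0:
		h = -h
	}
	return h % offsetsPartitions
}

func main() {
	p := coordinatorPartition("order-processor-group", 50)
	fmt.Println("coordinator = leader of __consumer_offsets partition", p)
}
```

Because the mapping depends only on the group.id, every member of a group finds the same coordinator without any extra coordination.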
Rebalancing
When a consumer joins or leaves a group, Kafka redistributes partitions across the remaining consumers. This is called a rebalance.
- Eager rebalancing (old protocol): All consumers stop consuming, revoke all partitions, then rejoin and get reassigned. A “stop-the-world” event — no progress during rebalance.
- Cooperative sticky rebalancing (modern, Kafka ≥ 2.4): Consumers only revoke a subset of partitions, letting the rest continue processing. Fewer pauses, smoother transitions.
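The difference between the two protocols shows up in how many partitions pause during a rebalance. In the sketch below, the before/after assignments are written by hand (a hypothetical result of a sticky assignor after consumer C joins), and `movedPartitions` is an illustrative helper: eager rebalancing pauses every partition, while cooperative rebalancing only pauses the ones that change owner.

```go
package main

import (
	"fmt"
	"sort"
)

// movedPartitions lists the partitions whose owner differs between two
// assignments (partition -> consumer), in ascending order.
func movedPartitions(before, after map[int]string) []int {
	var moved []int
	for p, owner := range before {
		if after[p] != owner {
			moved = append(moved, p)
		}
	}
	sort.Ints(moved)
	return moved
}

func main() {
	before := map[int]string{0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
	// Hypothetical sticky assignment after consumer C joins the group.
	after := map[int]string{0: "A", 1: "A", 2: "C", 3: "B", 4: "B", 5: "C"}

	fmt.Println("eager: partitions paused =", len(before))
	fmt.Println("cooperative: partitions paused =", movedPartitions(before, after))
}
```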
Consumer Commit Strategies
- enable.auto.commit=true (default): Offsets auto-committed every auto.commit.interval.ms (default 5s). If the consumer crashes between processing and auto-commit, messages are reprocessed: at-least-once semantics.
- Manual commit: Disable auto-commit, call commitSync() or commitAsync() explicitly.
  - commitSync(): blocking, retries on failure. Call after processing each batch.
  - commitAsync(): non-blocking, callback on failure. Higher throughput but no retry.
Process → commit = at-least-once. Commit → process = at-most-once (messages lost on crash).
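The effect of the two orderings can be simulated without a broker. The sketch below is a toy model with a hypothetical `consume` function: it walks a range of offsets, "crashes" between the two steps at a chosen offset, and then restarts from the committed offset.

```go
package main

import "fmt"

// consume handles offsets [start, end) and returns the committed offset.
// It simulates a crash at crashAt, right between the two steps
// (pass -1 for no crash). processed counts how often each offset was handled.
func consume(start, end, crashAt int, commitFirst bool, processed map[int]int) int {
	committed := start
	for off := start; off < end; off++ {
		if commitFirst {
			committed = off + 1
			if off == crashAt {
				return committed // crashed after commit, before processing
			}
			processed[off]++
		} else {
			processed[off]++
			if off == crashAt {
				return committed // crashed after processing, before commit
			}
			committed = off + 1
		}
	}
	return committed
}

func main() {
	for _, commitFirst := range []bool{false, true} {
		seen := map[int]int{}
		next := consume(0, 5, 2, commitFirst, seen) // crash while handling offset 2
		consume(next, 5, -1, commitFirst, seen)     // restart from the committed offset
		fmt.Printf("commitFirst=%v: offset 2 processed %d time(s)\n", commitFirst, seen[2])
	}
}
```

With process-then-commit, offset 2 is handled twice, so downstream writes should be idempotent; with commit-then-process, it is never handled at all.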
Performance
Throughput
Throughput is measured in MB/s or msg/s. To raise it, add more partitions and use larger batches.
On a 3-broker cluster with 1 KB messages:
- acks=0: ~1.5M msg/s
- acks=1: ~1M msg/s
- acks=all: ~300K msg/s
LinkedIn processes 7 trillion messages per day through Kafka. At this scale, partition count is a first-order concern — they run tens of thousands of partitions across thousands of brokers.
Latency
Latency = (Producer Batching Time) + (Network Trip) + (Broker Disk Write) + (Consumer Processing Time).
The trade-off: reducing latency (making it “real-time”) often requires lowering batch sizes, which reduces overall maximum throughput. At low throughput (<10K msg/s), end-to-end latency can be <10ms. At peak throughput, latency rises to 50–200ms depending on batch configuration.
Bottlenecks
- Producer Bottleneck: The network or CPU can’t keep up with the data your app is generating.
- Partition Bottleneck: A single partition is overwhelmed by too many messages (often caused by a “Hot Key”). The fix is either more partitions or key design changes.
- Consumer Bottleneck: The most common clog. Business logic (database writes, API calls) is slower than the incoming message rate. Requires scaling consumers or optimizing processing.
If your system hits a bottleneck, do not try to make a single thread faster. Increase throughput by adding more partitions and consumers to process data in parallel.
The Cost of More Partitions
Adding partitions increases throughput but carries tradeoffs:
- More open file handles on each broker (each partition = a directory + 3 files per segment)
- Longer controller failover time (the controller must reconcile more partition leaders)
- Higher end-to-end latency (batches spread thinner across more partition leaders, and each broker has more partitions to replicate)
- More memory for replica fetchers
Keep partition count to ~100 per broker unless you have a specific throughput requirement that justifies more.
Retention & Compression
Retention Policies
How Kafka decides which data to delete:
- Delete (default): Remove old segments based on time (retention.ms, default 7 days) or total size (retention.bytes, default infinite). Oldest segments are deleted first.
- Compact: Keep only the latest message for each key. Useful for keyed data (e.g., user profile changes) where you only care about the current state. Log compaction runs in the background on inactive segments.
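The compact policy can be illustrated as "keep the last record per key, in log order". This is only a sketch of the outcome: the `record` type and `compact` function are hypothetical, and real compaction works segment by segment and also handles tombstones (null values that delete a key), which are left out here.

```go
package main

import "fmt"

// record is a toy keyed message.
type record struct {
	key, value string
}

// compact keeps only the latest record for each key, preserving the relative
// order of the surviving records.
func compact(log []record) []record {
	last := make(map[string]int, len(log)) // key -> index of its latest record
	for i, r := range log {
		last[r.key] = i
	}
	out := make([]record, 0, len(last))
	for i, r := range log {
		if last[r.key] == i {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	log := []record{
		{"user-1", "email=a@example.com"},
		{"user-2", "email=b@example.com"},
		{"user-1", "email=c@example.com"},
	}
	for _, r := range compact(log) {
		fmt.Println(r.key, r.value)
	}
}
```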
Compression
Producers can compress message batches before sending:
- Supported codecs: gzip, snappy, lz4, zstd
- Typical ratios on JSON payloads: gzip 5–10× reduction, snappy 2–4×, zstd 3–6×
- Benefits: smaller network transfer, less disk usage
- Cost: producer CPU for compression, consumer CPU for decompression
- Configurable on the producer side; Kafka stores and serves compressed batches as-is — decompression only happens on the consumer
| Codec | Compression Ratio (JSON) | CPU Cost | Latency Impact |
|---|---|---|---|
| snappy | 2–4× | Low | Minimal |
| lz4 | 2–4× | Low | Minimal |
| zstd | 3–6× | Medium | Slight |
| gzip | 5–10× | High | Noticeable at high volume |
For most use cases, snappy offers the best balance of compression and speed. Use zstd when network bandwidth is the constraint. Use gzip only when storage costs dominate.
Kafka vs. RabbitMQ vs. NATS
| Dimension | Kafka | RabbitMQ | NATS |
|---|---|---|---|
| Model | Distributed commit log | Message broker (AMQP) | Lightweight pub/sub |
| Throughput | 1M+ msg/s | 100K msg/s | 1M+ msg/s |
| Latency | 10–100ms | <1ms | <1ms |
| Message replay | Yes, by offset | No for classic queues (after ack, message is gone) | No in core NATS (JetStream adds replay) |
| Ordering | Per-partition, strict | Per-queue, strict | Per-subject, best-effort |
| Routing | Topic-based partition routing | Complex exchanges, bindings | Subject-based with wildcards |
| Persistence | Durable, configurable retention | Durable, auto-delete on ack | Optional (JetStream) |
| Operational complexity | High (brokers, ZK/KRaft, rebalancing) | Medium | Low |
Reach for Kafka when: you need high throughput, message replay, strong ordering within a partition, or retention for batch processing.
Reach for RabbitMQ when: you need complex routing (topic exchanges, headers exchanges, direct queues), per-message acknowledgments, or low latency with moderate throughput.
Reach for NATS when: you need the simplest possible pub/sub, ultra-low latency (<1ms), or an embedded messaging layer within a Go service.
Reference
- Apache Kafka Documentation — official configs, protocol, and design
- KRaft (KIP-833) — self-managed metadata quorum
- Confluent Documentation — practical guides and best practices
- “Kafka: The Definitive Guide” (O’Reilly) — Neha Narkhede, Gwen Shapira, Todd Palino
- franz-go — Go client used in this document