Kafka

Intro

Apache Kafka is a distributed event streaming platform built around an append-only commit log: producers append records to topic partitions, and consumers read at their own pace using offsets.
It matters because it combines durability, high throughput, and replayability, which makes it the backbone of many event-driven architectures at scale.
You reach for Kafka when you need independent producers and consumers, long-lived event history, and horizontal scaling without losing per-key ordering.
Common use cases include event sourcing, stream processing, log aggregation, change data capture, and real-time analytics.
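
The append-only log and per-consumer offsets can be modeled in a few lines. This is a toy sketch of the concept, not the Kafka client API:

```csharp
using System;
using System.Collections.Generic;

// Toy model: a partition is an append-only list, and each consumer
// tracks its own read position (offset) independently.
var partition = new List<string>();

long Append(string record)
{
    partition.Add(record);
    return partition.Count - 1; // offset of the newly appended record
}

Append("order-1");
Append("order-2");
Append("order-3");

// Two consumers read the same log at their own pace.
long fastConsumerOffset = 3; // has read everything
long slowConsumerOffset = 1; // lag = 3 - 1 = 2 records

Console.WriteLine($"Slow consumer next reads: {partition[(int)slowConsumerOffset]}");
// Replay is just rewinding an offset; the log itself never changes.
```

Because records are never mutated or deleted on read, any number of consumers can share one partition without coordinating with each other.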

Core Architecture

flowchart LR
    P1[Producer A] --> PR[Partitioner chooses partition]
    P2[Producer B] --> PR

    subgraph Topic["Topic: orders"]
      T1[Partition 0]
      T2[Partition 1]
      T3[Partition 2]
    end

    PR --> T1
    PR --> T2
    PR --> T3

    subgraph BrokerCluster[Kafka brokers]
      B1[Broker 1 leader P0]
      B2[Broker 2 leader P1]
      B3[Broker 3 leader P2]
    end

    T1 --> B1
    T2 --> B2
    T3 --> B3

    subgraph CG["Consumer group: orders service"]
      C1[Consumer 1]
      C2[Consumer 2]
      C3[Consumer 3]
    end

    B1 --> C1
    B2 --> C2
    B3 --> C3
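
The topology in the diagram can be created with the stock Kafka CLI tools. This assumes a local cluster with three brokers reachable at localhost:9092:

```shell
# Create the "orders" topic from the diagram: 3 partitions,
# each replicated to 3 brokers (one leader + two followers).
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic orders \
  --partitions 3 \
  --replication-factor 3

# Inspect leaders and in-sync replicas per partition.
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders
```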

Topics

A topic is a named stream of records; producers write to it and any number of consumer groups read from it independently.

Partitions

Each topic is split into partitions, the unit of parallelism and ordering: records are strictly ordered within a partition, not across the topic.

Producers

Producers append records to a topic; a record's key determines its partition, so records sharing a key land on the same partition in order.

Consumer Groups

Consumers in the same group divide a topic's partitions among themselves, so each partition is read by exactly one consumer in the group at a time.

Offsets

An offset is a record's position within its partition; consumers commit offsets to track progress and can rewind them to replay history.

Brokers, Leaders, and Followers

Each partition has one leader broker that serves writes, plus follower replicas that stay in sync and take over if the leader fails.

ZooKeeper to KRaft

Newer Kafka versions replace the external ZooKeeper ensemble with KRaft, a built-in Raft-based controller quorum that stores cluster metadata in Kafka itself.

Delivery Semantics

Kafka's delivery guarantees come from producer acknowledgement settings, replication health, and how consumers commit offsets.

At-most-once

Commit offsets before processing (or produce fire-and-forget): records can be lost on failure but are never processed twice.

At-least-once

Process first, then commit offsets, and let the producer retry: nothing is lost, but a crash between processing and commit causes reprocessing, so handlers should be idempotent.

Exactly-once

Combine an idempotent producer, transactions, and read_committed consumers so a consume-transform-produce pipeline commits results and offsets atomically.

acks setting

acks=0 does not wait for the broker, acks=1 waits for the partition leader only, and acks=all waits until all in-sync replicas have the record.
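
These settings come together in a transactional producer. A hedged sketch using the Confluent.Kafka client; the transactional id and topic are illustrative:

```csharp
using System;
using Confluent.Kafka;

// Sketch: exactly-once-style producer using Kafka transactions.
var config = new ProducerConfig
{
    BootstrapServers = "localhost:9092",
    Acks = Acks.All,                       // wait for all in-sync replicas
    EnableIdempotence = true,              // broker dedupes producer retries
    TransactionalId = "orders-processor-1" // illustrative; must be stable across restarts
};

using var producer = new ProducerBuilder<string, string>(config).Build();
producer.InitTransactions(TimeSpan.FromSeconds(10));

producer.BeginTransaction();
try
{
    producer.Produce("orders", new Message<string, string>
    {
        Key = "customer-42",
        Value = "{\"orderId\":\"order-1001\"}"
    });
    // In a consume-transform-produce loop you would also call
    // SendOffsetsToTransaction(...) here so offsets commit atomically
    // with the output records.
    producer.CommitTransaction();
}
catch (KafkaException)
{
    // All-or-nothing: consumers reading with isolation.level=read_committed
    // never observe records from an aborted transaction.
    producer.AbortTransaction();
    throw;
}
```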

Partition Key Design

Partition key strategy determines both ordering behavior and workload balance.

Design strategies:

- Key by the entity whose events must stay ordered (for example customerId), since all records with one key share a partition.
- Avoid low-cardinality keys such as country codes; a handful of distinct keys cannot spread load across many partitions.
- Split known-hot keys with a salt or compound key, accepting that ordering then holds only within each sub-key.
- Leave the key null only when ordering does not matter; the producer then spreads records across partitions itself.
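
Key choice trades ordering scope against balance. A small sketch; the helper names are illustrative, not a library API:

```csharp
using System;

// 1) Per-entity key: all of one customer's events stay ordered on one
//    partition, but a very active customer becomes a hot key.
static string EntityKey(string customerId) => customerId;

// 2) Salted compound key: split a known-hot customer across N buckets.
//    Ordering now holds only within a bucket, not per customer.
static string SaltedKey(string customerId, string orderId, int buckets)
    => $"{customerId}-{Math.Abs(orderId.GetHashCode()) % buckets}";

Console.WriteLine(EntityKey("customer-42"));
Console.WriteLine(SaltedKey("customer-42", "order-1001", 4));
```

The salted variant is worth the ordering loss only for keys you have measured to be hot; applied everywhere it forfeits per-entity ordering for nothing.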

C# Example with Confluent.Kafka

using System;
using System.Text.Json;
using System.Threading;
using Confluent.Kafka;

var producerConfig = new ProducerConfig
{
    BootstrapServers = "localhost:9092",
    Acks = Acks.All,           // wait for all in-sync replicas
    EnableIdempotence = true   // safe retries without duplicates
};

var consumerConfig = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    GroupId = "orders-worker",
    AutoOffsetReset = AutoOffsetReset.Earliest,
    EnableAutoCommit = false,  // commit manually after processing
    EnablePartitionEof = false
};

var order = new Order("order-1001", "customer-42", 149.99m);
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };
var ct = cts.Token;

// Producer: keying by CustomerId keeps each customer's orders on one partition.
using var producer = new ProducerBuilder<string, string>(producerConfig).Build();
await producer.ProduceAsync("orders", new Message<string, string>
{
    Key = order.CustomerId,
    Value = JsonSerializer.Serialize(order)
});

// Consumer: committing only after processing gives at-least-once delivery.
using var consumer = new ConsumerBuilder<string, string>(consumerConfig).Build();
consumer.Subscribe("orders");
try
{
    while (!ct.IsCancellationRequested)
    {
        var result = consumer.Consume(ct);
        var parsedOrder = JsonSerializer.Deserialize<Order>(result.Message.Value);
        if (parsedOrder is null)
        {
            continue;
        }
        // process...
        consumer.Commit(result);
    }
}
catch (OperationCanceledException)
{
    // graceful shutdown
}
finally
{
    consumer.Close(); // leave the group cleanly so partitions rebalance promptly
}

// Type declarations must come after top-level statements.
record Order(string OrderId, string CustomerId, decimal Amount);

Kafka vs RabbitMQ Tradeoffs

| Dimension | Kafka | RabbitMQ |
| --- | --- | --- |
| Model | Distributed append-only log | Queue and exchange broker model |
| Ordering | Strong ordering per partition | Queue order exists but can vary with competing consumers and requeue |
| Replay | Native replay through offsets and retention | Replay is not a first-class primitive |
| Routing flexibility | Simpler partition and topic model | Rich routing patterns (direct, topic, fanout, headers) |
| Throughput | Extremely high for sequential event streams | Strong for messaging workloads but usually lower at large stream scale |
| Latency | Excellent throughput-oriented latency | Often very low for command dispatch and request-reply |
| Operational complexity | Higher partition and cluster tuning complexity | Simpler to start; complexity grows with topology and reliability features |

Pitfalls

Hot partitions from bad key design

A low-cardinality or skewed key funnels most traffic to one partition, capping throughput at a single broker and overloading the consumer that owns it.

Consumer lag grows unnoticed

Lag is the gap between a partition's latest offset and the group's committed offset; monitor it continuously, for example:

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group orders-worker --describe

Too many partitions

Each partition costs broker memory, file handles, and leader-election work, so thousands of tiny partitions slow recovery, rebalances, and failover.

Ignoring acks=all for critical data

With acks=1, an acknowledged write can still be lost if the leader fails before followers replicate it; critical topics need acks=all together with a min.insync.replicas setting greater than one.

Interview Questions

References


What's next