Sharding

Intro

Sharding is horizontal partitioning: you split a single database's rows across multiple independent database instances (shards), each owning a non-overlapping subset of the data. Unlike table partitioning (which splits data within one server), sharding distributes data across separate machines, so each shard can be scaled, backed up, and failed over independently. The core problem it solves: when write throughput or total data volume exceeds what a single node can handle, writes still funnel through one primary regardless of how many read replicas or caches you add. Sharding is a last resort: before reaching for it, exhaust vertical scaling, read replicas, caching, and table partitioning, each of which is operationally simpler and preserves cross-table query semantics.

Sharding Strategies

The shard key determines which shard owns a given row. Choosing it wrong is the most expensive mistake you can make.

| Strategy | Routing | Hotspot risk | Resharding cost | Best for |
|---|---|---|---|---|
| Range-based | key / range_size | High (sequential inserts) | Low (add new range) | Time-series, archival |
| Hash-based | hash(key) % N | Low (uniform distribution) | Very high (all data moves) | User data, uniform access |
| Directory-based | Lookup table maps key to shard | None (explicit control) | Low (update table) | Multi-tenant, irregular data |
| Geographic | Region/country to shard | Medium (population skew) | Medium | GDPR compliance, latency |
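
Hash-based routing is small enough to sketch directly. This is an illustrative example, not a prescribed implementation; the key format and shard count are assumptions.

```python
import hashlib

NUM_SHARDS = 4  # assumption: a fixed fleet of four shards

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    # md5 gives a stable, uniformly distributed digest; this is routing,
    # not security, so any fast hash with good dispersion would do.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

The same key always maps to the same shard, so all reads and writes for one user stay together. The catch is the modulus: changing num_shards remaps almost every key, which is exactly the resharding problem the next section addresses.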

Consistent hashing

Simple modulo hashing (hash(key) % N) has a fatal flaw: adding one shard changes N, which remaps almost every key to a different shard. A full data migration follows. Consistent hashing places both keys and shards on a hash ring. Adding a shard only displaces the keys that fall between the new shard and its predecessor, limiting remapping to roughly 1/N of keys. Virtual nodes (multiple ring positions per physical shard) smooth out uneven distribution further. A practical variant is pre-allocating many logical shards mapped to fewer physical machines: adding capacity means remapping logical shards to new nodes, moving only the affected partitions rather than the entire dataset.
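
A minimal consistent-hash ring with virtual nodes can be sketched as follows; the class name, vnode count, and shard labels are illustrative assumptions, not a production design.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes: each shard occupies many ring positions."""

    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (position, shard) pairs
        self._vnodes = vnodes
        for shard in shards:
            self.add(shard)

    @staticmethod
    def _pos(label: str) -> int:
        return int(hashlib.md5(label.encode()).hexdigest(), 16)

    def add(self, shard: str):
        # Virtual nodes: one ring position per "shard#i" label smooths skew.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._pos(f"{shard}#{i}"), shard))

    def lookup(self, key: str) -> str:
        # A key belongs to the first shard position clockwise from its hash.
        i = bisect.bisect(self._ring, (self._pos(key), ""))
        return self._ring[i % len(self._ring)][1]
```

Adding a fourth shard to a three-shard ring relocates only the keys that now fall in the new shard's arcs, roughly a quarter of them, rather than nearly all keys as with modulo hashing.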

Cross-Shard Operations

Cross-shard joins can't be executed as a single native join across independent shard databases without middleware or application fan-out. The application must fetch rows from each relevant shard and join them in memory, which is expensive and easy to get wrong under load.
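
The application-side fan-out join looks roughly like this. The shard objects, their query(sql, params) method, and the table names are hypothetical stand-ins for whatever per-shard client your application uses.

```python
def fetch_user_orders(shards, user_ids):
    users, orders = {}, []
    for shard in shards:  # fetch from every shard that may hold relevant rows
        for row in shard.query(
                "SELECT id, name FROM users WHERE id = ANY(%s)", (user_ids,)):
            users[row["id"]] = row["name"]
        orders += shard.query(
            "SELECT user_id, total FROM orders WHERE user_id = ANY(%s)", (user_ids,))
    # The join happens here, in application memory, not in the database:
    return [{"name": users[o["user_id"]], "total": o["total"]}
            for o in orders if o["user_id"] in users]
```

Note everything the database would normally handle is now the application's job: memory limits, partial failures of individual shards, and consistency between the two fetches.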

Across independent shards, transactions typically require two-phase commit (2PC) or a redesign toward single-shard transactions, sagas, or eventual consistency. Most sharding setups don't provide native cross-shard ACID, so the common answer is to design the schema so that all writes for a business entity land on the same shard.

Scatter queries (queries without the shard key in the WHERE clause) fan out to every shard in parallel, aggregate results, and return. At small scale this is invisible. At large scale it's a silent performance killer that grows linearly with shard count.
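
A scatter-gather aggregate can be sketched as a parallel fan-out; the shard objects and their query(sql, params) method are hypothetical, as is the orders schema.

```python
from concurrent.futures import ThreadPoolExecutor

def count_orders_with_status(shards, status):
    # No shard key in the predicate, so every shard must be queried.
    sql = "SELECT COUNT(*) AS n FROM orders WHERE status = %s"
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: s.query(sql, (status,))[0]["n"], shards)
    return sum(partials)  # aggregation happens in the application
```

Parallelism hides the latency at first, but total work (and the chance that one slow shard delays the whole response) still grows with shard count.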

Replicate static reference data (country codes, product categories, config tables) to every shard. The duplication cost is low; the alternative is a cross-shard lookup on every query that touches that data.

Tradeoffs

| Dimension | Sharding | Alternatives |
|---|---|---|
| Write scaling | Distributed across shards | Read replicas don't help writes |
| Operational complexity | High: key choice, rebalancing, monitoring | Vertical scaling is operationally simple |
| Cross-shard queries | Expensive or impossible | Single DB handles any query |
| Rollback difficulty | Very hard: data is spread | Simple: one database |
| When to use | Write throughput or storage exceeds single node | Try vertical scale, read replicas, partitioning, caching first |

Decision flowchart

```mermaid
flowchart TD
    A[Database bottleneck] --> B{Write throughput saturated?}
    B -->|No| C[Read replicas plus caching]
    B -->|Yes| D{Data model shard-friendly?}
    D -->|No| E[Consider NewSQL like CockroachDB or Spanner]
    D -->|Yes| F[Design shard key, then shard]
```

Pitfalls

Hotspot shards. Uneven key distribution concentrates load on one shard while others sit idle. Sequential keys (auto-increment IDs, timestamps) are the classic cause with range-based sharding. Mitigation: use hash-based keys and monitor per-shard CPU and query latency separately.

Wrong shard key. A key absent from most query WHERE clauses forces scatter queries on every read. Changing the key later requires a full data migration. Mitigation: analyze your top queries before choosing; the key must appear in the majority of them.

Resharding pain. Simple modulo hashing requires moving nearly all data when you add a shard. Mitigation: use consistent hashing, or pre-allocate many logical shards mapped to fewer physical machines so adding capacity only remaps the affected partitions.
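
The logical-shard scheme can be sketched in a few lines. The shard count, node names, and rebalancing rule below are illustrative assumptions; the point is that keys hash to a logical shard once and forever, and only the logical-to-physical map changes.

```python
import hashlib

LOGICAL_SHARDS = 256  # fixed for the life of the system

def logical_shard(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % LOGICAL_SHARDS

# Initially two physical nodes split the 256 logical shards:
placement = {l: f"node-{l % 2}" for l in range(LOGICAL_SHARDS)}

# Adding node-2 reassigns about a third of the logical shards to it.
# Keys never rehash; only the reassigned partitions physically move.
for l in range(0, LOGICAL_SHARDS, 3):
    placement[l] = "node-2"

def node_for(key: str) -> str:
    return placement[logical_shard(key)]
```

This is the approach behind Vitess-style resharding: migration cost is bounded by the partitions you reassign, not by total data size.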

Cross-shard transactions. A write spanning two shards can partially commit if one fails, leaving data inconsistent. Mitigation: design the schema so all writes for a business entity land on the same shard; use sagas with compensating transactions when cross-shard writes are unavoidable.
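
The saga pattern's control flow fits in a short sketch: each step is a local single-shard write paired with a compensating action that undoes it, and a failure triggers compensation in reverse order. The function name and step shape are assumptions for illustration.

```python
def run_saga(steps):
    """steps: iterable of (action, compensate) pairs of zero-arg callables."""
    done = []
    try:
        for action, compensate in steps:
            action()                 # a local, single-shard transaction
            done.append(compensate)  # remember how to undo it
    except Exception:
        # Undo completed steps in reverse order, then surface the failure.
        for compensate in reversed(done):
            compensate()
        raise
```

This buys all-steps-or-all-compensations behavior without cross-shard locks, at the price of intermediate states being briefly visible to other readers.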

Operational complexity multiplied. Migrations, backups, and incident response all scale with shard count. Mitigation: invest in automation before sharding. Managed services (Vitess, PlanetScale, CockroachDB) absorb much of this cost.

Questions


What's next