Disjoint Set
Intro
Disjoint Set Union (Union-Find) is a classic structure for tracking connectivity as you merge groups over time. It shows up in graph problems, clustering, and any scenario where you repeatedly union sets and query whether two items are connected. Example: Kruskal's algorithm uses DSU to build a minimum spanning tree while avoiding cycles.
Diagram
flowchart TD
A[Need connectivity queries] --> B[Initialize parent and rank arrays]
B --> C{Operation type}
C -->|Find| D[Follow parent links to root]
D --> E[Compress path to root]
E --> C
C -->|Union| F[Find roots of both nodes]
F --> G{Roots equal}
G -->|Yes| C
G -->|No| H[Attach smaller rank tree under larger rank tree]
H --> CQuestions
Why combine path compression with union by rank?
- Path compression flattens trees during
findcalls. - Union by rank prevents tall trees from forming during merges.
- Together they give almost constant amortized time in practice.
- Why it matters: this is the reason DSU scales to large connectivity workloads.
What real problems are naturally modeled by DSU?
- Connectivity during incremental edge additions in graphs.
- Kruskal minimum spanning tree edge filtering.
- Group merging problems such as account linking and clustering.
- Why it matters: mapping a problem to DSU quickly is a strong interview signal.
Links
- Disjoint set data structure Wikipedia
- Disjoint set union CP Algorithms
- Union-Find (Princeton Algorithms) — Sedgewick's implementation walkthrough with complexity analysis and practical variants.
Whats next