Disjoint Set Union-Find

Intro

Disjoint Set Union-Find (DSU) is a data structure that tracks a collection of elements partitioned into disjoint (non-overlapping) sets. It supports two operations efficiently: find(x) — which set does element x belong to? — and union(a, b) — merge the sets containing a and b. With two optimizations (union by rank + path compression), both operations run in near-constant amortized time O(α(n)), where α is the inverse Ackermann function — effectively O(1) for any practical input size.

DSU is the backbone of Kruskal's minimum spanning tree algorithm and appears in network connectivity, image segmentation, and cycle detection.

How It Works

Each element starts as its own set, represented as a tree where each node points to its parent. The root of a tree is the representative (canonical element) of its set.

find(x) — walk up the parent chain until reaching the root. With path compression, rewire every visited node to point directly to the root, flattening the tree for future queries.

union(a, b) — find the roots of both elements. If different, attach one tree under the other. With union by rank, attach the shorter tree under the taller one to keep trees flat.

graph TD
  A[find x] --> B{parent of x is x}
  B -->|Yes| C[return x]
  B -->|No| D[find parent of x]
  D --> E[Set parent of x to root]
  E --> F[return root]

  G[union a b] --> H[Compute ra from find a]
  H --> I[Compute rb from find b]
  I --> J{ra equals rb}
  J -->|Yes| K[already in same set]
  J -->|No| L{rank compare}
  L -->|ra smaller| M[Set parent ra to rb]
  L -->|rb smaller| N[Set parent rb to ra]
  L -->|equal| O[Set parent rb to ra and increase rank of ra]

C# Implementation

public class DisjointSet
{
    private readonly int[] _parent;
    private readonly int[] _rank;

    public DisjointSet(int n)
    {
        _parent = Enumerable.Range(0, n).ToArray(); // each element is its own root
        _rank = new int[n];
    }

    public int Find(int x)
    {
        if (_parent[x] != x)
            _parent[x] = Find(_parent[x]); // path compression
        return _parent[x];
    }

    public bool Union(int a, int b)
    {
        int ra = Find(a), rb = Find(b);
        if (ra == rb) return false; // already in the same set

        // Union by rank: attach smaller tree under larger
        if (_rank[ra] < _rank[rb])      (ra, rb) = (rb, ra);
        _parent[rb] = ra;
        if (_rank[ra] == _rank[rb])     _rank[ra]++;
        return true;
    }

    public bool Connected(int a, int b) => Find(a) == Find(b);
}

Application — Kruskal's MST

Kruskal's algorithm builds a minimum spanning tree by greedily adding the cheapest edge that does not form a cycle. DSU detects cycles in O(α(n)):

public static List<(int u, int v, int w)> KruskalMST(
    int n,
    List<(int u, int v, int w)> edges)
{
    edges.Sort((a, b) => a.w.CompareTo(b.w)); // sort by weight
    var dsu = new DisjointSet(n);
    var mst = new List<(int, int, int)>();

    foreach (var (u, v, w) in edges)
    {
        if (dsu.Union(u, v)) // false if u and v are already connected
            mst.Add((u, v, w));
        if (mst.Count == n - 1) break; // MST complete
    }
    return mst;
}

Complexity

Operation Without optimizations With path compression + union by rank
find O(n) worst case O(α(n)) amortized ≈ O(1)
union O(n) worst case O(α(n)) amortized ≈ O(1)
Space O(n) O(n)

α(n) is the inverse Ackermann function — it grows so slowly that α(n) ≤ 4 for any n that fits in the observable universe.

Questions

Pitfalls

Path compression invalidates parent-array snapshots: Path compression rewires _parent entries on every Find call. If you copy the _parent array to implement undo or rollback, those snapshots become stale after any Find. For rollback-capable DSU (used in offline algorithms), use union by rank only — no path compression — and maintain an explicit undo stack of (node, oldParent, oldRank) tuples.

Rank vs size confusion: _rank approximates tree height; a _size field tracks element count. Both achieve O(log n) height when used alone, but they serve different purposes. Mixing them — for example, using a size field but calling it rank — silently degrades efficiency (size can grow faster than rank would, producing taller trees). Choose one consistently and document the invariant.

Integer index assumption: The standard implementation maps elements to integers 0…n-1. Using non-integer keys requires a separate Dictionary<T, int> to assign indices before construction. Forgetting this indirection causes index-out-of-range errors at runtime, not compile time.

Tradeoffs

Union by rank vs union by size: Both bound tree height at O(log n) without path compression and achieve O(α(n)) together with it. Union by size is often preferred because element count is a natural quantity and enables O(1) set-size queries via _size[Find(x)]. Union by rank is marginally simpler when size queries are not needed. Choose based on whether downstream code needs to know partition sizes.

Path compression variants: Three strategies achieve the same O(α(n)) amortized bound but differ in write volume:

DSU vs adjacency-list union-find: DSU only answers connectivity queries, not path or degree queries. If you need to know the actual path between two nodes, maintain an adjacency list alongside DSU. DSU is the right tool when you only need "are a and b connected?" in O(α(n)) — not when you need to reconstruct how.

References


Whats next

Parent
Algorithms