Merge Sort

Intro

Merge sort is a divide-and-conquer algorithm: split the array in half, recursively sort each half, then merge the two sorted halves into one. It guarantees O(n log n) time in all cases and is stable, at the cost of O(n) extra memory. It is the algorithm of choice when stability matters or when sorting linked lists and external data.

Mechanism

Recursively split until each partition has size 1 (trivially sorted). Then merge pairs of sorted partitions by repeatedly taking the smaller front element from the two halves.

graph TD
  A[mergeSort A from l to r] --> B{size at most 1}
  B -->|Yes| R[return]
  B -->|No| C[Compute mid]
  C --> D[mergeSort A from l to mid]
  C --> E[mergeSort A from mid plus 1 to r]
  D --> F[merge two sorted halves]
  E --> F
  F --> R

Complexity

Case Time Space
Best O(n log n) O(n)
Average O(n log n) O(n)
Worst O(n log n) O(n)

Properties: stable, not in-place (requires O(n) merge buffer for arrays), O(1) extra space for linked lists.

C# Implementation

public static void MergeSort(int[] a, int left, int right)
{
    if (left >= right) return;

    int mid = left + (right - left) / 2;
    MergeSort(a, left, mid);
    MergeSort(a, mid + 1, right);
    Merge(a, left, mid, right);
}

private static void Merge(int[] a, int left, int mid, int right)
{
    int[] temp = new int[right - left + 1];
    int i = left, j = mid + 1, k = 0;

    while (i <= mid && j <= right)
        temp[k++] = a[i] <= a[j] ? a[i++] : a[j++];

    while (i <= mid)  temp[k++] = a[i++];
    while (j <= right) temp[k++] = a[j++];

    Array.Copy(temp, 0, a, left, temp.Length);
}

When to Use

For in-memory array sorting where stability is not required, Array.Sort (introsort) is typically faster due to better cache behavior.

Pitfalls

Allocating a New Buffer on Every Merge Call

What goes wrong: a naive implementation allocates a new int[] temp buffer inside every Merge call. For n=1,000,000, this creates ~2,000,000 small allocations, causing significant GC pressure and slowing the sort by 2–5×.

Mitigation: allocate a single temp buffer of size n once before the sort begins and pass it through all recursive calls. This reduces allocations from O(n log n) to O(1).

Using Merge Sort When Stability Is Not Required

What goes wrong: merge sort is used for general-purpose sorting where stability is not needed. It is 20–40% slower than quick sort (introsort) in practice due to the O(n) auxiliary array and cache-unfriendly merge step.

Mitigation: use Array.Sort (introsort) for general-purpose in-memory sorting. Use merge sort (or Array.Sort with a stable comparer, or LINQ OrderBy) only when stability is required.

Tradeoffs

Algorithm Time Space Stable Cache Use when
Merge sort O(n log n) all cases O(n) Yes Good Stability required; linked lists; external sort
Quick sort (introsort) O(n log n) avg O(log n) No Excellent General in-memory; fastest in practice
Timsort O(n log n) worst, O(n) best O(n) Yes Excellent Nearly-sorted data; Python/Java default
Heap sort O(n log n) all cases O(1) No Poor O(1) space + O(n log n) worst case both required

Decision rule: use merge sort when stability is required or when sorting linked lists (O(1) extra space on linked lists). For in-memory array sorting without stability requirements, use Array.Sort (introsort). For nearly-sorted data, Timsort (Python/Java default) is optimal.

Questions

References


Whats next