HashSet
Intro
HashSet<T> stores unique values with O(1) average membership checks. Use it when uniqueness and lookup speed matter more than ordering. A concrete use case: deduplicating 500K event IDs in a message processing pipeline — HashSet<string>.Contains checks each ID in sub-microsecond time, while List<string>.Contains would scan up to 500K entries per check, turning a 200 ms job into a 40-minute one.
Deeper Explanation
HashSet<T> is hash-based: GetHashCode determines the bucket; Equals resolves collisions within a bucket. Core operations (Add, Contains, Remove) are O(1) average.
HashSet<T> also exposes the full set algebra in O(n) time:
UnionWith— adds all elements from another collection (set A ∪ B).IntersectWith— keeps only elements present in both (A ∩ B).ExceptWith— removes elements found in another collection (A − B).SymmetricExceptWith— keeps elements present in exactly one set (A △ B).
These are in-place mutations. IsSubsetOf, IsSupersetOf, and SetEquals test structural relationships without modifying the set. A common production pattern: accumulate processed IDs in a HashSet<int>, then call ExceptWith against the full batch to find unprocessed items in O(n) instead of O(n²).
Structure
graph TD
H1[value dotnet hash] --> B0[bucket zero]
H2[value csharp hash] --> B1[bucket one]
B0 --> V1[dotnet]
B1 --> V2[csharp]Example
var tags = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
"dotnet",
"csharp"
};
var added = tags.Add("DOTNET"); // false, already exists by comparer
Pitfalls
- Broken
GetHashCode/Equalscontract — overridingEqualswithout matchingGetHashCodecausesAddto accept duplicates andContainsto miss existing entries. Two objects that areEqualsmust produce the sameGetHashCode. - Mutable fields in hash computation — mutating a field that participates in
GetHashCodeafter insertion makes the entry unreachable in its bucket.Containsreturnsfalseeven though the entry exists. Use immutable types or never mutate hash-participating fields after adding to the set. - Assuming stable enumeration order —
HashSet<T>enumeration order depends on internal bucket layout and can change afterAdd/Removeoperations or across .NET versions. If you serialize aHashSetand compare output strings, tests will be flaky. UseSortedSet<T>or.OrderBy()when order matters.
Tradeoffs
- Use
HashSet<T>for fast uniqueness checks. - Use
SortedSet<T>if you need sorted uniqueness and accept O(log n) operations.
Questions
HashSet<T> and List<T> for membership checks?
HashSet<T>.Contains is O(1) average; List<T>.Contains is O(n).
HashSet<T>.Contains fail for logically equal objects?
Because hash/equality contracts are broken (for example, mismatched GetHashCode).
Hash-Based Collections Comparison
| Type | Stores | Thread-safe | When to use |
|---|---|---|---|
HashSet<T> |
Values only | No | Unique membership checks, set operations |
Dictionary<TKey,TValue> |
Key-value pairs | No | Key-based lookup |
SortedSet<T> |
Values only | No | Sorted uniqueness, O(log n) ops |
Decision rule: use HashSet<T> when you only need to track membership or perform set operations (union, intersect, except). Use Dictionary when you need to associate a value with each key.
Links
HashSet<T>class — API reference with set operation methods (UnionWith, IntersectWith, ExceptWith).ISet<T>interface — interface contract for set semantics; useful for abstracting over HashSet and SortedSet.- Collections overview and complexity — Microsoft overview of all collection types with complexity guidance.
- HashSet implementation in dotnet runtime — source code for internal bucket and slot layout.
Parent
02 Computer Science
Pages