In an earlier post, I looked at a closed-bucket hashtable implementation based around Dictionary<TKey, LinkedList<TValue>>. Although this achieved the goals of having fast insertion, deletion and retrieval time, it also had a few shortcomings:
- The collection is unsorted and cannot be sorted because it is orderless (the only way to guarantee sorting is to insert pre-sorted elements in order)
- Searching for a value (whose key is unknown) is O(n) due to the use of linked lists as a bucket structure
I decided to revisit the CBHT class. Firstly, I wanted to refactor the code to derive a common base class. Then, extending this class, I wanted to provide an alternative implementation that was sorted (using a 2-tier sort; first by key and then by value) and provided faster searches.
The base type is a generic abstract class that conceals a lot of repeated functionality that arises from implementing multiple collection interfaces. It also hides the explicit interface implementations (which provide compatibility when the class is used in a less specific capacity). The result is considerably fewer methods and properties to implement in the concrete classes; only the following are abstract:
- this[TKey key]
- Add(TKey key, TValue value)
- ContainsKey(TKey key)
- Remove(TKey key)
The sorted CBHT implementation must be based around two sorted types; one to represent the collection of buckets and one to represent the structure of the buckets themselves.
There are two built-in collections in the .NET Framework that are sorted by key; SortedList<TKey, TValue> and SortedDictionary<TKey, TValue>. Both classes are implemented using BSTs and both have similar characteristics. I have opted for SortedList<TKey, TValue> because:
- It is conceptually closer to the array model for hashtables
- The Keys and Values properties are indexed and aren’t regenerated when the collection changes
- In my own tests with large numbers of randomly-generated keys, it outperforms SortedDictionary<TKey, TValue>
For the bucket structure, I have a ready-made solution in the form of the generic BST class I wrote for the last post. It is implicitly sorted by value and searches faster than a linked list.
Each of the two CBHT implementations has its strengths and weaknesses; the sorted version is much faster at searching, but slower at insertion. The unsorted version is faster at insertion and retrieval, but cannot be sorted and is much slower at searching. It is worthwhile to note that tests using large numbers of random keys and values showed that the bottleneck caused by searching in the unsorted version is greater than the performance defecit caused by insertion in the sorted version. Each therefore has its own use in future projects.
SortedCBHT.cs (includes dependant classes)