Data Structures, Algorithms, & Applications in C++
Hashing--Probing Strategies
Copyright 2004 Sartaj Sahni

Random Probing
Quadratic Probing
Double Hashing
Performance
Exercises

Some alternatives to the linear open addressing method developed in the text are described below.

Random Probing
Suppose that the hash table has b buckets. In linear open addressing the buckets are examined in the order (f(k)+i) % b, 0 <= i < b, where k is the key of the element being searched for. In random probing a pseudo-random number generator is used to obtain a random sequence R(i), 1 <= i < b where R(1), R(2), ... R(b-1) is a permutation of [1, 2, ..., b-1]. The buckets are examined in the order f(k), (f(k)+R(i)) % b, 1 <= i < b.

Quadratic Probing
The bucket examination order for quadratic probing is f(k), (f(k)+i2) % b, (f(k)-i2) % b, 1 <= i < (b-1)/2, where b is the number of buckets in the table. This examination sequence covers the space of buckets whenever b is a prime number of the form 4j+3, where j is an integer (see ``The use of quadratic residue research,'' by C. Radke, Communications of the ACM, 1970, 103-105). Some suitable values for b are 251 (j = 62), 503 (j = 125), and 1019 (j = 254).

When, b is a prime number that is not of the form 4j+3, we may examine the buckets in the order (f(k)+i2) % b, 1 <= i < b. It can be shown that the first b/2 buckets examined are different. Therefore, provided the table is at most half full, an insert operation will succeed. In other words, we can guarantee an insertion provided the loading factor, alpha is at most 0.5.

Double Hashing
In double hashing we employ a primary hash function f1 and a secondary hash function f2. The primary hash function is used to determine the home bucket h = f1(k) for the search key k. The remaining buckets are examined using a stride s = f2(k). That is, the buckets are examined in the sequence h, h + s, h + 2s, h + 3s, ..., h + (b-1)s (all computations are done modulo the number of buckets b).

Notice that linear open addressing examines the buckets using the stride s = 1.

Using s = 0 doesn't do you any good because with a stride of zero, only the home bucket is examined. For double hashing to work properly, it is necessary that the stride be such that the sequence h, h + s, h + 2s, h + 3s, ..., h + (b-1)s covers all bucket indexes (i.e., we want all buckets to be examined). The examination of all buckets is assured by choosing a stride s that is relatively prime to the number b of buckets.

When b = 11, we may use any of 1, 2, 3, ..., 10 as the stride. For example, if we use s = 4, the buckets are examined in the sequence h, h + 4, h + 8, ..., h + 40 (modulo 11 of course). When h = 3, the sequence is 3, 7, 0, 4, 8, 1, 5, 9, 2, 6, 10.

A common choice for the secondary hash function is f2(k) = p - (k % p), where p is a prime number less than the number b of buckets in the table. Notice that 0 < f2(k) <= p.

Performance
The Un and Sn formulas for random probing were derived in the text. Although, accurate formulas for quadratic probing and double hashing have not been developed, their expected performance seems to governed by the formulas for random probing. That is, the expected performance for quadratic probing and double hashing is given by the equations:

Un = 1/(1-alpha)
S = -(loge(1-alpha))/alpha
where alpha is the loading factor.

Even though the alternatives described above are expected to examine a fewer number of buckets than is examined by the linear open addressing scheme discussed in the text, these schemes take more time per bucket examined (because these schemes require more work to determine which bucket to look at next). Therefore, at smaller values of the loading factor, these schemes are not expected to outperform linear open addressing. For example, when the loading factor is 0.5, an unsuccesful search of a linear open addressed table is expected to examine 2.5 buckets. When one of the alternatives considered here is used, this expected number drops to 2.0. If the additional overhead incurred by the alternative schemes is more than half the time linear open addressing takes to examine a bucket, these alternative schemes will be slower.

Exercises
  1. Write a computer program to verify that quadratic probing examines all buckets in a hash table with b = 251, 503, 1019 buckets.
  2. Assume that double hashing is used with 11 buckets. Write the sequence of bucket examinations when the home bucket is 5 and the stride is 6.
  3. Show that whenever the stride s is relatively prime to the number b of buckets, the sequence h, h + s, h + 2s, h + 3s, ..., h + (b-1)s (all computations are done modulo b) examines every bucket in the table. h is any integer in the range 0 through b - 1.
  4. Show that the choice f2(k) = p - (k % p), where p is a prime number less than the number b of buckets in the table ensures that the sequence h, h + s, h + 2s, h + 3s, ..., h + (b-1)s (all computations are done modulo b) examines every bucket in the table. h is any integer in the range 0 through b - 1.
  5. Implement a hash table that uses double hashing. Compare the performance of your table with that of the linear open addressed table developed in the text.