Data Structures, Algorithms, & Applications in C++
Hashing--Probing Strategies
Copyright 2004 Sartaj Sahni
Random Probing
Quadratic Probing
Double Hashing
Performance
Exercises
Some alternatives to the linear open addressing method developed in the
text
are described below.
Random Probing
Suppose that the hash table has b
buckets.
In linear open addressing the buckets are examined in the order
(f(k)+i) % b,
0 <= i
< b
, where
k
is the key of the element being searched for.
In random probing a pseudo-random number generator
is used to obtain a random sequence R(i),
1 <= i
< b
where R(1)
,
R(2), ...
R(b-1)
is a permutation of [1, 2, ...,
b-1]
. The buckets are examined in the order
f(k)
,
(f(k)+R(i)) % b
,
1 <= i
< b
.
Quadratic Probing
The bucket examination order for quadratic probing is
f(k)
,
(f(k)+i2) % b
,
(f(k)-i2) % b
,
1 <= i
< (b-1)/2
, where b
is the
number of buckets in the table.
This examination sequence covers the space of buckets whenever
b
is a prime number of the form
4j+3
, where
j
is an integer (see ``The use of quadratic residue research,''
by C. Radke, Communications of the ACM
, 1970, 103-105).
Some suitable values for b
are 251
(j = 62
), 503
(j = 125
), and
1019
(j = 254
).
When, b
is a prime number that is not of the form
4j+3
, we may examine the buckets in the order
(f(k)+i2) % b
,
1 <= i
< b
. It can be shown that the first b/2
buckets examined are different. Therefore, provided the table is at most
half full, an insert operation will succeed.
In other words, we can guarantee an insertion provided the loading factor,
alpha
is at most 0.5
.
Double Hashing
In double hashing we employ a primary hash function
f1
and a secondary hash function
f2
.
The primary hash function is used to determine the
home bucket h = f1(k)
for the search key k
.
The remaining buckets are examined using a stride s =
f2(k)
.
That is, the buckets are examined in the sequence
h, h + s, h + 2s, h + 3s, ..., h + (b-1)s
(all computations are done modulo the number of buckets
b
).
Notice that linear open addressing examines the buckets using the
stride s = 1
.
Using s = 0
doesn't do you any good because
with a stride of zero, only the home bucket is examined.
For double hashing to work properly, it is necessary that the stride be such
that the sequence
h, h + s, h + 2s, h + 3s, ..., h + (b-1)s
covers all bucket indexes (i.e., we want all buckets to be examined).
The examination of all buckets is assured by choosing a stride
s
that is relatively prime to the number
b
of buckets.
When b = 11
, we may use any of
1, 2, 3, ..., 10
as the stride.
For example, if we use s = 4
, the buckets are
examined in the sequence h, h + 4, h + 8, ..., h + 40
(modulo 11
of course).
When h = 3
, the sequence is
3, 7, 0, 4, 8, 1, 5, 9, 2, 6, 10
.
A common choice for the secondary hash function
is f2(k) = p - (k % p)
,
where p
is a prime number less than the
number b
of buckets in the table.
Notice that
0 < f2(k) <= p
.
Performance
The Un
and
Sn
formulas for random probing were
derived in the text.
Although, accurate formulas for quadratic probing and
double hashing have not been developed, their
expected performance seems to governed by the formulas
for random probing.
That is, the expected performance for quadratic probing
and double hashing is given by the equations:
Un = 1/(1-alpha)
S = -(loge(1-alpha))/alpha
where alpha
is the loading factor.
Even though the alternatives described above are expected to examine
a fewer number of buckets than is examined by the linear open addressing
scheme discussed in the text, these schemes take more time per
bucket examined (because these schemes require more work to determine
which bucket to look at next).
Therefore, at smaller values of the loading factor, these schemes are
not expected to outperform linear open addressing.
For example, when the loading factor is 0.5
,
an unsuccesful search of a linear open addressed table is expected
to examine 2.5
buckets. When one of the alternatives
considered here is used, this expected number drops to
2.0
. If the additional overhead incurred
by the alternative schemes is more than half the time linear
open addressing takes to examine a
bucket, these alternative schemes will be slower.
Exercises
-
Write a computer program to verify that quadratic probing examines all
buckets in a hash table with
b = 251, 503, 1019
buckets.
-
Assume that double hashing is used with
11
buckets.
Write the sequence of bucket examinations when the home
bucket is 5
and the
stride is 6
.
-
Show that whenever the stride
s
is relatively prime
to the number b
of buckets, the
sequence
h, h + s, h + 2s, h + 3s, ..., h + (b-1)s
(all computations are done modulo b
)
examines every bucket in the table. h
is any
integer in the range 0
through b - 1
.
-
Show that the choice
f2(k) = p - (k % p)
,
where p
is a prime number less than the
number b
of buckets in the table ensures that
the sequence
h, h + s, h + 2s, h + 3s, ..., h + (b-1)s
(all computations are done modulo b
)
examines every bucket in the table.
h
is any
integer in the range 0
through b - 1
.
-
Implement a hash table that uses double hashing.
Compare the performance of your table with that of the linear open
addressed table developed in the text.