Hashing
 under construct'n…plz visit latr

as searching for an element in a linked list—Θ(n) time in the worst case—in practice, hashing
performs extremely well. Under reasonable assumptions, the expected time to search for an
element in a hash table is O(1).
A hash table is a generalization of the simpler notion of an ordinary array. Directly addressing
into an ordinary array makes effective use of our ability to examine an arbitrary position in an
array in O(1) time.
direct addressing is in-effectiv..So v go 4 hash tables

return T [k]
T[key[x]] ← x
T[key[x]] ← NIL

the storage requirements for hash-table can be reduced to Θ(|K|) while we maintain the benefit that searching
for an element in the hash table still requires only O(1) time. The only catch is that this bound
is for the average time, whereas for direct addressing it holds for the worst-case time.

There is one hitch: two keys may hash to the same slot. We call this situation a collision.
Fortunately, there are effective techniques for resolving the conflict created by collisions.

#### Collision resolution by chaining

Collision resolution by chaining. Each hash-table slot T[j] contains a linked list
of all the keys whose hash value is j. For example, h(k1) = h(k4) and h(k5) = h(k2) = h(k7).

CHAINED-HASH-INSERT(T, x)
insert x at the head of list T[h(key[x])]
CHAINED-HASH-SEARCH(T, k)
search for an element with key k in list T[h(k)]
CHAINED-HASH-DELETE(T, x)
delete x from the list T[h(key[x])]

break cormen 196page 11.3 hash fns 199page
Two of the schemes, hashing by division and hashing
by multiplication, are heuristic in nature, whereas the third scheme, universal hashing, uses
randomization to provide provably good performance.

#### What makes a good hash function?

II A good hash function satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to. Unfortunately, it is typically not possible to check this condition, since one rarely knows the probability distribution according to which the keys are drawn, and the keys may not be drawn independently.not clr ||
Occasionally we do know the distribution. For example, if the keys are known to be random
real numbers k independently and uniformly distributed in the range 0 ≤ k < 1, the hash
function
h(k) = ⌊km⌋
satisfies the condition of simple uniform hashing.
break
The division method:
A good approach is to derive the hash value in a way that is expected to be independent of any
patterns that might exist in the data. For example, the "division method" computes the hash value as the remainder when the key is divided by a specified prime number. This method frequently gives good results, assuming that the prime number is chosen to be unrelated to any patterns in the distribution of keys.

h(k) = k mod m. (m = num f slots!!)
A prime not too close to an exact power of 2 is often a good choice for m. For example,
suppose we wish to allocate a hash table, with collisions resolved by chaining, to hold roughly
n = 2000 character strings, where a character has 8 bits. We don't mind examining an average

page revision: 7, last edited: 08 Jun 2007 07:04