under construct'n…plz visit latr |

as searching for an element in a linked list—Θ(n) time in the worst case—in practice, hashing

performs extremely well. Under reasonable assumptions, the expected time to search for an

element in a hash table is O(1).

A hash table is a generalization of the simpler notion of an ordinary array. Directly addressing

into an ordinary array makes effective use of our ability to examine an arbitrary position in an

array in O(1) time.

**direct addressing** is in-effectiv..So v go 4 hash tables

DIRECT-ADDRESS-SEARCH(T, k)

return T [k]

DIRECT-ADDRESS-INSERT(T, x)

T[key[x]] ← x

DIRECT-ADDRESS-DELETE(T, x)

T[key[x]] ← NIL

the storage requirements for hash-table can be reduced to Θ(|K|) while we maintain the benefit that searching

for an element in the hash table still requires only O(1) time. **The only catch is that this bound
is for the average time, whereas for direct addressing it holds for the worst-case time.**

**There is one hitch:**two keys may hash to the same slot. We call this situation a collision.

Fortunately, there are effective techniques for resolving the conflict created by collisions.

#### Collision resolution by chaining

Collision resolution by chaining. Each hash-table slot T[j] contains a linked list

of all the keys whose hash value is j. For example, h(k1) = h(k4) and h(k5) = h(k2) = h(k7).

CHAINED-HASH-INSERT(T, x)

insert x at the head of list T[h(key[x])]

CHAINED-HASH-SEARCH(T, k)

search for an element with key k in list T[h(k)]

CHAINED-HASH-DELETE(T, x)

delete x from the list T[h(key[x])]

*break* cormen 196page 11.3 hash fns 199page

Two of the schemes, hashing by division and hashing

by multiplication, are heuristic in nature, whereas the third scheme, universal hashing, uses

randomization to provide provably good performance.

#### What makes a good hash function?

II A good hash function satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to. Unfortunately, it is typically not possible to check this condition, since one rarely knows the probability distribution according to which the keys are drawn, and the keys may not be drawn independently.not clr ||

Occasionally we do know the distribution. For example, if the keys are known to be random

real numbers k independently and uniformly distributed in the range 0 ≤ k < 1, the hash

function

h(k) = ⌊km⌋

satisfies the condition of simple uniform hashing.

*break*

**The division method:**

A good approach is to derive the hash value in a way that is expected to be independent of any

patterns that might exist in the data. For example, the "division method" computes the hash value as the remainder when the key is divided by a specified prime number. This method frequently gives good results, assuming that the prime number is chosen to be unrelated to any patterns in the distribution of keys.

h(k) = k mod m. (m = num f slots!!)

A prime not too close to an exact power of 2 is often a good choice for m. For example,

suppose we wish to allocate a hash table, with collisions resolved by chaining, to hold roughly

n = 2000 character strings, where a character has 8 bits. We don't mind examining an average