The search techniques that have been dealt with so far compared the keys and retrieved the element when amatch was found. Another approach is to compute the location of the desired element. For e.g., if there is a setof ‘n’ student records, each of which is identified by a unique value that lies between 1 and n, this unique numberis used to identify the records. The key value of a particular student often used as a subscript specifies thelocation of a record.This key to address transformation is defined as hashing.
HASHING FUNCTIONS ARE DISTRIBUTED INTO TWO CLASSES
DISTRIBUTION DEPENDENT
DISTRIBUTION INDEPENDENT
A distribution dependent hashing function is obtained by examining the subset of keys corresponding to known records.
On the other hand, a distribution independent does not use the distribution of the keys of a list in computingthe position of the record.
HASH METHODS
The hashing methods that are mentioned below are restricted to the distribution independent hashing functionstype. The major assets of these functions include speed and generation of addresses uniformly. A few of the mostpopular methods are:
Mid-square method
The folding method
Digit analysis method
Length dependent method
Mid-square method: It is one of the most popularly used methods and can be implemented in quite a few applications. The modusoperandi of this method is that a key multiplies by itself and the address is obtained by selecting an apt numberof digits from the middle of the square. The number of bits or digits chosen relies on the size of the list.
The Folding method: In the folding method, a key is divided in such a way that all the parts maintain the same length to that of therequired address with the possible exception of the last part. The individual parts are then summed up togetherignoring the final carry to form an address. If the keys are in binary form the "exclusive or" operation can besubstituted for addition.
Digit analysis method: In the digit analysis method, selecting and shifting the digits or bits of the original key form the addresses. Thismethod is also one of the distribution dependant types. For e.g. a key 9876545 is transformed to the address4567 by selecting the digits in positions 3 to 6 and reversing their order.
Length dependent method: Another hashing method, which is commonly used, is the length dependent method. In this method, the lengthof the key is used along with some portion of the key to produce an intermediate key.
Hash tables
We have weighed the advantages of ordered versus unordered lists and found that ordered lists like the DS Array have fast search capability while unordered lists like the linked list have fast update capability. It would be nice to have a data structure that features both advantages. Does one exist? Yes! Two standard data structures are worthy of study: the hash table and the binary tree.
What is a hash table?
A hash table is an array of linked lists. It attempts to make up for the disparities of the linked list and the array by combining the best of both to provide fast insert, delete and find, but unfortunately not sorted access to the data.
A hash table uses a key field to associate data with which it "hashes" or maps to an integer slot in an array known as a bucket. A bucket is an array cell consisting of a linked list of objects.
Rehashing
An intelligent hash table monitors its load, the ratio of total number of elements to buckets. If this number exceeds a certain threshold percentage, say 75%, then the hash table should reorganize or rehashitself, thus increasing the total number of buckets and refitting all existing elements into the new list.
Here is an example: Suppose our hash table stores names in three buckets. The first bucket stores names whose first character ranges from A..G, the second from H..O, the third from P..Z:
A..G H..O P..Z
------ ----------------
| 0 | 1 | 2 | As it stands, the table is
------ ------- -------- severely overloaded:
| | |
Bernie Samson Sally load = total elements/# buckets
| | | = 15/3
Alfred Iris Robert = 5
| | |
Fred Karla Tony = 500 % !!
| | |
Connie Ozna Tia At most we want a 75% load
| |
Herbert Tanya
|
Teisha
Choosing the number of buckets
In order to maintain an efficient hash table, we wish to preserve the most even distribution of elements as we can throughout the linked lists throughout the lifetime of the hash table. The worst case is that all the elements hash into one bucket, the best is that we have perfect distribution of elements amongst linked lists: