Page "Hash function" Paragraph 12

When storing records in a large unsorted file, one may use a hash function to map each record to an index into a table T, and collect in each bucket T[i] a list of the numbers of all records with the same hash value i. Once the table is complete, any two duplicate records will end up in the same bucket.

The duplicates can then be found by scanning every bucket T[i] that contains two or more members, fetching those records, and comparing them.
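The procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the records are held in an in-memory list rather than a file, Python's built-in `hash` stands in for the hash function, and the name `find_duplicates` and the default `table_size` are hypothetical choices made for the example.

```python
def find_duplicates(records, table_size=1024):
    """Return lists of record numbers whose records are identical."""
    # Build the table: T[i] lists the numbers of all records with hash value i.
    table = [[] for _ in range(table_size)]
    for number, record in enumerate(records):
        i = hash(record) % table_size  # assumption: built-in hash as the hash function
        table[i].append(number)

    duplicates = []
    for bucket in table:
        if len(bucket) < 2:
            continue  # a bucket with one member cannot contain duplicates
        # Different records can share a bucket, so fetch and compare them.
        groups = []  # each group holds the numbers of mutually equal records
        for number in bucket:
            for group in groups:
                if records[group[0]] == records[number]:
                    group.append(number)
                    break
            else:
                groups.append([number])
        duplicates.extend(group for group in groups if len(group) > 1)
    return duplicates
```

Only records that land in the same bucket are ever compared with each other, which is where the saving comes from. The order of the returned groups depends on the hash values, so callers that need a stable order may want to sort the result.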

With a table of appropriate size, this method is likely to be much faster than any alternative approach (such as sorting the file and comparing all consecutive pairs).
