Algorithms 1 - Week 6 - Hash Tables & Bloom Filters

Hash Tables
Bloom Filters

Not that much of note, but having finished the quiz and programming assignment for this week, only the final exam remains to complete the course.

Nice refresher on data structures and algorithms, looking forward to part 2 which with greedy algorithms and dynamic programming should tie in nicely with another I’m currently studying up on, namely reinforcement learning.

Hash Tables

Purpose: Fast lookup table by key. O(1) with some caveats.

Operations:

Insert: Add record.

Delete: Delete record.

Lookup: Find record by key.

Hash Functions

Two steps;

Hash code: Converts string to a big number
Compression function: Compress hash code to fit the allocated array size.

Size of array (for mod n compression function)

Choose n to be a prime number in order to distribute data more evenly (avoid $mod(n)$ being a trivial factor).
Not too close to be a power of two.
Not too close to a power of ten.

Load Factor

$\alpha$ : # of objects in hash table, divided by the # of buckets of hash table.

Pathological Data Sets

For every hash function, it is possible to construct datasets that lead to a lot of collisions. Thus, there does not exist a universal best hashing function but can be very contextual depending on how the data being hashed looks.

Bloom Filters

Purpose: Similar to hash tables, but only storing whether a value has been seen, not an associated object. Additionally;

More space efficient
Does allow for false positives
No deletions (by default)

Original usecase: Spellcheckers. If word is not found, then it’s marked as misspelled. Small chance for false positives is not a big deal.

Another possible usecase: List of forbidden passwords.

Modern usecase: Network routers. Limited memory, need to be super-fast.

Hash Tables

Operations:

Hash Functions

Size of array (for mod n compression function)

Load Factor

Pathological Data Sets

Bloom Filters

Comments

Related Posts

Matrix Factorization and Advanced Techniques - Week 6 10 May 2020

Matrix Factorization and Advanced Techniques - Week 5 09 May 2020

DeepFM Paper - Notes 29 Apr 2020