Sunday, November 16, 2014

Internal Working of HashMap<String, Integer> in Java

Internal Working of HashMap in Java

Was reading an blog on how hashMaps work in Java and I was actually surprised to know that it uses the concepts like bucket, chaining and collision in them. I had learned about these concepts in college but thought those books were outdated and now they use some complex techniques for Hashes in Java. So I started a investigation to verify if they were actually using these techniques which i learned in college. Read some articles in wiki, other blogs and went through some youtube videos to actually understand how a HashMap (with key as string) works. Wrote the following code, and debugged it line by line to understand the internals of HashMap.

I will try to explain how a hashMap(key-String) works with debugging the put() and get().
By default the HashMap in Java, is an array(size:16(buckets) and indexed:0,1..15) of Entry<K,V> (K:Key, V:Value).This array is named as table.
Entry is class with key(of type K), value (of type V), hash (int) and 'next' pointing to next Entry in the list. table[i] contains the first Entry of the linkedList for bucket with index i. If the linkedList contains more entries, those can be traced by looping through the `next` of the Entry.

Adding Entry into HashMap : put()
Lets take the example of adding (Key="ef",Value=1) to the HashMap
Steps:
1) The HashCode of the new key("ef") is constructed
  To calcluate the hash for a string java uses the following formula
h(s)=\sum_{i=0}^{n-1}s[i] \cdot 31^{n-1-i}
   For our example the hash for "ef" is 3233
  This hash is modified by the HashMap.class to ensure that hashCodes that differ only by constant      multiples at each bit position have a boundednumber of collisions (approximately 8 at default load      factor).
h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4);

2) The bucket index for the hash is calculated, i = h & (length-1);
3) The bucket[i](array of Entry) is checked to see if there exists an entry with the same hash and same key.
If such an entry exists, then value of the found entry is returned
else the new entry is inserted as the head of the linkedList for bucket[i]
( Note : before inserting the new entry, resizing of map is done if number of keys in this map reaches its threshold )

Retrieving a key's value: get()
Steps
1)  Calculate Hash for String/Key
2)  Find Bucket index,i
3) Loop through bucket[i] to find a Entry with hash = Key.hash and same key
if found: return Entry
else: return null

Diagram shows the structure of HashMap
Note : two entries are mapped into table[2], "ef" and "fG" because they have the same hash. The newly added entry becomes the head of the linkedList

No comments:

Post a Comment