Thursday, 12 December 2013

How ConcurrentHashMap works internally in Java?

Internal process of ConcurrentHashMap 

Question : How did ConcurrentHashMap achieve it's scalability and improved performance over synchronized HashMap and Hashtable? How it achieves it's thread-safety?

Answer : 



java.lang.Object
·      java.util.AbstractMap<K,V>
·      java.util.concurrent.ConcurrentHashMap<K,V>
         Type Parameters:
K - the type of keys maintained by this map
V - the type of mapped values
            All Implemented Interfaces:
   Serializable, ConcurrentMap<K,V>, Map<K,V>


           public ConcurrentHashMap() :- Creates a new, empty map with a default initial capacity (16), load factor (0.75) and concurrencyLevel (16).




            public ConcurrentHashMap(int initialCapacity,

                 float loadFactor,

                 int concurrencyLevel):-  Creates a new, empty map with the specified initial capacity, load factor and concurrency level.
Parameters:

initialCapacity - the initial capacity. The implementation performs internal sizing to accommodate this many elements.

loadFactor - the load factor threshold, used to control resizing. Resizing may be performed when the average number of elements per bin exceeds this threshold.

concurrencyLevel - the estimated number of concurrently updating threads. The implementation performs internal sizing to try to accommodate this many threads.

Throws:
IllegalArgumentException - if the initial capacity is negative or the load factor or concurrency Level are non positive.


The maximum size it can go up to is 2 power 16. Higher the concurrency level, higher would be the no. of segments and lesser the size of hash table that the segment manages. Using a significantly higher value than we will waste space and time, and a significantly lower value can lead to thread contention/competition.

Now a days ConcurrentHashMap is one of the most popular concurrent collection class in Java.
Its an alternative to Hashtable or synchronized  HashMap in java. 

A ConcurrentHashMap is divided into multiple number of segments and the default is16 on initialization. Similarly ConcurrentHashMap allows 16 threads to access these segments concurrently so that each thread work on a specific segment during high concurrency. 

ConcurrentHashMap offered all the better features as compare to Hashtable and ConcurrentHashMap’s accomplish this by using a simple mechanism i.e. instead of a keeping a wide lock on map, the collection maintains a list of 16 locks by default, each of which is used to guard (or lock on) a single bucket of the map. So, this effectively means that 16 threads can modify the collection at a single time as long as they’re all working on different buckets.

So this hash table is supporting full concurrency of retrievals with resizable array of hash buckets, each consisting of List of HashEntry elements. Instead of a single collection lock, ConcurrentHashMap uses a fixed pool of locks that form a partition over the collection of buckets. This class follows the same functional specification as Hashtable, and includes versions of methods corresponding to each method available in Hashtable

However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.  

                                                                                    There is a static final inner class named HashEntry defined inside  ConcurrentHashMap.  The code snippet is defined below:-


static final class HashEntry<K,V> {
        final K key;
        final int hash;
        volatile V value;
        final HashEntry<K,V> next;

        HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
            this.key = key;
            this.hash = hash;
            this.next = next;
            this.value = value;
        }


This(HashEntry) class takes advantage of final and volatile variables to reflect the changes to other threads running parallel without acquiring the expensive lock for read operations. 

Synchronizing the whole map fails to take advantage of a possible optimisation because hash maps store their data in a series of separate buckets, and to lock only the portion of the map that is being accessed will give adavantage. This optimization is generally called lock striping

The basic strategy is to subdivide the table among multiple segments so that the lock is applied only on a segment rather than the entire table. Each segment manages its own internal hash table in size 2x>=(capacity/no. of segments). 

Locking is applied only for updatation. In case of of retrievals, it allows full concurrency, retrievals reflect the results of the most recently completed update operations. Combination of lock striping and prudent use of volatile variables is making ConcurrentHashMap, an excellent candidate to be used in multithreading environment.
           
Two main features of  ConcurrentHashMap is:
                       
  •   While writing on  ConcurrentHashMap, locks only a part of the map
  •   Reads can generally happened without locking.

An another static final inner class named Segment is defined in  ConcurrentHashMap. The key-value pair table of ConcurrentHashMap, is divided among Segments which extends Reentrant Lock and implements Serializable, each of which(Segments) itself is a concurrently readable hash table. Each segment uses single lock to consistently update its elements flushing all the changes to main memory.
                         
                      ConcurrentHashMap doesn’t allow NULL values. The key can’t be NULL, it is used to locate the segment index and then the actual bucket within the segment. The Hash value of ConcurrentHashMap is used to locate both segment and hash table index. The upper bits will be used to locate the segment index and the lower bits to locate the table index within the segment.
           
           Put If Absent : This method of ConcurrentHashMap is similar to put, except that the value will not be overridden if key already exists. This method is bit faster than the below piece of code as it avoids double traversing. It goes through the same internal flow as put, avoids overriding the old value if key already exists and simply returns the old value.

if (!map.containsKey(key))   
       return map.put(key, value);
else   
       return map.get(key);

This above code is equivalent to putIfAbsent except that the action is performed atomically. As containsKey is not synchronized so the above code may cause unexpected behaviour in multithreaded environment.
                
              put() method of ConcurrentHashMap holds the bucket lock for the duration of its execution and doesn't necessarily block other threads from calling get() operations on the map. First, it will search the appropriate hash value for the given key and if found, then it simply updates the volatile value field. Otherwise it creates a new HashEntry object and inserts it at the head of the list. Iterator returned by  ConcurrentHashMap is fail-safe.



Question : What is the difference between fail-fast iterator and fail-safe iterator?

Answer : fail-fast :  As the name implies, fail-fast iterator fail as soon as they believe that  structure of the collection has been changed since iteration has started. Change in structure means insertion, deletion and updation of any element from Collection, while one thread is iterating over that collection. This(fail-fast) behavior is implemented by keeping a modification count and if iteration thread realizes the change in modification count it throws ConcurrentModificationException.

              fail-safe : If Collection is modified structurally while one thread is Iterating over it then fail-safe iterator doesn't throw any Exception because they work on clone of Collection instead of original collection and the reason they are known as fail-safe iterator. Iterator of CopyOnWriteArrayList, CopyOnWriteArraySet and ConcurrentHashMap are fail-safe and never throw ConcurrentModificationException in Java.

Question : Is this possible for two threads to update the ConcurrentHashMap simultaneously?

Answer : Yes, its possible to have two parallel threads writing to the ConcurrentHashMap at the same time. As per the default implementation of ConcurrentHashMap, atmost 16 threads can write & read in parallel. But the worst case is, if the two objects lie in the same segment or partition of ConcurrentHashMap, then parallel write would not be possible.

Question : Can multiple threads read from a given Hashtable concurrently ?

Answer : No, get() method of hash table is synchronized (even for synchronized HashMap). So only one thread can get value from it at any given point of time. Full concurrency for reads is possible only in ConcurrentHashMap via the use of volatile.


ConcurrentHashMap example:- 



package com.gaurav.collection;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentHashMapTest {

      public static void main(String[] args) {

            // ConcurrentHashMap implementation details
            Map<String, String> concurrentHMap = new ConcurrentHashMap<String, String>();
            concurrentHMap.put("FIRST", "1");
            concurrentHMap.put("SECOND", "2");
            concurrentHMap.put("THIRD", "3");
            concurrentHMap.put("FOURTH", "4");
            concurrentHMap.put("FIFTH", "5");
            concurrentHMap.put("SIXTH", "6");
            System.out.println("ConcurrentHashMap before iteration : " + concurrentHMap);
            Iterator<String> ccHMapIterator = concurrentHMap.keySet().iterator();

            while (ccHMapIterator.hasNext()) {
                  String key = ccHMapIterator.next();
                  if (key.equals("FOURTH"))
                        concurrentHMap.put(key + "-GAURAV", "I AM NEW KEY IN CCHMAP");
            }
            System.out.println("ConcurrentHashMap after iteration : " + concurrentHMap);

            System.out.println("**************************************");
            // HashMap implementation details
            Map<String, String> hMap = new HashMap<String, String>();
            hMap.put("FIRST", "1");
            hMap.put("SECOND", "2");
            hMap.put("THIRD", "3");
            hMap.put("FOURTH", "4");
            hMap.put("FIFTH", "5");
            hMap.put("SIXTH", "6");
            System.out.println("HashMap before iteration : " + hMap);
            Iterator<String> hashMapIterator = hMap.keySet().iterator();

            while (hashMapIterator.hasNext()) {
                  String key = hashMapIterator.next();
                  if (key.equals("FOURTH"))
                        hMap.put(key + "-SHIVAM", "I AM NEW KEY IN HMAP");
            }
            System.out.println("HashMap after iteration : " + hMap);
      }
}

Result:-



ConcurrentHashMap before iteration : {THIRD=3, FIRST=1, SIXTH=6, FOURTH=4, FIFTH=5, SECOND=2}
ConcurrentHashMap after iteration : {THIRD=3, FIRST=1, SIXTH=6, FOURTH-GAURAV=I AM NEW KEY IN CCHMAP, FOURTH=4, FIFTH=5, SECOND=2}
**************************************
HashMap before iteration : {FIFTH=5, THIRD=3, FOURTH=4, SECOND=2, SIXTH=6, FIRST=1}
Exception in thread "main" java.util.ConcurrentModificationException
      at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
      at java.util.HashMap$KeyIterator.next(HashMap.java:828)
      at com.gaurav.collection.ConcurrentHashMapTest.main(ConcurrentHashMapTest.java:43)




ConcurrentHashMap advantages summary
  •  ConcurrentHashMap allows concurrent read and thread-safe update operation. All operations of ConcurrentHashMap are thread-safe.

  •  With update operation, ConcurrentHashMap only lock a specific part of Map instead of whole Map.
  • Concurrent update is achieved by internally dividing Map into small segments which is defined using concurrency level.
  • Iterator returned by ConcurrentHashMap is fail safe and never throw ConcurrentModificationException. 

  • ConcurrentHashMap doesn’t allow null as key or value.


10 comments:

  1. Thanks good article...here another blog also explianed good please go through this blog http://adnjavainterview.blogspot.in/2014/06/how-hashmap-works-internally-in-java.html

    ReplyDelete
  2. HashMap works on the principle of hashing. In order to understand the working of HashMap one has to understand hashing.
    Below link can be useful to find out the working of HashMap

    How HashMap works in Java

    http://newtechnobuzzz.blogspot.com/2014/07/how-hashmap-works-in-java.html

    ReplyDelete
  3. what will happend when two threads update the ConcurrentHashMap simultaneously at same segment/bucket??

    ReplyDelete
  4. get method of ConcurrentHashMap is not synchronized, hence i think multiple threads can do the concurrent read operation.

    ReplyDelete
  5. as you mentioned "With update operation, ConcurrentHashMap only lock a specific part of Map instead of whole Map.",
    specific part : suppose there are two parts,
    Thread1 access part1 and Thread2 access part2
    if any T1 inserts a new entry , how Thread2 will be knowing. please help us to undrestand better. Thank you

    ReplyDelete
    Replies
    1. GOod question

      Delete
    2. Here HashEntry is volatile variable so it will reflet other threads immediatly.

      Delete
  6. What happen at the time of rehashing in concurrentHashMap?

    ReplyDelete
  7. latif mohammad khan, how in normal case thread know to other work?
    Thread 2 is working on it own segment.Why thread2 want to know it?

    ReplyDelete