Monday, May 31, 2010

The Java Collection Framework and Immutability

... or a story of things that can change unexpectedly

Have a look at following implementation. Does it what it claims to do?

  /**
   * Selects and returns those entries of the map whose keys are even.
   * @return a set of all entries in the map with even keys
   */
  public static Set<Entry<Integer, String>> extractEvenKeys(Map<Integer, String> map) {
    Set<Entry<Integer, String>> result = new HashSet<Entry<Integer, String>>();
   
    for(Entry<Integer, String> entry: map.entrySet()) {
      if (isEven(entry.getKey())) {
        result.add(entry);
      }
    }
   
    return result;
  }

  private static boolean isEven(int number) {
    return (number & 1) == 0;
  }


The code iterates over all entries of the map and puts the entries into a result set. Does it always work? Let's test it:

 public static void test() {
    Map<Integer, String> map = new HashMap<Integer, String>();
    for(int i=0; i<10; ++i) {
      map.put(i, String.valueOf(i));
    }
    Set<Entry<Integer, String>> even = extractEvenKeys(map);
    System.out.println(even.size());
    for(Entry<Integer,String> entry: even) {
      System.out.print("["+entry.getKey()+","+entry.getValue()+"] ");
    }
  }

This gives the following output:
  [14,14] [12,12] [8,8] [6,6] [4,4] [2,2] [0,0] [10,10] [18,18] [16,16]

We get all ten entries with even keys so the output is correct. The entries have no particular ordering but we did not say anything about this so this is okay. We used a java.util.HashMap for the test but this should work with every correct implementation of the java.util.Map interface, right? So let's try some other maps:

java.util.LinkedHashMap: 
  [14,14] [12,12] [8,8] [6,6] [4,4] [2,2] [0,0] [10,10] [18,18] [16,16]

java.util.TreeMap:
  [14,14] [12,12] [8,8] [6,6] [4,4] [2,2] [0,0] [10,10] [18,18] [16,16]

java.util.concurrent.ConcurrentHashMap:
  [4,4] [14,14] [2,2] [8,8] [0,0] [6,6] [12,12] [10,10] [16,16] [18,18] 

java.util.IdentityHashMap:
  [9,9] [9,9] [9,9] [9,9] [9,9] [9,9] [9,9] [9,9] [9,9] [9,9]

Oops! Something went terribly wrong with the IdentityHashMap! We got a Set that contains ten times the same entry, and it is actually an entry with an odd key! Is the implementation of java.util.HashSet wrong? After all, a set should never contain an element more than once! Or do we have do blame something else?

What went wrong?

Well, as it turns out, we trapped into a documentation hole of the Java Collection Framework. If we look at the Entry interface contained in java.util.Map we see a comment that these Entry objects are only valid as long as the map remains unmodified. What is missing, however, is a documentation how the entry objects can change over time:
  • Are Entry objects immutable? No, they have a setValue method so they cannot be.
  • Do Entry objects remain unchanged as long the Map they belong to remains unchanged? We could think so, but unfortunatley not.
  • Does Entry.getKey() change while iterating over the entries? Unfortunately, not even this is specified, so an implementation is free to choose!
As it turns out, when iterating over the entries of an IdentityHashMap, the same entry object is always returned on Iterator.next(), just the getKey() and getValue() returns a different key and value for each iteration. This also means that equals() and hashCode() of the entry object change and the disaster is complete when we try to put the entries into a HashSet: We end up with a set that has ten times the same entry object (remember the same object is returned for each iteration) and after iterating, the key and value of the entry are those of the last iteration!

All this happend because we misleadingly assumed that the entries' keys would not change. There is a bug report which complains about the behaviour of the IdentityHashMap's entry set. Note that there is the same issue with java.util.EnumMap.

Let's fix it

We cannot change the contract of the interface. So, we have to work around in our implementation. To make our code work with all implementations of java.util.Map  we have to drop the assumption of immutable entry keys and create new Entry objects for the result set:

  public static Set<Entry<Integer,String>> extractEvenKeys(Map map) {
     Set<Entry<Integer,String>> result 
            = new HashSet <Entry<Integer,String>>();

    for(Entry<Integer,String> entry: map.entrySet()) {
      if (isEven(entry.getKey())) {
        result.add(new ImmutableEntry<Integer,String>(
            entry.getKey(), entry.getValue()));
      }
    }
    return result;
  }

whereas we need a custom implementation of Map.Entry which we now clearly mark as immutable:

  @Immutable
  public class ImmutableEntry implements Map.Entry<K,V&gt {

    private final K key;
    private final V value;
   
    public ImmutableEntry(K key, V value) {
      this.key = key;
      this.value = value;
    }

    @Override
    public K getKey() {
      return key;
    }

    @Override
    public V getValue() {
      return value;
    }

    @Override
    public V setValue(V value) {
      throw new UnsupportedOperationException();
    }

    @Override
    public int hashCode() {
      return ((key == null) ? 0 : key.hashCode()) ^
             ((value == null) ? 0 : value.hashCode());
    }

    @Override
    public boolean equals(Object obj) {
      if (this == obj) return true;
      if (!(obj instanceof Map.Entry<?,?>)) return false;
      Map.Entry<?,?> other = (Map.Entry<?,?>)obj;
      if (key == null) {
        if (other.getKey() != null) return false;
      } else if (!key.equals(other.getKey())) return false;
      if (value == null) {
        if (other.getValue() != null) return false;
      } else if (!value.equals(other.getValue())) return false;
      return true;
    }
   
  }

Note that I took care to fullfill the contract of the equals and hashcode methods as specified in Map.Entry.

A final remark

The hashCode method of the entry takes the XOR of the key's and value's hashcode (as required by the documentation of the interface). This can have some serious side effect too: If you -- for whatever reason -- put entries into a map with equal key and value, this means that all these entries will have hashcode zero! Putting these entries into a HashSet will kill your performance as the map degrades to a linked list due to the collisions!

No comments:

Post a Comment