Thursday, May 20, 2010

Generating HashMaps for Primitives

Since long, I have wondered whether it would make a big difference in Java if you use a custom hash map implementation for primitives instead of using e.g. java.util.HashMap with boxed values. However, as there are 7 primitives (byte, char, short, int, long, float, double) this would mean that there are potentially 49 hash map implementations to be written.

In the process of testing the upcoming release of actifsource, I used actifsource templates to generate the implementations so that I had to write a single generic implementation only (similar to a c++ or c# template). While not being a typical use case for model- or domain-driven engineering (the model is trivial, the domain is pure technical) it was an interesing test. Unlike the HashMap form java.util, my generated versions store directly primitives instead of boxed objects. I was particular curious what impact this had on the memory footprint and the performance.

The Model

As I did not want to measure difference in the hashing algorithm I started with the HashMap implementation. This meant copy and pasting the code of java.util.HashMap into an empty template. But first, I created this simple model:
A hash map has a key type and a value type which is an enum of the primitive types. For compatibility with the maps from Java i generated an interface which is similar to these maps, just with primitves in the method signature:

Next, in the implementation, I replaced all reference to the generic types K and V by the type from the hash map class, i.e. the expression HashMap.valuetype.name and HashMap.keytype.name. The other thing that I had to change was the handling of null keys and values. As primitives cannot be null the code for handling null could be removed. This also means, that the value 0 gets returned instead of null when a get is made for an non-existing key. Compare the implementations from java.util.HashMap with the implementation in the actifsource template:



The algorithm itself remained the same. To calculate the hash value I could no longer relay on Object.hashCode() as the primitives are no objects. Instead, I created a utility class HashUtil which contains an overloaded function hashCode for each primitive type.

Performance Results

To benchmark the implementation, I filled a map int -> float with 1'000'000 random key/value pairs (put), replaced the value for all the keys once (update), asked once for each key (get) and finally removed all keys (remove). This was repeated 15 times with the first 5 repetitions used as warm-up and taking the average timing over the remaining 10 repetitions. All figures are in milliseconds. I also measured the size of the memory retained by the maps.

Java HashMap:
put 793
update 428
get 409
remove 432
retained size: 108MB (113bytes per entry)

Generated Int2FloatHashMap:
put 352
update 175
get 87
remove 174
retained size: 54.1MB (56.7bytes per entry)

As you can see the generated hash map performed notably better with the get being more than 4 times faster in average! Also, only half as much memory is used.

1 comment: