Monday, May 31, 2010

The Java Collection Framework and Immutability

... or a story of things that can change unexpectedly

Have a look at the following implementation. Does it do what it claims to do?

  /**
   * Selects and returns those entries of the map whose keys are even.
   * @return a set of all entries in the map with even keys
   */
  public static Set<Entry<Integer, String>> extractEvenKeys(Map<Integer, String> map) {
    Set<Entry<Integer, String>> result = new HashSet<Entry<Integer, String>>();
   
    for(Entry<Integer, String> entry: map.entrySet()) {
      if (isEven(entry.getKey())) {
        result.add(entry);
      }
    }
   
    return result;
  }

  private static boolean isEven(int number) {
    return (number & 1) == 0;
  }


The code iterates over all entries of the map and puts the entries into a result set. Does it always work? Let's test it:

 public static void test() {
    Map<Integer, String> map = new HashMap<Integer, String>();
    for(int i=0; i<20; ++i) {
      map.put(i, String.valueOf(i));
    }
    Set<Entry<Integer, String>> even = extractEvenKeys(map);
    System.out.println(even.size());
    for(Entry<Integer,String> entry: even) {
      System.out.print("["+entry.getKey()+","+entry.getValue()+"] ");
    }
  }

This gives the following output:
  [14,14] [12,12] [8,8] [6,6] [4,4] [2,2] [0,0] [10,10] [18,18] [16,16]

We get all ten entries with even keys so the output is correct. The entries have no particular ordering but we did not say anything about this so this is okay. We used a java.util.HashMap for the test but this should work with every correct implementation of the java.util.Map interface, right? So let's try some other maps:

java.util.LinkedHashMap: 
  [14,14] [12,12] [8,8] [6,6] [4,4] [2,2] [0,0] [10,10] [18,18] [16,16]

java.util.TreeMap:
  [14,14] [12,12] [8,8] [6,6] [4,4] [2,2] [0,0] [10,10] [18,18] [16,16]

java.util.concurrent.ConcurrentHashMap:
  [4,4] [14,14] [2,2] [8,8] [0,0] [6,6] [12,12] [10,10] [16,16] [18,18] 

java.util.IdentityHashMap:
  [9,9] [9,9] [9,9] [9,9] [9,9] [9,9] [9,9] [9,9] [9,9] [9,9]

Oops! Something went terribly wrong with the IdentityHashMap! We got a Set that contains the same entry ten times, and it is actually an entry with an odd key! Is the implementation of java.util.HashSet wrong? After all, a set should never contain an element more than once! Or do we have to blame something else?

What went wrong?

Well, as it turns out, we fell into a gap in the documentation of the Java Collection Framework. If we look at the Entry interface contained in java.util.Map, we see a comment that these Entry objects are only valid as long as the map remains unmodified. What is missing, however, is documentation of how the entry objects can change over time:
  • Are Entry objects immutable? No, they have a setValue method so they cannot be.
  • Do Entry objects remain unchanged as long as the Map they belong to remains unchanged? We could think so, but unfortunately not.
  • Does Entry.getKey() change while iterating over the entries? Unfortunately, not even this is specified, so an implementation is free to choose!
As it turns out, when iterating over the entries of an IdentityHashMap, Iterator.next() always returns the same entry object; only getKey() and getValue() return a different key and value in each iteration. This also means that equals() and hashCode() of the entry object change, and the disaster is complete when we try to put the entries into a HashSet: we end up with a set that contains the same entry object ten times (remember, the same object is returned in each iteration), and after iterating, the key and value of the entry are those of the last iteration!

All this happened because we wrongly assumed that the entries' keys would not change. There is a bug report which complains about the behaviour of the IdentityHashMap's entry set. Note that java.util.EnumMap has the same issue.
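The mechanism can be reproduced without any special map. The following is a minimal sketch, not the code of IdentityHashMap: MutableBox is a hypothetical class whose hashCode() depends on a mutable field, standing in for the reused entry object. The same instance is added to a HashSet three times and changed before each add:

```java
import java.util.HashSet;
import java.util.Set;

final class ReusedObjectDemo {

  /** Hypothetical stand-in for a reused entry: equals/hashCode depend on mutable state. */
  static final class MutableBox {
    int id;

    @Override
    public int hashCode() {
      return id;
    }

    @Override
    public boolean equals(Object obj) {
      return obj instanceof MutableBox && ((MutableBox) obj).id == id;
    }
  }

  /** Adds ONE box three times, changing its id before each add. */
  static Set<MutableBox> collect() {
    Set<MutableBox> set = new HashSet<MutableBox>();
    MutableBox box = new MutableBox();
    for (int id = 1; id <= 3; id++) {
      box.id = id;   // changes hashCode(), like the reused entry does on next()
      set.add(box);  // the set files the same object under a different hash
    }
    return set;
  }

  public static void main(String[] args) {
    for (MutableBox box : collect()) {
      System.out.print("[" + box.id + "] ");
    }
  }
}
```

With the HashSet of current OpenJDK releases the set ends up with three elements that are all the same object, so the loop prints the last id three times. The set never notices the duplicate because each add looks in a different bucket.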

Let's fix it

We cannot change the contract of the interface, so we have to work around the problem in our implementation. To make our code work with all implementations of java.util.Map, we have to drop the assumption of immutable entry keys and create new Entry objects for the result set:

  public static Set<Entry<Integer,String>> extractEvenKeys(Map<Integer,String> map) {
    Set<Entry<Integer,String>> result
            = new HashSet<Entry<Integer,String>>();

    for(Entry<Integer,String> entry: map.entrySet()) {
      if (isEven(entry.getKey())) {
        result.add(new ImmutableEntry<Integer,String>(
            entry.getKey(), entry.getValue()));
      }
    }
    return result;
  }

For this we need a custom implementation of Map.Entry, which we now clearly mark as immutable:

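  // @Immutable is not part of the JDK; it is assumed to come from an
  // annotation library such as JCIP (net.jcip.annotations.Immutable).
  @Immutable
  public class ImmutableEntry<K,V> implements Map.Entry<K,V> {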

    private final K key;
    private final V value;
   
    public ImmutableEntry(K key, V value) {
      this.key = key;
      this.value = value;
    }

    @Override
    public K getKey() {
      return key;
    }

    @Override
    public V getValue() {
      return value;
    }

    @Override
    public V setValue(V value) {
      throw new UnsupportedOperationException();
    }

    @Override
    public int hashCode() {
      return ((key == null) ? 0 : key.hashCode()) ^
             ((value == null) ? 0 : value.hashCode());
    }

    @Override
    public boolean equals(Object obj) {
      if (this == obj) return true;
      if (!(obj instanceof Map.Entry<?,?>)) return false;
      Map.Entry<?,?> other = (Map.Entry<?,?>)obj;
      if (key == null) {
        if (other.getKey() != null) return false;
      } else if (!key.equals(other.getKey())) return false;
      if (value == null) {
        if (other.getValue() != null) return false;
      } else if (!value.equals(other.getValue())) return false;
      return true;
    }
   
  }

Note that I took care to fulfill the contract of the equals and hashCode methods as specified in Map.Entry.
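If you do not want to maintain your own entry class, the JDK has shipped java.util.AbstractMap.SimpleImmutableEntry since Java 6. It can copy an existing entry and follows the same equals/hashCode contract. The sketch below shows the fix with that class; the class name EvenKeys is only chosen for this example:

```java
import java.util.AbstractMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

final class EvenKeys {

  static Set<Entry<Integer, String>> extractEvenKeys(Map<Integer, String> map) {
    Set<Entry<Integer, String>> result = new HashSet<Entry<Integer, String>>();
    for (Entry<Integer, String> entry : map.entrySet()) {
      if ((entry.getKey() & 1) == 0) {
        // Copy key and value into a fresh entry that can never change.
        result.add(new AbstractMap.SimpleImmutableEntry<Integer, String>(entry));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Small Integer values are cached, so the ten keys are ten distinct identities.
    Map<Integer, String> map = new IdentityHashMap<Integer, String>();
    for (int i = 0; i < 10; ++i) {
      map.put(i, String.valueOf(i));
    }
    System.out.println(extractEvenKeys(map).size()); // prints 5
  }
}
```

Because each entry is copied while the iterator still points at it, the result does not depend on whether the map reuses its entry objects.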

A final remark

The hashCode method of the entry takes the XOR of the key's and the value's hash codes (as required by the documentation of the interface). This can have a serious side effect: if you, for whatever reason, create entries whose key and value are equal, all of these entries have the hash code zero! Putting such entries into a HashSet will kill your performance, because the underlying hash table degrades to a linked list due to the collisions!
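A quick way to see the effect is to ask a JDK entry for its hash code. The sketch below uses AbstractMap.SimpleImmutableEntry, which follows the Map.Entry contract; the helper name entryHash is just for this example:

```java
import java.util.AbstractMap;
import java.util.Map;

final class ZeroHashDemo {

  /** Returns the hash code of an entry built from the given key and value. */
  static int entryHash(Object key, Object value) {
    Map.Entry<Object, Object> entry =
        new AbstractMap.SimpleImmutableEntry<Object, Object>(key, value);
    return entry.hashCode();
  }

  public static void main(String[] args) {
    System.out.println(entryHash("a", "a")); // 0: both hash codes cancel out
    System.out.println(entryHash(42, 42));   // 0 again
    System.out.println(entryHash(1, "1"));   // non-zero: key and value differ
  }
}
```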

Friday, May 28, 2010

Generic Applications

This week I started to implement a new actifsource feature: a repository for generic actifsource applications. The idea behind this is to provide an easy way to distribute models, frameworks and templates solving common tasks. We decided to provide a special import wizard, which downloads and installs the application. One challenge was to manage the installation of required features and plugins. Our applications may require findbugs or other third party tools.

To solve this issue I decided to manage the information about the features and the updateSiteURIs, along with some other information, in the repository. This way the wizard can check the current feature list and add missing updateSites and the required dependencies. Since svn does a good job in managing source code, I decided to use the eclipse team project set to manage the download and import of the projects into the workspace.

Since we need a way to define and store additional dependency information, I defined a small actifsource model. One big advantage over using a text file is that you get the resource formfieldeditor and the actifsource validation, which shows you where the application definition is incomplete.

After this I had to think about where the generic app import will obtain the data from. I decided to implement this as a service. If the service is encapsulated properly, the importer doesn't have to care about where the information comes from. Both service data classes, the service interface and the service implementation stub can be generated by actifsource templates. It is also possible to write the wsdl. I first started using the eclipse j2ee wizards, but very soon I found out that it is not very convenient to do the updates by hand. I'm sure this is a good case for a generic application.
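To make the encapsulation idea concrete, here is a rough sketch in plain Java. It is not the actual actifsource service: the names GenericAppInfo, GenericAppService and InMemoryGenericAppService are hypothetical, and the fields are only guessed from the description above (features and update sites per application):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical data class describing one generic application. */
final class GenericAppInfo {
  final String name;
  final List<String> updateSiteUris;
  final List<String> requiredFeatures;

  GenericAppInfo(String name, List<String> updateSiteUris, List<String> requiredFeatures) {
    this.name = name;
    this.updateSiteUris = Collections.unmodifiableList(new ArrayList<String>(updateSiteUris));
    this.requiredFeatures = Collections.unmodifiableList(new ArrayList<String>(requiredFeatures));
  }
}

/** Hypothetical service interface; the importer depends only on this type. */
interface GenericAppService {
  List<GenericAppInfo> listApplications();
}

/** In-memory stub; a web service client could implement the same interface. */
final class InMemoryGenericAppService implements GenericAppService {
  private final List<GenericAppInfo> apps;

  InMemoryGenericAppService(List<GenericAppInfo> apps) {
    this.apps = new ArrayList<GenericAppInfo>(apps);
  }

  @Override
  public List<GenericAppInfo> listApplications() {
    return Collections.unmodifiableList(apps);
  }
}
```

The import wizard would only ever see GenericAppService, so the source of the data can be swapped without touching the wizard.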

Providing templates for building simple java beans (value objects) out of the classes defined in actifsource would be easy. I talked about this in my last post. Using a similar model as shown in the Simple Service Tutorial combined with an enhanced version of the DataClass-Templates will do the job:

The actifsource class



The new template




The generated java bean for the use for data transfer




As you can see, we often need some kind of data classes in a specific format. It's also possible to generate the service calls and everything else, including the setup code of the service client and server. I think generating the transfer classes for your simple model would be a really useful application for the actifsource generic application store. I say simple model, because if you have a more complex environment, you probably won't have a 1:1 relationship between your model classes and the data classes. In this case you will most likely write your own templates doing the transformation/mapping between model and code.

Thursday, May 27, 2010

Grouping Is Domain Specific Too

Complex domains have large models...

As an advanced user of domain-driven modeling you may have experienced that you find yourself soon dealing with a huge set of domain-specific elements: A description of a complex domain can need hundreds of elements. Having all these elements as a flat list is no longer convenient. Fortunately, actifsource lets you group your elements -- which are simply called Resources in actifsource -- into so-called Packages (similar to Java packages).

Packages enable a simple hierarchical structuring and are currently the all-embracing grouping element:
  • A Package can contain Resources
  • A Package can contain nested Packages
In the domain meta-model we use Packages to group our Classes. Packages do a perfect job there. However, when the task is to group the domain elements themselves, we found that packages have a problem: They are not domain-specific! Instead of Packages, it would be desirable to have domain-specific grouping elements... 'FunctionalAreas', 'Epics', 'EntitySpaces' are possible names. But this name is always specific to the domain, so we cannot choose it universally!

 ... and need to be grouped domain-specifically!

So, after some discussion, we came to the following idea to solve the problems mentioned above: We need to have a way to define grouping elements on the domain meta-level! All the grouping concepts have something in common:
  • A grouping element can contain nested grouping elements of some grouping types
  • A grouping element can define dependencies to other grouping elements
Inside Eclipse, the groupings would still be displayed as folders containing elements. But you could fully customise the folders (icons, decorations, ...). Further, the folder would be validated to contain only the correct element types, and the dependencies could be used to narrow the scope of references: Resources contained in a grouping are only allowed to have references to other Resources which are in the scope defined by the dependencies on the grouping!
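The scoping rule can be sketched in a few lines of plain Java. This is only an illustration of the idea, not how actifsource implements it: the class Grouping and its methods are hypothetical, and I assume here that dependencies are not transitive:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical grouping element with nested groupings and dependencies. */
final class Grouping {
  final String name;
  final List<Grouping> nested = new ArrayList<Grouping>();
  final Set<Grouping> dependencies = new LinkedHashSet<Grouping>();

  Grouping(String name) {
    this.name = name;
  }

  Grouping addNested(Grouping child) {
    nested.add(child);
    return this;
  }

  Grouping dependsOn(Grouping other) {
    dependencies.add(other);
    return this;
  }

  /**
   * Resources in this grouping may reference resources of the same grouping
   * or of a grouping it directly depends on (assumption: no transitivity).
   */
  boolean mayReference(Grouping target) {
    return target == this || dependencies.contains(target);
  }
}
```

For example, if a grouping "Billing" depends on "Customers", resources in "Billing" may refer to resources in "Customers", but not the other way round.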

So, the current concept of Packages containing Classes would be only a realization of the more general grouping concept!

Friday, May 21, 2010

Generating for multiple layers

You often need the same structures in different places. You have a data-layer where you need data classes to hold the data, you need a network layer for transportation and you also might have a GUI to display your data. Depending on what you do, you may need to decorate the data with additional information or change its representation. One problem with this is that a new feature must be implemented on all layers to fully support it. With actifsource you can make sure that a feature is handled on all layers and no task gets forgotten. Simply create templates for each layer. Today I want to give you a small example of how actifsource can help.

Let's look at the data layer. I generate really simple data classes: they have a getter, a setter and a field for each value/reference. For this example I don't care about change notification or null checking:
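A data class of this kind might look roughly like the following. This is a hand-written sketch, not the real generator output, and the properties name and age are assumptions made for illustration:

```java
/** Sketch of a simple data class: one field, getter and setter per property. */
final class Person {
  private String name; // hypothetical property
  private int age;     // hypothetical property

  public String getName() {
    return name;
  }

  public void setName(String name) {
    this.name = name;
  }

  public int getAge() {
    return age;
  }

  public void setAge(int age) {
    this.age = age;
  }
}
```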



As an example I defined a data class Person in actifsource:



The template that generates the java class uses a helper function "toJavaType" to translate the actifsource types to java:



For the second layer I generate a simple viewer for each data structure. There is no need to define additional information in the model, I just use the same data. For simplicity I don't care about the type of the properties and create a text field for each one. In a real world example you might create components instead of frames and create specific controls for each attribute/relation type.

The following two screenshots show the viewer template and the output for person:





In the viewer template I also added a main-method and the createExampleValue-method with a protected region to create an example value to display. The running example looks like this:



Now I have two layers that will be updated every time I change the model. It is ensured that there is always both a text field in the view and a data field in the data class. Think about a third layer: a network layer that contains the serialization code, or implementations of different output formats. You might even have implementations or clients/servers in different programming languages (c++, c#, java). In all cases actifsource takes care of consistency: no more trouble due to missing fields or a different order in the serialization/deserialization code between the different implementations.

Thursday, May 20, 2010

Generating HashMaps for Primitives

For a long time I have wondered whether it would make a big difference in Java if you use a custom hash map implementation for primitives instead of using e.g. java.util.HashMap with boxed values. However, as there are 7 primitive types worth considering (byte, char, short, int, long, float, double; boolean is left out), this would mean that there are potentially 49 hash map implementations to be written.

In the process of testing the upcoming release of actifsource, I used actifsource templates to generate the implementations so that I had to write a single generic implementation only (similar to a c++ or c# template). While not being a typical use case for model- or domain-driven engineering (the model is trivial, the domain is purely technical), it was an interesting test. Unlike the HashMap from java.util, my generated versions store primitives directly instead of boxed objects. I was particularly curious what impact this had on the memory footprint and the performance.

The Model

As I did not want to measure difference in the hashing algorithm I started with the HashMap implementation. This meant copy and pasting the code of java.util.HashMap into an empty template. But first, I created this simple model:
A hash map has a key type and a value type, each of which is an enum of the primitive types. For compatibility with the maps from Java I generated an interface which is similar to these maps, just with primitives in the method signatures:

Next, in the implementation, I replaced all references to the generic types K and V by the type from the hash map class, i.e. the expression HashMap.valuetype.name and HashMap.keytype.name. The other thing that I had to change was the handling of null keys and values. As primitives cannot be null, the code for handling null could be removed. This also means that the value 0 gets returned instead of null when a get is made for a non-existing key. Compare the implementations from java.util.HashMap with the implementation in the actifsource template:
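To give an idea of what such a specialised map looks like, here is a strongly simplified sketch of a chained int -> float map. It is not the generated code and not a copy of java.util.HashMap; the class name Int2FloatHashMapSketch and all details (initial capacity, load factor, hash spreading) are choices made for this example:

```java
/** Minimal chained hash map from int keys to float values (illustrative only). */
final class Int2FloatHashMapSketch {

  private static final class Node {
    final int key;
    float value;
    Node next;

    Node(int key, float value, Node next) {
      this.key = key;
      this.value = value;
      this.next = next;
    }
  }

  private Node[] table = new Node[16]; // length is always a power of two
  private int size;

  private static int indexFor(int key, int length) {
    int h = key ^ (key >>> 16); // mix the high bits into the low bits
    return h & (length - 1);
  }

  /** Returns the value for the key, or 0 if the key is absent (no null here). */
  float get(int key) {
    for (Node n = table[indexFor(key, table.length)]; n != null; n = n.next) {
      if (n.key == key) return n.value;
    }
    return 0f;
  }

  boolean containsKey(int key) {
    for (Node n = table[indexFor(key, table.length)]; n != null; n = n.next) {
      if (n.key == key) return true;
    }
    return false;
  }

  /** Stores the value and returns the previous one, or 0 if there was none. */
  float put(int key, float value) {
    int i = indexFor(key, table.length);
    for (Node n = table[i]; n != null; n = n.next) {
      if (n.key == key) {
        float old = n.value;
        n.value = value;
        return old;
      }
    }
    table[i] = new Node(key, value, table[i]);
    if (++size > table.length * 3 / 4) resize();
    return 0f;
  }

  /** Removes the key and returns its value, or 0 if the key was absent. */
  float remove(int key) {
    int i = indexFor(key, table.length);
    Node prev = null;
    for (Node n = table[i]; n != null; prev = n, n = n.next) {
      if (n.key == key) {
        if (prev == null) table[i] = n.next; else prev.next = n.next;
        size--;
        return n.value;
      }
    }
    return 0f;
  }

  int size() {
    return size;
  }

  private void resize() {
    Node[] bigger = new Node[table.length * 2];
    for (Node head : table) {
      while (head != null) { // relink every node into the bigger table
        Node next = head.next;
        int i = indexFor(head.key, bigger.length);
        head.next = bigger[i];
        bigger[i] = head;
        head = next;
      }
    }
    table = bigger;
  }
}
```

Since 0 is also a legal value, a caller that needs to tell "absent" from "mapped to 0" has to use containsKey, which is the price for dropping null.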



The algorithm itself remained the same. To calculate the hash value I could no longer rely on Object.hashCode(), as primitives are not objects. Instead, I created a utility class HashUtil which contains an overloaded function hashCode for each primitive type.
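A utility class along these lines could look as follows. The original HashUtil is not shown in this post, so the bodies below are an assumption: they simply mirror what the boxed types (Integer, Long, Float, Double, ...) compute in their own hashCode methods:

```java
/** Sketch of a hash utility with one overload per primitive type. */
final class HashUtil {

  private HashUtil() {
    // static helpers only
  }

  static int hashCode(byte value) {
    return value;
  }

  static int hashCode(short value) {
    return value;
  }

  static int hashCode(char value) {
    return value;
  }

  static int hashCode(int value) {
    return value;
  }

  static int hashCode(long value) {
    // fold the upper 32 bits into the lower 32 bits, as Long.hashCode() does
    return (int) (value ^ (value >>> 32));
  }

  static int hashCode(float value) {
    return Float.floatToIntBits(value);
  }

  static int hashCode(double value) {
    long bits = Double.doubleToLongBits(value);
    return (int) (bits ^ (bits >>> 32));
  }
}
```

The compiler picks the overload from the static type of the argument, so a template can always emit HashUtil.hashCode(key) regardless of the key type.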

Performance Results

To benchmark the implementation, I filled a map int -> float with 1'000'000 random key/value pairs (put), replaced the value for all the keys once (update), asked once for each key (get) and finally removed all keys (remove). This was repeated 15 times with the first 5 repetitions used as warm-up and taking the average timing over the remaining 10 repetitions. All figures are in milliseconds. I also measured the size of the memory retained by the maps.
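The measurement procedure can be sketched as below. This is not the original benchmark code, just a minimal harness following the description above, shown for java.util.HashMap; the class name MapBenchmarkSketch and the fixed random seed are my own choices, and a serious benchmark would need more care (for example a dedicated benchmarking tool):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class MapBenchmarkSketch {

  /** Returns the average time in ms for {put, update, get, remove} over the measured runs. */
  static double[] run(int entries, int warmUpRuns, int measuredRuns) {
    Random random = new Random(42);
    int[] keys = new int[entries];
    for (int i = 0; i < entries; i++) {
      keys[i] = random.nextInt();
    }

    double[] totals = new double[4];
    float sink = 0f; // consumes the results of get so the loop is not optimised away
    for (int run = 0; run < warmUpRuns + measuredRuns; run++) {
      Map<Integer, Float> map = new HashMap<Integer, Float>();
      long t0 = System.nanoTime();
      for (int k : keys) map.put(k, 1f);     // put
      long t1 = System.nanoTime();
      for (int k : keys) map.put(k, 2f);     // update
      long t2 = System.nanoTime();
      for (int k : keys) sink += map.get(k); // get
      long t3 = System.nanoTime();
      for (int k : keys) map.remove(k);      // remove
      long t4 = System.nanoTime();
      if (run >= warmUpRuns) {               // ignore the warm-up runs
        totals[0] += t1 - t0;
        totals[1] += t2 - t1;
        totals[2] += t3 - t2;
        totals[3] += t4 - t3;
      }
    }
    if (sink < 0) System.out.println(sink);  // never true, but keeps sink alive
    for (int i = 0; i < totals.length; i++) {
      totals[i] = totals[i] / measuredRuns / 1000000.0; // ns -> ms, averaged
    }
    return totals;
  }

  public static void main(String[] args) {
    double[] ms = run(1000000, 5, 10);
    System.out.println("put " + ms[0] + " update " + ms[1] + " get " + ms[2] + " remove " + ms[3]);
  }
}
```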

Java HashMap:
put 793
update 428
get 409
remove 432
retained size: 108 MB (113 bytes per entry)

Generated Int2FloatHashMap:
put 352
update 175
get 87
remove 174
retained size: 54.1 MB (56.7 bytes per entry)

As you can see, the generated hash map performed notably better, with get being more than 4 times faster on average! Also, only half as much memory is used.

Tuesday, May 18, 2010

Software Architecture and Components

Len Bass defines Software Architecture as follows:

The software architecture of a program or computing system is the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships among them.

-- Len Bass: Software Architecture in Practice (2003)

Does this help? Maybe... The above definition tells us that the architecture of a software system comprises components. Obviously the next question is: what are components?

The word component itself is derived from the Latin word componens, which means something that is composed of several parts. But composed of what parts?

Using object-oriented technologies it seems obvious that components implement a specific interface. So far so good. But what about building a component? Is there a building plan or a specification to see how to create a new component?

In our opinion, the structure of a component is something domain specific. Usually, a complex software system consists of several different component types. A Service Oriented Architecture may specify components of type Service and components of type BusinessObject and the relationship between these components.


No matter how many Services or BusinessObjects our system will be built of, the conceptual component architecture shows us how components are structured and interconnected.

As you can imagine, building new components follows specific rules. Since components are spread over several tiers in complex software systems, we have to implement several aspects per component: GUI, persistency, business logic, etc.

Imagine a system where you could define your conceptual component architecture. And imagine a system where you could specify specific components according to the conceptual component architecture.

actifsource allows you to specify your domain-specific conceptual component architecture. Once that is done, specific components can be entered according to your concept.

Every element is fully type checked in real time. And of course there is a type-sensitive content assist.

But the most thrilling thing comes now: Since we know exactly how a component is structured, we can provide generic code to build up any specific component.

Due to our specification a Service always comprises ServiceCalls, while a ServiceCall comprises Parameters. Knowing these conceptual facts, we can write a class for every Service, which contains a method for every ServiceCall.

But instead of writing a method for every ServiceCall manually we let our computer do the tedious work.
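The principle can be illustrated with a toy generator in plain Java. actifsource uses its template editor for this, not string concatenation, so the following is only a sketch of the idea; the model classes and the example names used in the test are hypothetical, and for brevity it emits an interface rather than a full class:

```java
import java.util.Arrays;
import java.util.List;

final class ServiceGeneratorSketch {

  static final class Parameter {
    final String type;
    final String name;

    Parameter(String type, String name) {
      this.type = type;
      this.name = name;
    }
  }

  static final class ServiceCall {
    final String name;
    final List<Parameter> parameters;

    ServiceCall(String name, Parameter... parameters) {
      this.name = name;
      this.parameters = Arrays.asList(parameters);
    }
  }

  static final class Service {
    final String name;
    final List<ServiceCall> calls;

    Service(String name, ServiceCall... calls) {
      this.name = name;
      this.calls = Arrays.asList(calls);
    }
  }

  /** Emits one type per Service with one method per ServiceCall. */
  static String generate(Service service) {
    StringBuilder out = new StringBuilder();
    out.append("public interface ").append(service.name).append(" {\n");
    for (ServiceCall call : service.calls) {          // repeated for every ServiceCall
      out.append("  void ").append(call.name).append("(");
      for (int i = 0; i < call.parameters.size(); i++) {
        Parameter p = call.parameters.get(i);
        if (i > 0) out.append(", ");
        out.append(p.type).append(' ').append(p.name);
      }
      out.append(");\n");
    }
    return out.append("}\n").toString();
  }
}
```

Adding a ServiceCall to the model and regenerating is then enough to get the new method everywhere it is needed.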

The next picture shows the actifsource template editor. The orange bar on the left hand side indicates the repetition for every specific ServiceCall.



Conclusion
A software system consists of components. Defining the conceptual structure of these components and their relationships allows us to enter well-defined and specific components. These components can be checked against their concept.

Since components are well defined, generic code might be written only once for every component type.

Limitations
Note that the above is only true for the structural aspects of a component. Specific algorithms still have to be coded manually. But third-generation languages like Java, C++, etc. do a great job for this kind of work.

In our experience, a complex software system contains up to 70% of structural component code which can be generated.

Tuesday, May 11, 2010

Feature Code, Infrastructure Code and Refactoring

Designing a complex software system means being prepared for implementing new features over time. For this reason, the software architecture should describe how to implement new features.

The conceptual part of a software architecture contains instructions on how to set up the infrastructure code needed to embed feature code into your existing software.

Facts on infrastructure code:

  • Built along a concept
  • Mostly created by copy/paste/modify of similar infrastructure code
  • 70% of a complex software system is infrastructure code
  • Refactoring of a software system means adapting infrastructure code
Refactoring software means changing the structure of the system without changing its functional behavior.

Wikipedia defines refactoring as follows:

Code refactoring is the process of changing a computer program's source code without modifying its external functional behavior in order to improve some of the nonfunctional attributes of the software. Advantages include improved code readability and reduced complexity to improve the maintainability of the source code, as well as a more expressive internal architecture or object model to improve extensibility.

-- http://en.wikipedia.org/wiki/Refactoring

A refactoring just improves the structures of your software system.

Because a refactoring does not change the functional behavior, a refactoring never adds business value to a software system.
A refactoring affects infrastructure code, as we stated before.

Generating the infrastructure code makes it possible to automate the refactoring of a complex software system.
Actifsource allows you to formalize the conceptual parts of your software architecture and generate all infrastructure code. This lets you generate up to 70% of a complex software system. It also lets you refactor the generated infrastructure code of your software system automatically.

Stop writing infrastructure code - focus on feature code.

Friday, May 7, 2010

Generating Wizard Classes

Today I give you a short insight into the wizard template. I don’t want to show every detail, but you should see what’s possible. For this blog I assume you already tried our tutorials and you know how to create and register a template.

Let’s build a simple WizardTemplate, to generate the main class for each wizard.



As you can see this empty template has an error at the filename line. This indicates that we have to define a filename otherwise we won’t get any output.
On the filename we put the WizardClass-Instance-Package and the Name. Since packages are separated by a dot, I used the builtin function “package2subdirectory” to replace the dots by slashes.



Now let’s generate the signature. Each wizard class should derive from a GeneratedWizard class, which contains non-generated methods used by all wizards. I also added a wizardInterface–method to get the eclipse Wizard-Interface (INewWizard, IImportWizard, IExportWizard) from the model.



This will generate a class for each wizard that has the wizard's name, extends the class GeneratedWizard and implements a specific Wizard-Interface. At the moment this won’t compile, since there is no implementation of the abstract methods. Let’s add some more code.
Pressing “ALT+Insert” allows me to add a new linecontext. I want to have a field for each wizardpage, so I added a selector to iterate over all pages of the current wizard:



If you ever wrote a wizard, you know pages are registered by the addPages-Method. The generator should create it and add a line for each wizard page. Since there might be some setup code to create the pages, I want a creation method for each page:



For the creation method I add a third context iterating over the pages. Using separate contexts I keep all fields and creation methods together. You might choose another way to group the methods based on your company's coding guidelines:



I added a protected context for the constructor call, so it's possible to change the generated code to pass additional parameters later. The code in the resulting protected region won't be touched by future generator runs. Finally I added an add-Method to register the page:



Look at the reference to the wizard field: it's the same text. To make clear that the fieldName is meant, I extract a function using the quick assist (Ctrl+1):




Via copy and paste I replaced all other occurrences:



Using more information from the model, I get something like this:



As you can see, the template is still readable and there is nothing special in it, with one exception: our links to the model. There are no for-each loops and no template keywords in between the code.

The same can be done for each wizard page:



The fieldeditors are registered automatically, so you don't have to write the fields, initialize them and finally create each fieldeditor. The logic for creating the GUI-Controls is in the GeneratedWizardPage-SuperClass; it just calls the createControl-method on each registered fieldeditor when eclipse requests the page control creation.
You might have discovered the line calling the "depencencySetterName"; this line creates the binding between the fieldeditors. As I mentioned in the last blogs, the binding is used for the package-fieldeditor to only show the packages in the resourcefolder selected by the resourcefolder-fieldeditor. This is really smart: you cannot forget to set the bindings, since the model forces you to define them and the generator does the rest.

I hope this short insight into the templates helps you to understand how actifsource can improve your work.

Wednesday, May 5, 2010

Conceptual and Non-Conceptual Parts of Software-Architecture

What is Software-Architecture? Do you have a widely accepted definition? Asking 10 different people for a definition, you might get 12 different answers.

In my opinion, Software Architecture can be described as follows:

The software architecture is a directive for the effective and efficient building, understanding and expanding of a software system.

-- Reto Carrara
Ok, that's nothing new so far. So let's talk about the nature of an architecture. A software architecture is a directive or a kind of building plan for a software system. There will be decisions on the chosen n-tier model or the third party libs we plan to use. All this information can be implemented 1:1 in code and is therefore non-conceptual!

But a software architecture describes also, how features of a domain shall be implemented. This kind of information is conceptual!

When you define a Service Oriented Architecture, for example, you can be sure there is a concept in your architecture for how to define new Services. When you specify an Architecture for Complex Event Processing, there will be EventHandlers, Events and EventSources.

This conceptual information is one of the most important of our architecture. It tells the developers how to embed new features in the existing structure of the software system.

The concept of how to embed new features in your software also helps to understand the structure of a complex software system in the maintenance phase.

Having a concept how to build new features for a complex software system is most important. Only a system that is built on a clear concept can be understood and maintained over time.
So, what's the problem?

If one thing is certain in our job: things change - always! Even your basic requirements are subject to change, if your market asks for it.

One of the most basic and most important requirements is therefore:

Software Requirement 1:
Design for Requirements Change!


As we have seen: implementing concepts is most important to read and understand our code later on.

But there is a crux:

Concepts are, by their very nature, implemented many times. But anything that is implemented many times is very hard to change, because every implementation has to be adapted.
So, we need to have concepts to understand the structure of our software system. But implementing concepts prevents us from changing them. Seems to be an unsolvable problem!

Not so, if we implement all concepts automatically by generating the code needed.

actifsource allows you to formalize the conceptual parts of your software architecture to generate the structural feature code. Doing so, concept changes can be done easily just by regenerating existing implementations.

In our experience, structural feature code is about 60-70% of the code of a complex software system.

Imagine focusing on features, not structural code.