Wednesday, August 25, 2010

BinaryTrees For Rescue

When ArrayLists become big reverse lookups like indexOf, lastIndexOf, contains and the modifications methods can slow down your program. When order is not important, you might use a HashMap. In some other cases you have a constant key-value, that can be use in a TreeMap. If you need to have to update the index however there is no real alternative. What I'm missing in the java collections is a balanced tree, allowing me to traverse it's structure. The NavigableSet-Interface available since Java 6 only allows to traverse the elements. There are no methods to access the tree-structure. I wonder why the tree used in TreeMap wasn't made available in a separate Red-Black-Tree class.

As mentioned before trees can be very useful when search through big datasets. Imagine you have a texteditor with more than 10000 lines of text and you need a mapping from lineNumber to line and line to lineNumber. One possibility is to put all lines as keys in a map and use the index as a value, but than you have to update the map every time, you insert a new line. Using a tree, you can insert the lines at a certain position and put the treenode instead of the index into the map. Now you can lookup up the treenode for a line. Calculating the index of a treenode is not difficult:

To speed up the calculation you can cache the node-count of each subtree in the corresponding rootnode and only recalculate it, when a subtree node changes. This way a lookup of a treenode by it's current index or calculating the a treenode index should be very fast enough.

The calculation of the index can be easy adapted to similar cases. For example if you have a texteditor supporting different line heights, calculating the y-Offset of a single line is expensive. Calculating the y-offset of a line requires to sum up all height of the previous lines. Having all lines in an Arraylist means to iterate through the hole list beginning from the first line. You may think about caching the line-offset in a treemap, but than you have to clear all entries after the modified line. In worst case if the first line changes, the while cache is invalid. With a tree, you can store the sum of heights in the treenodes. If the height of one line changes, you only have to recalculate the value of parent nodes. Caluating the y-Offset is now much cheaper, just sum up the height of the lines in all left-children and the height of the line stored in the node, when following a right-child. The only difference to the index calculation is that, that a treenode adds the height instead of the constant node count 1 when including it's own value.

Comparing the speed of an arraylist with the speed of the tree, isn't an easy task. There are many things to consider:

a) if the dataset is rarely modified and the dataset isn't to big, a map can be used to improve the reverse lookup
b) the speed for an indexOf/lastIndexOf/contains lookup an the arraylist strongly depends on the speed of the equals and hashcode method. For example for integers this is very fast.
c) the number of different elements, if you have only 10 different elements in the list and they are random, the results of indexOf/lastIndexOf/contains will need much less comparisons
d) a tree needs more memory
e) for looking up an element by an index there is nothing faster than an array and same applies for creating complete copies it.

I think the main reason for not including a balanced binary tree in the java collection as a separate class, is that the arraylist in most cases is fast enough. Wether to use a tree, hashmap, arraylist or linkedlist dependends on what you want to do. There is no golden hammer, that solves all your problems.

Sunday, August 8, 2010

DCI and actifsource

In my last post I introduced the modified DCI meta model. Let's have a look at a simple example.

The use case we would like to implement is to transfer money from a money source to a money sink. MoneySource and MoneySink are specific roles, played by an appropriate bank account.

The example is built up on the following entities:

UseCase: TransferMoney
From a users point of view the use case is the most important artifact in the code. That's because the use case contains business relevant code.

The following UseCase shall transfer money from a MoneySource to a MoneySink. Furthermore the amount of money shall be passed to the use case as an argument.

Role: MoneySource and MoneySink
Let's have a closer look at the roles needed. The MoneySource has an interaction balance() to ask for the current balance. Furthermore the role supports withdraw() to withdraw an amount.

Note that we specify that this role might be played by an InvestmentAccount or a SavingsAccount.

MoneySink on the other hand side supports the interaction deposit() to deposit an amount.

Note that this role might be played only by the SavingsAccount.

Data: InvestmentAccount and SavingsAccount
The InvestmentAccount shall have a Balance member to store the actual balance. Furthermore the method withdraw() supports withdrawal. Therefore it might play the MoneySource role.

The SavingsAccount supports withdrawal and deposit. Therefore it might play the MoneySink or MoneySource role.

Context: TransferMoneyContext
The context now puts it all together. Every context has to specify exactly one UseCase. In the following example we choose the TransferMoney UseCase.

Since the UseCase TransferMoney declares the usage of the Roles MoneySource and MoneySink actifsource asks us to provide the data which can play the roles.

Using content assist to selected the data object for the MoneySource role we can choose between the InvestmentAccount and the SavingsAccount. As we have seen before both accounts have a withdraw member and are therefore able to play the MoneySource role. For this example lets choose the InvestmentAccount.

Last but not least we have to specify the data for the MoneySink role. Using content assist on the data field only allows the SavingsAccount. Thats because only the SavingsAccount allows to deposit an amount.

The TransferMoneyContext has been declared for the UseCase TransferMoney. Since TransferMoney stated to work with the Roles MoneySource and MoneySink we have to provide Data which can play this roles.

Thanks to the strong actifsource type system it becomes possible that the content assist allows only Data which can play the role needed.

The DCI Meta Model

In my previous blog I promised to work out a meta model to realize the DCI architecture. As explained before I've made a slight modification: a UseCase is modeled as its own entity.

A UseCase is business relevant code which needs different actors (data objects playing roles) to work on.

The context manages the UseCase an the data objects which are able to play the roles needed.

A role is an interface which can have 1..N interaction (an interaction is realized as an abstract method on the role interface).

Every role might be realized for any suitable data object. A role realization for a specific data object is called Adapter.

Setting up a context it must be determined which data object plays the roles needed.

Data objects contain members and method. Members are simple data fields an result in an appropriate getter function. The method can be freely chosen and realize a more sophisticated access to the members.

The DCI Architecture

In his new book "Lean Architecture: for Agile Software Development" James O. Coplien shows up with a new programming paradigm: The DCI Architecture.

Before reading this blog I suggest to have a look a the DCI vision document:

Let me try to explain DCI in a nutshell.

The idea of DCI is to separate data from business code for better readability while the business code is directly expressed as use case.
DCI stands for Data, Context and Interaction.

Data means POD (plain old data) objects without any business relevant methods.

A context is a place to retrieve and/or setup a necessary data.

An interaction is a complex function applied on the data of the context.

Implementing DCI
Implementing DCI is not an easy task since traits are needed. Coplien shows a C++ implementation using template traits as known from the STL. Nevertheless the DCI architecture asks for tool support.

The modified DCI Architecture
My goal is to realize the DCI architecture using actifsource. Therefore an appropriate meta-model is needed.

I decided to keep the DCI idea but modify the model a little bit.

My idea: a UseCase needs 1..N actors (Role) to be implemented. Every role is realized on a POD (plain old data) object.

A context is needed to setup a UseCase and all Data objects which can play the roles needed.

Since a role can be played by any suitable data it seems quite natural that we have to provide kind of an adapter for every Role/Data relationship.

Because I'd like to realize DCI with the actifsource code generator setting up the adapters is no big deal. Instead of hand coding adapters are generated as needed.