Friday 27 April 2007

Why is code generation so powerful in bespoke database development projects?

Good practice in software development very often involves finding a "sweet spot" between two opposing approaches, weighing the advantages and disadvantages of moving an architecture or methodology towards one or the other. Perhaps some examples would clarify what I mean.

A classic example is the normalisation of data. Normalisation, for non-programmers, is a process that eliminates unnecessary repetition in data storage. Instead of storing customer information with every order, we may have a separate table of customer information, and "tag" each order with a customer ID to indicate which customer placed the order. This has a couple of immediate advantages. One is that it avoids unnecessary duplication of customer data, which can now be kept in a single record for each customer. Further, it means that when a customer's details change, only the one record needs to be updated.
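To make the customer and order example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample data are my own assumptions for illustration, not taken from any particular project.

```python
import sqlite3

# Hypothetical schema for illustration only: names and data are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'A. Smith', 'old@example.com')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "widget"), (1, "gadget")],
)

# One UPDATE on the single customer record...
conn.execute("UPDATE customers SET email = 'new@example.com' WHERE customer_id = 1")

# ...is seen by every order tagged with that customer ID.
rows = conn.execute("""
    SELECT o.item, c.email
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Had the email address been stored on every order row instead, the same change would have needed one update per order.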

However, as database programmers consider further, we realise that although normalisation is a pretty useful thing, it is not always such a good idea. Sometimes it is in fact better to keep data stored in forms that are not at all normalised. This may help to make data entry quicker, or improve the responsiveness of a particularly critical part of an application. The differing requirements and purposes of different business situations may direct a software architect towards normalisation, or away from it, in any particular part of a data structure. So the insight as to where to position the "sweet spot" between the extremes of normalisation and denormalisation becomes something of an art, which may take some years to master really well.

Another similar example concerns the extent to which program code is parameterised. Starting out as a "young" programmer, it is easy to think that all parameterisation is good. If it is possible to replace a pair of similar functions with a single function that does the job of both in half the original amount of code, simply by passing in an extra parameter, surely that is a good thing. However, in software development it turns out that this is not always the case. There are a number of reasons. One is that later changes to requirements may mean the two original functions no longer fit together so conveniently, so that they have to be separated again. A second is that when a function serves multiple disparate callers, it is harder to identify why it might be raising exceptions. Data being passed in to the function may be coming from several different places, and if any of those sources contains inconsistent data, it is harder to identify which source caused the problem. Furthermore, when you go to fix a problem, you have to be careful not to adjust the code in a way that breaks situations, other than the one you are dealing with, that were previously working.
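As a small illustration of the kind of merged function described above, consider the sketch below. The function name, the `kind` parameter and the record layout are hypothetical, chosen only to show the shape of the problem.

```python
def describe(record: dict, kind: str) -> str:
    """One function serving two kinds of caller, switched by the extra `kind` parameter."""
    if kind == "customer":
        label = "Customer"
    elif kind == "supplier":
        label = "Supplier"
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    # Shared logic: any change made here affects both kinds of caller at once.
    return f"{label} {record['id']}: {record['name'].strip().title()}"


print(describe({"id": 7, "name": " acme ltd "}, "supplier"))
```

If suppliers later need their names left exactly as entered, the shared last line has to grow a special case, and the customer path has to be retested even though nothing about customers has changed.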

All of these reasons drive developers away from functions, classes and libraries that try to accomplish too much, that is, ones that try to be applicable to too many different sets of data. In other words, these reasons drive developers away from what is sometimes called "code reuse" and in the direction of "code generation". Being able to "reuse" code requires the code to be general enough that it can be used for multiple related situations. "Code generation", in contrast, is a technique that involves automatically substituting what would otherwise be parameters to a function into the literal code of the function, so that the function becomes "de-parameterised". The code generator may make 20 similar functions, one for each value of the substituted parameter, and in turn it will name those functions or function objects according to the value of the parameter substituted.
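A very small sketch of this idea follows. It assumes a text template with a hypothetical `kind` value substituted into both the function name and the function body; the template, names and record layout are illustrative assumptions, not the output of any real generator. The generated source is run with `exec` here only to keep the example self-contained; in practice it would more likely be written to a source file.

```python
# Template for one de-parameterised function. {kind} and {label} are filled in
# by the generator, so the generated code contains literals, not parameters.
TEMPLATE = '''\
def describe_{kind}(record):
    """Generated from TEMPLATE for kind={kind!r}."""
    return "{label} " + str(record["id"]) + ": " + record["name"].strip().title()
'''


def generate_source(kinds):
    """Return source text with one function per value, named after that value."""
    return "\n".join(TEMPLATE.format(kind=k, label=k.capitalize()) for k in kinds)


source = generate_source(["customer", "supplier"])
print(source)

# Load the generated functions so that they can be called.
namespace = {}
exec(source, namespace)
print(namespace["describe_supplier"]({"id": 7, "name": " acme ltd "}))
```

Each generated function is now ordinary, separate source code with no knowledge of the others.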

Although this may seem counter-intuitive, and leads to large amounts of repeated code - and hence "code bloat" - it actually turns out to make life much easier for software developers, increasing their productivity and in turn leading to happier customers. An example of how this makes life easier for programmers is in dealing with the differences that may arise between the different situations to which the original function or object was being applied. Now, when adjusting one of the generated functions to account for such a specialised situation, there is no danger of breaking the code that deals with any other value of the removed parameter.

Further discussion of the value of code generation can be found here: Code generation and best practice in software development