Copyright Avancier Limited. All rights reserved.
“The beginning of wisdom for a programmer is to understand the difference between getting his program to work and getting it right.” – Michael Jackson
There are several approaches to proving that software is correct. We don’t recommend “formal methods” outside of very rare circumstances. We do recommend peer reviews (as in pair programming or ego-less programming), but don’t consider them effective at proving correctness. Two approaches we think worth promoting are discussed in this paper.
The traditional waterfall sequence is design, build, test. This paper is about test-driven design, which puts testing at the forefront. I wrote most of this paper after listening to a talk by Bob Martin, and talking to him afterwards. That discussion leads to the following summary test-driven design guidance.
1) Code the test first.
2) Write executable specifications.
3) Code iteratively.
4) Start simple.
5) Retain tests for regression testing.
6) Refactor continuously.
7) Suppress detail rather than comment code.
8) Choose the appropriate model.
9) Test every statement.
10) Don’t forget the boundary conditions.
What proportion of a software development project is spent in testing?
· Estimating guidance for a waterfall project typically allows 30% for the testing phase.
· Testing specialists say the testing phase commonly takes 60% of a project; this is probably true for an iterative project.
· And following the use of XP, you might say the testing phase is closer to 90% of the project.
You might say these figures are misleading, since a
testing phase (even in a waterfall project) involves some revisions to
requirements and continuation of development. Nevertheless, it is clear that
testing takes a lot of time and money. Yet testers and testing methods get
relatively little attention.
Recently, an experienced test manager complained to me that time and money are wasted during system and acceptance testing on coding errors that should have been picked up by unit and link testing. His study showed that 46% of the errors discovered in the final testing stage were regarded as “code defects”, and 88% of those code defects were introduced during coding (they were not a result of specification errors).
Other test managers have reported similar conclusions,
if without measuring the numbers. And other organizations report similar
difficulties. The universality of the problem makes it worthy of attention.
I have discussed issues surrounding unit testing with development specialists, testing specialists, technical risk managers and others. Let me merge snatches of conversations over several years into one mythical dialogue that highlights some of the issues.
Other: “A goal of unit testing is to test that the system works properly”.
Graham: “Surely not. Surely the first goal of unit testing is to check every statement in the program does what the developer expects, and the program does not fall over.”
Other: “All path testing is impractical; we cannot test all routes through a program”.
Graham: “I never say ‘all path testing’ because people don’t agree what ‘all path’ means. Yes, it is impractical to test all permutations of paths. But it is relatively easy to test that each statement in one unit is executed at least once by unit test data. In fact I would say this is a primary responsibility of a developer. Don’t you agree?”
Other: “I’d be inclined to say yes, but the trouble nowadays is the difficulty of unit testing all the exception and error cases.”
Graham: “Can’t you mock up server code units to generate exception conditions? But let me move on. The second goal of unit testing is to check a program does what the program specification says, not to check the program does what functional specification or users say.”
Other: “That may be OK in large design and build projects, but in application maintenance we don’t have time or resources to repeat a cycle of unit and link testing, system testing and acceptance testing.”
Graham: “Is that true across all of our application management practice? Perhaps we ought to survey what our application maintenance testing practices are?”
Other: “It isn’t even OK in large design and build projects, because we don’t write program specifications any longer. Developers receive only functional specifications, some design notes (perhaps including UML diagrams, perhaps not) and coding standards.”
Graham: “That’s food for thought. It brings me back to suggesting that unit testing should at least test every statement does what the developer intended. Usefully, this means the unit test data will automatically serve as link test data after dummy or stub server components have been replaced by the real ones.”
Other: “We are no longer that disciplined about distinguishing different types of testing. The distinction drawn in our best practice guidance between link testing (linking units developed by one programming team) and integration testing (linking components developed by distinct parties) is fuzzy, and many developers are not aware of it anyway.”
I asked more than 70 members of a design and
development community whether they tend to blur or distinguish unit and system
testing.
It turned out that 9% tend to blur unit testing with
system testing.
"...because I am on support… where new
developments are mostly modifications to existing code. When I have worked on
"from scratch" projects, the difference is much more
distinguished."
"for
data-driven development" [development that builds a data structure, be it
a GUI screen, database table or data warehouse].
So, 91% tend to distinguish unit testing from system
testing:
"I was of the blurring tendency until
we achieved CMMI level 3 compliance. Now I'm most
definitely in favor of a distinguishing tendency. Unit tests, following the
unit test guidelines for condition-response lists, written before any code,
produce the best quality code."
"As a developer, I quite often find I
write unit tests in parallel to my coding, to test the code as I am writing it.
This is surely how it should be!"
The minority view should not be dismissed, but the
majority view is where best practice guidance must focus. For most software
development, unit testing is a distinct and important activity. And bear in
mind that erroneous statements not exercised during unit and link testing can cause
crashes in system testing (if we are lucky).
There is a principle "testing is risk management", meaning you
spend as much on testing as the cost of failure * likelihood of failure. This
sounds as though it helps, but it doesn’t. The trouble with trying to apply
risk management principles to unit testing is that
· every unit can fail in innumerable ways, too many to assess
· people’s assessments of failure cost and likelihood vary widely
· people don’t know what to do where a risk likelihood is low but the risk cost is high
So, people need more prescriptive advice.
Does the above experience speak to you? Do you expect to test every statement during unit testing? Do you expect unit test data to be sufficiently exhaustive that it serves as link test data too, when real server modules replace dummy ones?
Given all these unit testing issues have long been recognized and discussed, you might well ask: What have we done?
So far, we have done three things. First, we revised our graduate-entry training programme to ensure that developers are taught unit testing during their programming language training. They are taught system testing in a distinct follow up course. One or two in the first batch of graduates to go through this training reported their first project experience was remote from their training experience.
Second, we reviewed unit testing best practice guidance in our quality management systems. I believe the current guide reflects BCS (British Computer Society) standards. However, developers have complained the guide is old-fashioned. “Nobody has the time to do all that stuff and document it on paper nowadays. We need to be using tools like JUnit, and modern XP (extreme programming) practices”. So, we shifted responsibility for the unit testing guide from our Testing Forum members to our Des/Dev Forum members, and our guide is currently under review by Des/Dev Forum leaders.
Third, we ran a prototype designer/developer workshop.
In
short, we have done something, but not enough. I am moved now to pick up the
torch for more disciplined unit testing under the banner of XP (extreme
programming).
Discussion has suggested
that:
· Developers have different views about the goals of unit and link testing
· Some regard unit testing as 1st pass system testing
· Some in application support skip unit testing, repeating system tests instead
· Some aren't trained in unit and link testing practices
· Some don't apply published standards for unit and link testing
· Some regard traditional unit testing standards as out of date
· Not many use tools like JUnit and practices like test-driven development.
What to advise? Agile development methods provide some possible answers. Perhaps the most well-known Agile programming approach is XP (extreme programming). This may sound like hacking, but it is far from it; it is a highly disciplined approach to development. And a fundamental principle of XP is test-driven development.
At the DSDM/Agile conference in October 2004, Bob Martin spoke on test-driven development. In his first session, he spoke on unit-test driven development. He used the device of a demonstration in which he coded an algorithm in Java, using JUnit as a testing tool and IntelliJ as a development environment.
Bob knows his subject, expressed his views with clarity and strength, and was delightfully politically incorrect. A generation has passed away since a speaker last enthused me about program design. I will describe his session and conclusions that may be drawn from it.
Bob swiftly moved to introduce the development problem to be tackled in his demo. His mission: to code a Java program that will score a ten-pin bowling game. Some of us in the audience weren’t too sure of the rules of ten-pin bowling. So here they are.
The business rules: A
bowler bowls ten frames in a game. Each frame starts with ten standing pins. In
each frame, the bowler is permitted to roll two balls at the pins. The bowler
scores one point for each pin knocked down by a ball. There are two ways to
earn additional points. If the first ball of a frame knocks down all ten pins,
then that is called a “strike” (the second ball of this frame is not thrown)
and the score of the next two balls is added to the score of this frame. If the
second ball knocks down all standing pins, that is a “spare”, and the score of
the next ball is added to the score of this frame. If the bowler bowls a spare
or a strike in the 10th frame, then he/she is allowed to roll one or two extra
balls to score the additional points. So the bowler bowls at most 21 balls.
Before you read any further, perhaps you would like to have a go at coding this program, whether in Java or another programming language?
Bob started by drawing a UML class diagram.
Before you read any further, perhaps you would like to have a go at drawing whatever UML diagram(s) you think will help?
You’ll recall Bob Martin’s mission: to code a Java program that will score a ten-pin bowling game. And in doing this, to demonstrate test driven development.
Bob started by drawing a UML class diagram. He asked us to name some classes. Pretty soon a flip chart was covered with classes and relationships such as:
· Bowler
· Game
· Frame – with 10th Frame as a subtype
· Ball – with Strike and Spare as subtypes
Bob complained about the poor graphics in UML, e.g. vital semantic differences rely on arrowheads being open or closed. He also complained about a shoddy UML concept, aggregation: nobody knows what placing the aggregation symbol on an association relationship is supposed to mean. (Actually, I said this in the early 1990s, and Martin Fowler later made the point in his “UML Distilled” book.)
I was beginning to warm to Bob. And I warmed
to him more as his demonstration proceeded. Bob turned from his
UML class diagram on the flipchart to his laptop and started coding. You should
know at this point that I can’t read Java, and cannot describe exactly what Bob
did with the code. I think I understood all the key points of principle. I will
have to change and simplify the story a great deal, but the principles hold.
The gutter game: First, Bob coded a class to test the program, or at least a very simple primitive version
of the program. He coded an operation to roll 20 bowling balls, supplying a
parameter for the number of pins knocked over by each ball, and invoke the
yet-to-be-coded solution program.
Bob created his first test run as a gutter
game (all zeroes), asserted what the resulting game score should be (zero). Bob then coded a solution class called Game to score the bowling game, but he cheated by
returning what he knew to be the answer – zero. Bob ran the test.
JUnit returned a green report – meaning no errors found.
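Bob coded in Java with JUnit; as a hedged sketch of this first step, here is a Python equivalent in which a bare assert plays the part of JUnit’s green bar. The names Game, roll and score follow the well-known bowling-game kata and are assumptions, not necessarily Bob’s exact code.

```python
class Game:
    """First cut of the scorer: just enough to pass the gutter game."""

    def roll(self, pins):
        pass  # rolls are not even recorded yet

    def score(self):
        return 0  # the "cheat": return the answer the gutter game needs


# The test is written first, and drives the code above.
game = Game()
for _ in range(20):          # roll 20 balls, each knocking down zero pins
    game.roll(0)
assert game.score() == 0     # green: no errors found
```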
The all ones game: Bob created a second test run with all ones. He tested the Game class. JUnit returned a red report – showing actual and expected results
differ. He added a loop to add up all the pins knocked over. He retested. JUnit
returned a green report.
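Sketching the same step in Python (again an illustration, not Bob’s actual Java), the loop that adds up the pins might look like this:

```python
class Game:
    """Second cut: record every roll and sum the pins knocked over."""

    def __init__(self):
        self.rolls = []

    def roll(self, pins):
        self.rolls.append(pins)

    def score(self):
        return sum(self.rolls)  # the loop added to pass the all-ones game


game = Game()
for _ in range(20):
    game.roll(1)
assert game.score() == 20   # green again
```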
The all twos game: Bob created a test run with all twos. He tested the Game class. It worked fine. However, Bob worried that the code had the same
statement in two places. He refactored the code to place the statement once only. He retested.
Notice Bob’s enthusiasm to retest after every
tiny program amendment.
The all ones and one spare game: Bob created a third test run with one ‘spare’
ball. He tested the Game class. It did not work. He fixed the error by
adding a condition for the second ball in a frame that (in effect) counted the
next ball twice. He retested.
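A hedged Python sketch of this step might score frame by frame, with the spare condition counting the next ball twice. Note that, as in the demo at this point, strikes are not yet handled.

```python
class Game:
    """Third cut: score frame by frame, counting the ball after a spare twice.
    (Strikes are not handled yet; that is the next test in the demo.)"""

    def __init__(self):
        self.rolls = []

    def roll(self, pins):
        self.rolls.append(pins)

    def score(self):
        total, ball = 0, 0
        for _ in range(10):                          # ten frames
            first, second = self.rolls[ball], self.rolls[ball + 1]
            if first + second == 10:                 # spare
                total += 10 + self.rolls[ball + 2]   # next ball counted twice
            else:
                total += first + second
            ball += 2
        return total


game = Game()
for pins in [5, 5, 3] + [0] * 17:   # a spare, a 3, then gutter balls
    game.roll(pins)
assert game.score() == 16           # 10 + 3 for the spare frame, plus 3
```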
The code for the spare looked
incomprehensible, so Bob added a comment (// spare).
The all ones and one strike game: Bob created a fourth test run with one
‘strike’ ball. He tested the Game class. It did not work. He fixed the error by
adding a condition for the first ball in a frame that (in effect) counted the
next two balls twice. He retested.
The code for the strike looked a bit complex, so Bob considered adding a comment (// strike). At this point Bob made a remarkable observation.
“I hate comments. All experienced developers hate comments. If we
open some code and see lots of comments we feel depressed. We think oh no, I’ll
have to read all those comments, and worse, I won’t be able to rely on them.
They will likely mislead me because they have not been updated in line with the
code.”
So, Bob refactored the code again. He extracted two “functions” called Spare and Strike, and replaced the corresponding code in the main program with simple and obviously named function calls. Thus, Bob simplified the main program by suppressing detail into subroutines within the same Java class. He retested.
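Assuming helper names of my own choosing (is_strike, is_spare), the refactored scorer might be sketched in Python like this; the detail is suppressed into obviously named functions, so no comments are needed to explain the scoring conditions:

```python
class Game:
    """Refactored scorer: detail suppressed into named helper functions."""

    def __init__(self):
        self.rolls = []

    def roll(self, pins):
        self.rolls.append(pins)

    def score(self):
        total, ball = 0, 0
        for _ in range(10):                    # ten frames
            if self.is_strike(ball):
                total += 10 + self.rolls[ball + 1] + self.rolls[ball + 2]
                ball += 1                      # a strike uses only one ball
            elif self.is_spare(ball):
                total += 10 + self.rolls[ball + 2]
                ball += 2
            else:
                total += self.rolls[ball] + self.rolls[ball + 1]
                ball += 2
        return total

    def is_strike(self, ball):
        return self.rolls[ball] == 10

    def is_spare(self, ball):
        return self.rolls[ball] + self.rolls[ball + 1] == 10


# The retained test bank reruns at a moment's notice.
perfect = Game()
for _ in range(12):
    perfect.roll(10)
assert perfect.score() == 300   # the perfect game
```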
Bob carried on iterating around a
test-develop cycle until he had us all convinced that a) his program worked
correctly b) his program was of a high quality, well-written and economical and
c) he had built up a bank of test cases he could rerun at a moment's notice to
exercise any amendment he made, or just for fun!
Bob’s demonstration illustrated several Agile
coding principles. Afterwards, I approached him to ask some questions about
more general implications.
Graham: I noticed you completely ignored the class diagram on the flip chart. Were you meaning to suggest test-driven beats model-driven? That UML is a poor vehicle for designing processes? That class diagrams encourage developers to design in a way that excessively distributes responsibilities between classes, and leads to a design with excessive message passing and/or pointless inheritance?
Bob: Yes. Also, I find it harder to write UML (from which I can generate Java) than to write Java (from which I can generate UML). I'd rather reverse engineer than forward engineer.
Graham: I suspect a better model for your example would be a regular expression.
Graham: Do you code a test for every statement in
your solution code, including code that detects exceptions such as servers
going missing?
Bob: Yes, always. I mock up any client and
server components that do not yet exist, and get them to mimic exceptions. I
will want to rerun those tests over and over again as the system grows.
Graham: I see that if you code to test every
statement during a unit test, then your unit test data will suffice for link
testing as well. All you have to do is replace the fake clients and servers
with the real ones.
Thinking afterwards, I suspected the solution that Bob demonstrated won’t stand up to test data that tests the boundary conditions. What happens if his test code keeps rolling balls after the game is over? Will he have to extend the solution code to process the end of the game as a special case, and ignore all subsequent balls?
But Bob’s methodology for doing this is clear. First add test data to test the boundary conditions, then retest the solution code, and then, if it doesn’t work, change the code and retest.
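One possible guard, and it is my assumption rather than anything Bob showed, is to cap the rolls at 21, the longest legal game; a fuller solution would track frame completion rather than a simple count:

```python
class Game:
    """A deliberately crude boundary guard (my assumption, not Bob's code):
    cap the number of rolls at 21, the longest legal game. A production
    version would track frame completion instead of a simple count."""

    MAX_ROLLS = 21

    def __init__(self):
        self.rolls = []

    def roll(self, pins):
        if len(self.rolls) >= self.MAX_ROLLS:
            raise ValueError("game over: no more balls may be rolled")
        self.rolls.append(pins)


# Boundary test: the ball after the last legal one must be rejected.
game = Game()
for _ in range(21):
    game.roll(0)
try:
    game.roll(0)
    assert False, "expected the extra ball to be rejected"
except ValueError:
    pass   # red turns green once the guard is added
```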
What do you think of Bob's test-driven development approach? Do you do it? Do you think other developers should work this way? Do you like the sound of it? Would you like to do it? What stops you?
Ten principles might be distilled from Bob’s demonstration and subsequent discussion. A ten-point practice might be proposed.
The principles, each with Bob’s discussion:

1) Code the test first.
Bob never writes a line of solution code without first writing test code to invoke the solution code with test data, and coding an assertion declaring the expected results.

2) Write executable specifications.
Bob doesn’t want or write a detailed program specification, because he regards the test code and test data as the most effective and precise form of specification. He calls this “executable specification”.

3) Code iteratively.
Bob builds up the test code and the solution code in very small alternating steps, so he never spends a long time working on test code alone, or solution code alone, and his solution code is never seriously wrong for the next iteration.

4) Start simple.
Bob starts with the simplest test data and simplest solution code, and elaborates to deal with increasingly complex cases, so his solution tends to reach the point where the 80/20 principle kicks in as early as possible.

5) Retain tests for regression testing.
Bob’s practice automatically builds a bank of test cases that can be rerun at any point, and he does rerun these test cases as regression tests after every change.

6) Refactor continuously.
Bob refactors the code as he goes along, to minimize the amount of code written. He always retests after refactoring, just as he would after any other change. The only difference is that he does not need to add any new test cases, because refactoring (by definition) does not change the function of the program.

7) Suppress detail rather than comment code.
Bob simplifies the main program by suppressing detail into subroutines (functions within a Java class). He does this in place of writing comments, which he regards as a proof of incompetence.

8) Choose the appropriate model.
Test cases are immensely valuable in specifications. Draw models only where you need them, and choose the style of model that best suits the problem.

9) Test every statement.
If you code to test every statement during a unit test, then your unit test data will suffice for link testing as well.

10) Don’t forget the boundary conditions.
Wherever there is a range of values or an iteration, test the conditions that exercise a null value, an empty iteration, and the various ways the range or iteration may overflow or finish.
The principles above are inspired by XP. For a full account of XP, read…
But we are not finished yet with the principles and potential of test-driven development.
Bob Martin’s presentation on test-driven development illustrated many points about what he claims to be best programming practice. During his presentation, I had something of a revelation.
Graham: We recommend coding a server component defensively to detect every possible contravention of its preconditions. I guess you would code a test for every such precondition failure?
Bob: Yes.
So in our example, Bob would invoke the circumference calculator with a radius that is not a number, and code as follows:
Cycle Tread Reporter (client)
    Set Radius = Front Wheel Radius
    Give me circumference (call to server)
        Return with error message if server reports radius is not a number
    Set Front Wheel Circumference = Circumference
Graham: Suppose you code a client component to hold a variable declaring the expected time a server component will take before returning to the client (or you send this as a parameter to the server component). You code the client to time from invoking the server to getting a reply (or code the server to time itself and return a success/failure reply). I guess you can assert that any actual time in excess of the expected time is to be treated as a unit test failure?
Bob: Yes. And I can mock up client and server components during unit testing.
So in our example, Bob might code as follows.
Cycle Tread Reporter (client)
    Set Radius = Front Wheel Radius
    Start timer
    Give me circumference (call to server)
        Return with error message if server reports radius is not a number
        Return with error message if time exceeds maximum response time
    Set Front Wheel Circumference = Circumference
Of course, there are complications. Response time has to be defined properly. Is it a minimum response time? Is it a maximum response time? Is it an average response time? Does the statistical distribution matter? I don’t say these questions have no possible answers, but they make the issue quite delicate. And they have no direct counterpart in functional specifications.
Graham: Suppose you code a client component defensively with else options that report the unexpected absence of any server component. I guess you can assert that any such exception report is to be treated as a unit test failure?
Bob: Yes.
So in our example, Bob might code as follows.
Cycle Tread Reporter (client)
    Set Radius = Front Wheel Radius
    Start timer
    Give me circumference (call to server)
        Return with error message if server reports radius is not a number
        Return with error message if time exceeds maximum response time
        Return with error message if server replies with anything but circumference
    Set Front Wheel Circumference = Circumference
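The three defensive checks can be sketched together in Python; here the server is just a local function, and every name (circumference, cycle_tread_reporter, MAX_RESPONSE_SECONDS) is illustrative rather than taken from Bob’s code:

```python
import math
import time

MAX_RESPONSE_SECONDS = 0.5   # the agreed maximum response time (illustrative)


def circumference(radius):
    """The server: refuses any radius that is not a number."""
    if (not isinstance(radius, (int, float)) or isinstance(radius, bool)
            or math.isnan(radius)):
        raise ValueError("radius is not a number")
    return 2 * math.pi * radius


def cycle_tread_reporter(front_wheel_radius):
    """The client, coded defensively against all three failure modes."""
    start = time.monotonic()
    try:
        result = circumference(front_wheel_radius)        # call to server
    except ValueError as error:
        return f"error: {error}"                          # precondition failure
    if time.monotonic() - start > MAX_RESPONSE_SECONDS:
        return "error: maximum response time exceeded"    # timing failure
    if not isinstance(result, float):
        return "error: server replied with something other than a circumference"
    return result                                         # Front Wheel Circumference


assert isinstance(cycle_tread_reporter(1.0), float)
assert cycle_tread_reporter(float("nan")).startswith("error")
assert cycle_tread_reporter("ten").startswith("error")
```

In a unit test, the local `circumference` function would be a mock standing in for the real server, so each error branch can be forced and asserted on.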
One should use the best modeling tool for the purpose at hand. A regular expression is often the best tool for modeling the structure of a problem, a data flow or a program.
A regular expression can describe, for example, the structure of a serial file:
A serial file as a regular expression
    Serial file sequence
        File header record
        File body iterate while not trailer record
            Record select if type A
                Record type A
            Record select if type B
                Record type B
            Record else
                Invalid record
        File body end
        File trailer record
    Serial file end
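To make the idea concrete, here is one possible encoding (mine, not the paper’s) in which each record type becomes a single character, so the serial file structure reads as a textbook regular expression:

```python
import re

# Encode each record type as one character: H = header, A and B = the two
# body record types, X = an invalid record, T = trailer. The serial file
# is then a header, followed by an iteration of body records, followed by
# a trailer.
SERIAL_FILE = re.compile(r"^H[ABX]*T$")

assert SERIAL_FILE.match("HAABXT")   # header, body with an invalid record, trailer
assert SERIAL_FILE.match("HT")       # an empty body is a legal iteration
assert not SERIAL_FILE.match("AAT")  # no header: not a well-formed file
```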
Kleene’s theorem says
that every data flow recognizable by a computer can be described as a regular
expression.
Bohm and Jacopini’s principle says that every program can be described as a regular expression. (Strictly speaking, and rather disappointingly, this principle says that every program can be built around the same trivial and often unhelpful structure – an iterated selection – a loop with a case statement inside it.)
Jackson (ref 4) proposed that in general, the structure of a program should be based on the structure of the problem it solves. More specifically, where the problem is to produce one or more output data flow structures from one or more input data structures, then the structure of the program should reflect the structure of the I/O data structures.
Remember the conversation I had with Bob Martin? There was one more exchange.
Graham: I noticed you completely ignored the UML diagram on the flip chart. Were you meaning to suggest UML is a poor vehicle for designing processes? That it encourages developers to design in a way that excessively distributes responsibilities between classes? Leads to a design with excessive message passing? Leads to a design with pointless inheritance?
Bob: Yes.
Graham: While you were talking I modelled the ten-pin bowling game in my head as a regular expression. The Game is an iteration of Frames. A Frame is a sequence of two Balls. The first Ball is a selection of Strike or not. The second Ball is a selection of Spare or not.
Ten pin bowling game as a regular expression
    Game iteration while balls to bowl
        Frame sequence
            First Ball select if Strike
            First Ball else
            First Ball end
            Second Ball select if Spare
            Second Ball else
            Second Ball end
        Frame end
    Game end
I may not have completed this structure, since I am a little unclear about what happens at the end of the game. However, the regular expression seems to me a better model of the problem than UML can provide in the form of a class or interaction diagram.
I am fighting my instinct to complete a model before coding. The rough model above looks good enough to start coding and testing.
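The rough model can also be rendered in conventional regex notation over standard score-sheet symbols; the encoding is mine, and like the rough model above it deliberately ignores the 10th-frame bonus balls:

```python
import re

# "X" is a strike, a digit followed by "/" is a spare, and two digits are
# an open frame. A game is an iteration of exactly ten frames. (My
# encoding, for illustration; the 10th-frame bonus balls are ignored,
# just as in the rough model above.)
FRAME = r"(?:X|[0-9]/|[0-9][0-9])"
GAME = re.compile(rf"^{FRAME}{{10}}$")

assert GAME.match("X" * 10)        # ten strikes (bonus balls ignored)
assert GAME.match("9/" * 10)       # ten spares
assert not GAME.match("45" * 9)    # only nine frames: not a complete game
```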