Copyright Avancier Limited. All rights reserved.
“The beginning of wisdom for a programmer is to understand the difference between getting his program to work and getting it right.” – Michael Jackson
There are several approaches to proving that software is correct. We don’t recommend “formal methods” outside of very rare circumstances. We do recommend peer reviews (as in pair programming or ego-less programming), but don’t consider them effective at proving correctness. Two approaches we think worth promoting are discussed in this paper.
The traditional waterfall sequence is design, build, test. This paper is about test-driven design, which puts testing at the forefront. I wrote most of this paper after listening to a talk by Bob Martin, and talking to him afterwards. That discussion leads to the following summary test-driven design guidance.
1) Code the test first.
2) Write executable specifications.
3) Code iteratively.
4) Start simple.
5) Retain tests for regression testing.
6) Refactor continuously.
7) Suppress detail rather than comment code.
8) Choose the appropriate model.
9) Test every statement.
10) Don’t forget the boundary conditions.
What proportion of a software development project is spent in testing?
· Estimating guidance for a waterfall project typically allows 30% for the testing phase.
· Testing specialists say the testing phase commonly takes 60% of a project; this is probably true for an iterative project.
· And following the use of XP, you might say the testing phase is closer to 90% of the project.
You might say these figures are misleading, since a
testing phase (even in a waterfall project) involves some revisions to
requirements and continuation of development. Nevertheless, it is clear that
testing takes a lot of time and money. Yet testers and testing methods get
relatively little attention.
Recently, an experienced test manager complained to me that time and money are wasted during system and acceptance testing on coding errors that should have been picked up by unit and link testing. His study showed that 46% of the errors discovered in the final testing stage were regarded as “code defects”, and 88% of those code defects were introduced during coding (they were not a result of specification errors).
Other test managers have reported similar conclusions,
if without measuring the numbers. And other organizations report similar
difficulties. The universality of the problem makes it worthy of attention.
I have discussed issues surrounding unit testing with development specialists, testing specialists, technical risk managers and others. Let me merge snatches of conversations over several years into one mythical dialogue that highlights some of the issues.
Other: “A goal of unit testing is to test that the system works properly”.
Graham: “Surely not. Surely the first goal of unit testing is to check every statement in the program does what the developer expects, and the program does not fall over.”
Other: “All path testing is impractical; we cannot test all routes through a program”.
Graham: “I never say ‘all path testing’ because people don’t agree what ‘all path’ means. Yes, it is impractical to test all permutations of paths. But it is relatively easy to test that each statement in one unit is executed at least once by unit test data. In fact I would say this is a primary responsibility of a developer. Don’t you agree?”
Other: “I’d be inclined to say yes, but the trouble nowadays is the difficulty of unit testing all the exception and error cases.”
Graham: “Can’t you mock up server code units to generate exception conditions? But let me move on. The second goal of unit testing is to check a program does what the program specification says, not to check the program does what functional specification or users say.”
Other: “That may be OK in large design and build projects, but in application maintenance we don’t have time or resources to repeat a cycle of unit and link testing, system testing and acceptance testing.”
Graham: “Is that true across all of our application management practice? Perhaps we ought to survey what our application maintenance testing practices are?”
Other: “It isn’t even OK in large design and build projects, because we don’t write program specifications any longer. Developers receive only functional specifications, some design notes (perhaps including UML diagrams, perhaps not) and coding standards.”
Graham: “That’s food for thought. It brings me back to suggesting that unit testing should at least test every statement does what the developer intended. Usefully, this means the unit test data will automatically serve as link test data after dummy or stub server components have been replaced by the real ones.”
Other: “We are no longer that disciplined about distinguishing different types of testing. The distinction drawn in our best practice guidance between link testing (linking units developed by one programming team) and integration testing (linking components developed by distinct parties) is fuzzy, and many developers are not aware of it anyway.”
I asked more than 70 members of a design and
development community whether they tend to blur or distinguish unit and system
testing.
It turned out that 9% tend to blur unit testing with
system testing.
"...because I am on support… where new
developments are mostly modifications to existing code. When I have worked on
"from scratch" projects, the difference is much more
distinguished."
"for
data-driven development" [development that builds a data structure, be it
a GUI screen, database table or data warehouse].
So, 91% tend to distinguish unit testing from system
testing:
"I was of the blurring tendency until
we achieved CMMI level 3 compliance. Now I'm most
definitely in favor of a distinguishing tendency. Unit tests, following the
unit test guidelines for condition-response lists, written before any code,
produce the best quality code."
"As a developer, I quite often find I
write unit tests in parallel to my coding, to test the code as I am writing it.
This is surely how it should be!"
The minority view should not be dismissed, but the
majority view is where best practice guidance must focus. For most software
development, unit testing is a distinct and important activity. And bear in
mind that erroneous statements not exercised during unit and link testing can cause
crashes in system testing (if we are lucky).
There is a principle "testing is risk management", meaning you
spend as much on testing as the cost of failure * likelihood of failure. This
sounds as though it helps, but it doesn’t. The trouble with trying to apply
risk management principles to unit testing is that
· every unit can fail in innumerable ways, too many to assess
· people’s assessments of failure cost and likelihood vary widely
· people don’t know what to do where a risk likelihood is low but the risk cost is high
So, people need more prescriptive advice.
Does the above experience speak to you? Do you expect to test every statement during unit testing? Do you expect unit test data to be sufficiently exhaustive that it serves as link test data too, when real server modules replace dummy ones?
Given all these unit testing issues have long been recognized and discussed, you might well ask: What have we done?
So far, we have done three things. First, we revised our graduate-entry training programme to ensure that developers are taught unit testing during their programming language training. They are taught system testing in a distinct follow up course. One or two in the first batch of graduates to go through this training reported their first project experience was remote from their training experience.
Second, we reviewed unit testing best practice guidance in our quality management systems. I believe the current guide reflects BCS (British Computer Society) standards. However, developers have complained the guide is old-fashioned. “Nobody has the time to do all that stuff and document it on paper nowadays. We need to be using tools like JUnit, and modern XP (extreme programming) practices”. So, we shifted responsibility for the unit testing guide from our Testing Forum members to our Des/Dev Forum members, and our guide is currently under review by Des/Dev Forum leaders.
Third, we ran a prototype designer/developer workshop.
In
short, we have done something, but not enough. I am moved now to pick up the
torch for more disciplined unit testing under the banner of XP (extreme
programming).
Discussion has suggested
that:
· Developers have different views about the goals of unit and link testing
· Some regard unit testing as 1st pass system testing
· Some in application support skip unit testing, repeating system tests instead
· Some aren't trained in unit and link testing practices
· Some don't apply published standards for unit and link testing
· Some regard traditional unit testing standards as out of date
· Not many use tools like JUnit and practices like test-driven development.
What to advise? Agile development methods provide some possible answers. Perhaps the most well-known Agile programming approach is XP (extreme programming). This may sound like hacking, but it is far from it; it is a highly disciplined approach to development. And a fundamental principle of XP is test-driven development.
At the DSDM/Agile conference in October 2004, Bob Martin spoke on test-driven development. In his first session, he spoke on unit-test driven development. He used the device of a demonstration in which he coded an algorithm in Java, using JUnit as a testing tool and IntelliJ as a development environment.
Bob knows his subject, expressed his views with clarity and strength, and was delightfully politically incorrect. A generation has passed away since a speaker last enthused me about program design. I will describe his session and conclusions that may be drawn from it.
Bob swiftly moved to introduce the development problem to be tackled in his demo. His mission: to code a Java program that will score a ten-pin bowling game. Some of us in the audience weren’t too sure of the rules of ten-pin bowling. So here they are.
The business rules: A
bowler bowls ten frames in a game. Each frame starts with ten standing pins. In
each frame, the bowler is permitted to roll two balls at the pins. The bowler
scores one point for each pin knocked down by a ball. There are two ways to
earn additional points. If the first ball of a frame knocks down all ten pins,
then that is called a “strike” (the second ball of this frame is not thrown)
and the score of the next two balls is added to the score of this frame. If the
second ball knocks down all standing pins, that is a “spare”, and the score of
the next ball is added to the score of this frame. If the bowler bowls a spare
or a strike in the 10th frame, then he/she is allowed to roll one or two extra
balls to score the additional points. So the bowler bowls at most 21 balls.
Before you read any further, perhaps you would like to have a go at coding this program, whether in Java or another programming language?
Bob started by drawing a UML class diagram.
Before you read any further, perhaps you would like to have a go at drawing whatever UML diagram(s) you think will help?
You’ll recall Bob Martin’s mission: to code a Java program that will score a ten-pin bowling game. And in doing this, to demonstrate test driven development.
Bob started by drawing a UML class diagram. He asked us to name some classes. Pretty soon a flip chart was covered with classes and relationships such as:
· Bowler
· Game
· Frame – with 10th Frame as a subtype
· Ball – with Strike and Spare as subtypes
Bob complained about the poor graphics in UML, e.g. vital semantic differences rely on arrowheads being open or closed. He also complained about a shoddy UML concept, aggregation: nobody knows what placing the aggregation symbol on an association relationship is supposed to mean. (Actually, I said this in the early 1990s, and Martin Fowler later made the point in his “UML Distilled” book.)
I was beginning to warm to Bob. And I warmed
to him more as his demonstration proceeded. Bob turned from his
UML class diagram on the flipchart to his laptop and started coding. You should
know at this point that I can’t read Java, and cannot describe exactly what Bob
did with the code. I think I understood all the key points of principle. I will
have to change and simplify the story a great deal, but the principles hold.
The gutter game: First, Bob coded a class to test the program, or at least a very simple primitive version
of the program. He coded an operation to roll 20 bowling balls, supplying a
parameter for the number of pins knocked over by each ball, and invoke the
yet-to-be-coded solution program.
Bob created his first test run as a gutter
game (all zeroes), asserted what the resulting game score should be (zero). Bob then coded a solution class called Game to score the bowling game, but he cheated by
returning what he knew to be the answer – zero. Bob ran the test.
JUnit returned a green report – meaning no errors found.
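Bob coded in Java with JUnit; as a hedged sketch of this first step, here is a Python equivalent in which a bare assert plays the part of JUnit’s green bar. The names Game, roll and score follow the well-known bowling-game kata and are assumptions, not necessarily Bob’s exact code.

```python
class Game:
    """First cut of the scorer: just enough to pass the gutter game."""

    def roll(self, pins):
        pass  # rolls are not even recorded yet

    def score(self):
        return 0  # the "cheat": return the answer the gutter game needs


# The test is written first, and drives the code above.
game = Game()
for _ in range(20):          # roll 20 balls, each knocking down zero pins
    game.roll(0)
assert game.score() == 0     # green: no errors found
```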
The all ones game: Bob created a second test run with all ones. He tested the Game class. JUnit returned a red report – showing actual and expected results
differ. He added a loop to add up all the pins knocked over. He retested. JUnit
returned a green report.
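Sketching the same step in Python (again an illustration, not Bob’s actual Java), the loop that adds up the pins might look like this:

```python
class Game:
    """Second cut: record every roll and sum the pins knocked over."""

    def __init__(self):
        self.rolls = []

    def roll(self, pins):
        self.rolls.append(pins)

    def score(self):
        return sum(self.rolls)  # the loop added to pass the all-ones game


game = Game()
for _ in range(20):
    game.roll(1)
assert game.score() == 20   # green again
```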
The all twos game: Bob created a test run with all twos. He tested the Game class. It worked fine. However, Bob worried that the code had the same
statement in two places. He refactored the code to place the statement once only. He retested.
Notice Bob’s enthusiasm to retest after every
tiny program amendment.
The all ones and one spare game: Bob created a third test run with one ‘spare’
ball. He tested the Game class. It did not work. He fixed the error by
adding a condition for the second ball in a frame that (in effect) counted the
next ball twice. He retested.
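A hedged Python sketch of this step might score frame by frame, with the spare condition counting the next ball twice. Note that, as in the demo at this point, strikes are not yet handled.

```python
class Game:
    """Third cut: score frame by frame, counting the ball after a spare twice.
    (Strikes are not handled yet; that is the next test in the demo.)"""

    def __init__(self):
        self.rolls = []

    def roll(self, pins):
        self.rolls.append(pins)

    def score(self):
        total, ball = 0, 0
        for _ in range(10):                          # ten frames
            first, second = self.rolls[ball], self.rolls[ball + 1]
            if first + second == 10:                 # spare
                total += 10 + self.rolls[ball + 2]   # next ball counted twice
            else:
                total += first + second
            ball += 2
        return total


game = Game()
for pins in [5, 5, 3] + [0] * 17:   # a spare, a 3, then gutter balls
    game.roll(pins)
assert game.score() == 16           # 10 + 3 for the spare frame, plus 3
```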
The code for the spare looked
incomprehensible, so Bob added a comment (// spare).
The all ones and one strike game: Bob created a fourth test run with one
‘strike’ ball. He tested the Game class. It did not work. He fixed the error by
adding a condition for the first ball in a frame that (in effect) counted the
next two balls twice. He retested.
The code for the strike looked a bit complex, so Bob considered adding a comment (// strike). At this point Bob made a remarkable observation.
“I hate comments. All experienced developers hate comments. If we
open some code and see lots of comments we feel depressed. We think oh no, I’ll
have to read all those comments, and worse, I won’t be able to rely on them.
They will likely mislead me because they have not been updated in line with the
code.”
So, Bob refactored the code again. He extracted two “functions” called Spare and Strike, and replaced the corresponding code in the main program with simple and obviously named function calls. Thus, Bob simplified the main program by suppressing detail into subroutines within the same Java class. He retested.
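Assuming helper names of my own choosing (is_strike, is_spare), the refactored scorer might be sketched in Python like this; the detail is suppressed into obviously named functions, so no comments are needed to explain the scoring conditions:

```python
class Game:
    """Refactored scorer: detail suppressed into named helper functions."""

    def __init__(self):
        self.rolls = []

    def roll(self, pins):
        self.rolls.append(pins)

    def score(self):
        total, ball = 0, 0
        for _ in range(10):                    # ten frames
            if self.is_strike(ball):
                total += 10 + self.rolls[ball + 1] + self.rolls[ball + 2]
                ball += 1                      # a strike uses only one ball
            elif self.is_spare(ball):
                total += 10 + self.rolls[ball + 2]
                ball += 2
            else:
                total += self.rolls[ball] + self.rolls[ball + 1]
                ball += 2
        return total

    def is_strike(self, ball):
        return self.rolls[ball] == 10

    def is_spare(self, ball):
        return self.rolls[ball] + self.rolls[ball + 1] == 10


# The retained test bank reruns at a moment's notice.
perfect = Game()
for _ in range(12):
    perfect.roll(10)
assert perfect.score() == 300   # the perfect game
```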
Bob carried on iterating around a
test-develop cycle until he had us all convinced that a) his program worked
correctly b) his program was of a high quality, well-written and economical and
c) he had built up a bank of test cases he could rerun at a moment's notice to
exercise any amendment he made, or just for fun!
Bob’s demonstration illustrated several Agile
coding principles. Afterwards, I approached him to ask some questions about
more general implications.
Graham: I noticed you completely ignored the class diagram on the flip chart. Were you meaning to suggest test-driven beats model-driven? That UML is a poor vehicle for designing processes? That class diagrams encourage developers to design in a way that excessively distributes responsibilities between classes, and leads to a design with excessive message passing and/or pointless inheritance?
Bob: Yes. Also, I find it harder to write UML (from which I can generate Java) than to write Java (from which I can generate UML). I'd rather reverse engineer than forward engineer.
Graham: I suspect a better model for your example would be a regular expression.
Graham: Do you code a test for every statement in
your solution code, including code that detects exceptions such as servers
going missing?
Bob: Yes, always. I mock up any client and
server components that do not yet exist, and get them to mimic exceptions. I
will want to rerun those tests over and over again as the system grows.
Graham: I see that if you code to test every
statement during a unit test, then your unit test data will suffice for link
testing as well. All you have to do is replace the fake clients and servers
with the real ones.
Thinking afterwards, I suspected the solution that Bob demonstrated won’t stand up to test data that tests the boundary conditions. What happens if his test code keeps rolling balls after the game is over? Will he have to extend the solution code to process the end of the game as a special case, and ignore all subsequent balls?
But Bob’s methodology for doing this is clear. First add test data to test the boundary conditions, then retest the solution code, and then, if it doesn’t work, change the code and retest.
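One possible guard, and it is my assumption rather than anything Bob showed, is to cap the rolls at 21, the longest legal game; a fuller solution would track frame completion rather than a simple count:

```python
class Game:
    """A deliberately crude boundary guard (my assumption, not Bob's code):
    cap the number of rolls at 21, the longest legal game. A production
    version would track frame completion instead of a simple count."""

    MAX_ROLLS = 21

    def __init__(self):
        self.rolls = []

    def roll(self, pins):
        if len(self.rolls) >= self.MAX_ROLLS:
            raise ValueError("game over: no more balls may be rolled")
        self.rolls.append(pins)


# Boundary test: the ball after the last legal one must be rejected.
game = Game()
for _ in range(21):
    game.roll(0)
try:
    game.roll(0)
    assert False, "expected the extra ball to be rejected"
except ValueError:
    pass   # red turns green once the guard is added
```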
What do you think of Bob's test-driven development approach? Do you do it? Do you think other developers should work this way? Do you like the sound of it? Would you like to do it? What stops you?
Ten principles might be distilled from Bob’s demonstration and subsequent discussion. A ten-point practice might be proposed.
The principles, each with Bob’s discussion:

1) Code the test first.
Bob never writes a line of solution code without first writing test code to invoke the solution code with test data, and coding an assertion declaring the expected results.

2) Write executable specifications.
Bob doesn’t want or write a detailed program specification, because he regards the test code and test data as the most effective and precise form of specification. He calls this “executable specification”.

3) Code iteratively.
Bob builds up the test code and the solution code in very small alternating steps, so he never spends a long time working on test code alone, or solution code alone, and his solution code is never seriously wrong for the next iteration.

4) Start simple.
Bob starts with the simplest test data and simplest solution code, and elaborates to deal with increasingly complex cases, so his solution tends to reach the point where the 80/20 principle kicks in as early as possible.

5) Retain tests for regression testing.
Bob’s practice automatically builds a bank of test cases that can be rerun at any point, and he does rerun these test cases as regression tests after every change.

6) Refactor continuously.
Bob refactors the code as he goes along, to minimize the amount of code written. He always retests after refactoring, just as he would after any other change. The only difference is that he does not need to add any new test cases, because refactoring (by definition) does not change the function of the program.

7) Suppress detail rather than comment code.
Bob simplifies the main program by suppressing detail into subroutines (functions within a Java class). He does this in place of writing comments, which he regards as a proof of incompetence.

8) Choose the appropriate model.
Test cases are immensely valuable in specifications. Draw models only where you need them, and choose the style of model that best suits the problem.

9) Test every statement.
If you code to test every statement during a unit test, then your unit test data will suffice for link testing as well.

10) Don’t forget the boundary conditions.
Wherever there is a range of values or an iteration, test the conditions that exercise a null value, an empty iteration, and the various ways the range or iteration may overflow or finish.
The principles above are inspired by XP. For a full account of XP, read…
But we are not finished yet with the principles and potential of test-driven development.
Bob Martin’s presentation on test-driven development illustrated many points about what he claims to be best programming practice. During his presentation, I had something of a revelation.
Graham: We recommend coding a server component defensively to detect every possible contravention of its preconditions. I guess you would code a test for every such precondition failure?
Bob: Yes.
So in our example, Bob would invoke the circumference calculator with a radius that is not a number, and code as follows:
Cycle Tread Reporter (client)
    Set Radius = Front Wheel Radius
    Give me circumference (call to server)
        Return with error message if server reports radius is not a number
    Set Front Wheel Circumference = Circumference
Graham: Suppose you code a client component to hold a variable declaring the expected time a server component will take before returning to the client (or you send this as a parameter to the server component). You code the client to time from invoking the server to getting a reply (or code the server to time itself and return a success/failure reply). I guess you can assert that any actual time in excess of the expected time is to be treated as a unit test failure?
Bob: Yes. And I can mock up client and server components during unit testing.
So in our example, Bob might code as follows.
Cycle Tread Reporter (client)
    Set Radius = Front Wheel Radius
    Start timer
    Give me circumference (call to server)
        Return with error message if server reports radius is not a number
        Return with error message if time exceeds maximum response time
    Set Front Wheel Circumference = Circumference
Of course, there are complications. Response time has to be defined properly. Is it a minimum response time? Is it a maximum response time? Is it an average response time? Does the statistical distribution matter? I don’t say these questions have no possible answers, but they make the issue quite delicate. And they have no direct counterpart in functional specifications.
Graham: Suppose you code a client component defensively with else options that report the unexpected absence of any server component. I guess you can assert that any such exception report is to be treated as a unit test failure?
Bob: Yes.
So in our example, Bob might code as follows.
Cycle Tread Reporter (client)
    Set Radius = Front Wheel Radius
    Start timer
    Give me circumference (call to server)
        Return with error message if server reports radius is not a number
        Return with error message if time exceeds maximum response time
        Return with error message if server replies with anything but circumference
    Set Front Wheel Circumference = Circumference
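The three defensive checks can be sketched together in Python; here the server is just a local function, and every name (circumference, cycle_tread_reporter, MAX_RESPONSE_SECONDS) is illustrative rather than taken from Bob’s code:

```python
import math
import time

MAX_RESPONSE_SECONDS = 0.5   # the agreed maximum response time (illustrative)


def circumference(radius):
    """The server: refuses any radius that is not a number."""
    if (not isinstance(radius, (int, float)) or isinstance(radius, bool)
            or math.isnan(radius)):
        raise ValueError("radius is not a number")
    return 2 * math.pi * radius


def cycle_tread_reporter(front_wheel_radius):
    """The client, coded defensively against all three failure modes."""
    start = time.monotonic()
    try:
        result = circumference(front_wheel_radius)        # call to server
    except ValueError as error:
        return f"error: {error}"                          # precondition failure
    if time.monotonic() - start > MAX_RESPONSE_SECONDS:
        return "error: maximum response time exceeded"    # timing failure
    if not isinstance(result, float):
        return "error: server replied with something other than a circumference"
    return result                                         # Front Wheel Circumference


assert isinstance(cycle_tread_reporter(1.0), float)
assert cycle_tread_reporter(float("nan")).startswith("error")
assert cycle_tread_reporter("ten").startswith("error")
```

In a unit test, the local `circumference` function would be a mock standing in for the real server, so each error branch can be forced and asserted on.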
One should use the best modeling tool for the purpose at hand. A regular expression is often the best tool for modeling the structure of a problem, a data flow or a program.
A regular expression can describe, for example, the structure of a serial file:
A serial file as a regular expression
    Serial file sequence
        File header record
        File body iterate while not trailer record
            Record select if type A
                Record type A
            Record select if type B
                Record type B
            Record else
                Invalid record
        File body end
        File trailer record
    Serial file end
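To make the idea concrete, here is one possible encoding (mine, not the paper’s) in which each record type becomes a single character, so the serial file structure reads as a textbook regular expression:

```python
import re

# Encode each record type as one character: H = header, A and B = the two
# body record types, X = an invalid record, T = trailer. The serial file
# is then a header, followed by an iteration of body records, followed by
# a trailer.
SERIAL_FILE = re.compile(r"^H[ABX]*T$")

assert SERIAL_FILE.match("HAABXT")   # header, body with an invalid record, trailer
assert SERIAL_FILE.match("HT")       # an empty body is a legal iteration
assert not SERIAL_FILE.match("AAT")  # no header: not a well-formed file
```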
Kleene’s theorem says
that every data flow recognizable by a computer can be described as a regular
expression.
Bohm and Jacopini’s principle says that every program can be described as a regular expression. (Strictly speaking, and rather disappointingly, this principle says that every program can be built around the same trivial and often unhelpful structure – an iterated selection – a loop with a case statement inside it.)
Jackson (ref 4) proposed that in general, the structure of a program should be based on the structure of the problem it solves. More specifically, where the problem is to produce one or more output data flow structures from one or more input data structures, then the structure of the program should reflect the structure of the I/O data structures.
Remember the conversation I had with Bob Martin? There was one more exchange.
Graham: I noticed you completely ignored the UML diagram on the flip chart. Were you meaning to suggest UML is a poor vehicle for designing processes? That it encourages developers to design in a way that excessively distributes responsibilities between classes? Leads to a design with excessive message passing? Leads to a design with pointless inheritance?
Bob: Yes.
Graham: While you were talking I modelled the ten-pin bowling game in my head as a regular expression. The Game is an iteration of Frames. A Frame is a sequence of two Balls. The first Ball is a selection of Strike or not. The second Ball is a selection of Spare or not.
Ten pin bowling game as a regular expression
    Game iteration while balls to bowl
        Frame sequence
            First Ball select if Strike
            First Ball else
            First Ball end
            Second Ball select if Spare
            Second Ball else
            Second Ball end
        Frame end
    Game end
I may not have completed this structure, since I am a little unclear about what happens at the end of the game. However, the regular expression seems to me a better model of the problem than UML can provide in the form of a class or interaction diagram.
I am fighting my instinct to complete a model before coding. The rough model above looks good enough to start coding and testing.
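The rough model can also be rendered in conventional regex notation over standard score-sheet symbols; the encoding is mine, and like the rough model above it deliberately ignores the 10th-frame bonus balls:

```python
import re

# "X" is a strike, a digit followed by "/" is a spare, and two digits are
# an open frame. A game is an iteration of exactly ten frames. (My
# encoding, for illustration; the 10th-frame bonus balls are ignored,
# just as in the rough model above.)
FRAME = r"(?:X|[0-9]/|[0-9][0-9])"
GAME = re.compile(rf"^{FRAME}{{10}}$")

assert GAME.match("X" * 10)        # ten strikes (bonus balls ignored)
assert GAME.match("9/" * 10)       # ten spares
assert not GAME.match("45" * 9)    # only nine frames: not a complete game
```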