This booklet is published under
the terms of the licence summarized in footnote 1.
What is a class? What does a box in an entity model represent? Should
either subclasses or parallel aspects of a class be specified as distinct
classes? You can’t have it all ways in one entity model.
This chapter shows five kinds of entity model. A methodology or technology can accommodate clashing views of what a class is by allowing several versions of data structures to coexist, each entity model being directed at a different purpose.
Booch (1994) probably speaks for most computer scientists when he says ‘mapping an object-oriented view of the world onto a relational one is conceptually straightforward, although in practice it involves a lot of tedious details’. We cannot deny the ‘tedious details’, which means that even trivial examples of enterprise applications take up a lot of space, but this chapter offers a challenge to the ‘conceptually straightforward’.
Although the individual data items of system specification are very
important, we are more troubled in enterprise applications by how to specify
the rules and constraints governing larger objects, higher-level relations or
aggregates of data items. Aggregation becomes an issue.
Whereas inheritance implies the generalisation
of mutually exclusive subclasses into a higher-level class that may be smaller
than its subclasses, aggregation implies the summation of
parts into a higher-level class that must be larger than its parts.
There are clashing views of how to aggregate properties to form an
object class. Different ways to group data items into classes lead to different
entity models, that is different structures specified
over the top of the data items. We need to understand the different possible
views and their implications.
A 3-tier software architecture can reconcile various views by handling
each view in a separate layer. Fig. 5a gives a rough idea of what we mean. The
chapter will explain.
Fig. 5a
Aggregation is the means by which simple elementary components are added together to form larger and more complex structures. You can build one set of elementary components into an infinite number of different, clashing, higher-level structures.
For example, a school geography book presents several maps of the same
area, showing different divisions of the earth’s surface into:
• land masses bounded by oceans
• countries bounded by political administration
• territories bounded by climate or vegetation.
It would be foolish to say that any one of these is the ‘right’ view.
And there are further clashing aggregations of people: by race, religion and
native language. Fig. 5b shows part of the world where aggregations have been
disputed for thousands of years.
Fig. 5b
The ‘correct’ aggregation of components in this example has long been a
matter of dispute. Land borders have been redrawn several times, and peoples
have moved en masse between territories. The components here are members of
several clashing higher-level aggregations at once.
Starting at the bottom level of system specification, the atomic particles are the data items or variables. Each variable has a data type that defines its valid range of values. The volume ‘Introduction to Rules and Patterns’ showed how you may declare the data type as what may be called a domain class.
Generic domain classes such as Text and Integer are widely reusable. You
may define more application-specific domain classes such as Telephone Area Code
and Country Name for reuse in a local context.
Where several variables (say, Price and Telephone Area Code) share the
same domain (say Integer), the variables are subclasses that inherit the
properties of the one domain. You can arrange domain classes themselves into a
complex inheritance tree.
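The two-level arrangement of generic and application-specific domain classes can be sketched in code. This is a hypothetical illustration, not any particular tool's API; all class names are invented for the example.

```python
# Generic domain classes (Integer, Text) specialised into
# application-specific domain classes (TelephoneAreaCode, CountryName).
# Each domain defines the valid range of values for a variable.

class Domain:
    def valid(self, value):
        raise NotImplementedError

class Integer(Domain):
    def valid(self, value):
        return isinstance(value, int)

class Text(Domain):
    def valid(self, value):
        return isinstance(value, str)

class TelephoneAreaCode(Integer):
    """Application-specific domain: inherits the properties of Integer."""
    def valid(self, value):
        return super().valid(value) and 0 <= value <= 999

class CountryName(Text):
    """Application-specific domain: inherits the properties of Text."""
    def valid(self, value):
        return super().valid(value) and value.istitle()
```

Here Price and Telephone Area Code could both draw on the one Integer domain, while Telephone Area Code adds its own narrower constraint, which is the sense in which the variables inherit the properties of the shared domain.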
Early object-oriented authors such as Meyer (1988) were much concerned
with designing production software operating at the level of domains. They
sought reuse of code via inheritance between domain classes. Some wanted
object-orientation to provide a once-and-for-all inheritance tree of domain
classes, and so save them from having to code from scratch for each new
programming language, operating system and CASE tool.
But for enterprise application specification a simple two-level
structure of domain classes is usually enough: generic domain class (say, Text)
and application-specific domain class (say, Country Name). In practice, analysts
spend little time on specifying domains. They spend most of their time thinking
about the rules operating on larger relations, each an aggregate of variables
about a business entity.
Specification of classes at the level of relations is different from
specification of classes at the level of domains. Among database-oriented
authors, Chris Date (1994) makes the remarkable statement: ‘Object classes are
domains or data types. Questions about inheritance therefore apply to domains, not
to relations.’ Is this true? What is a class anyway?
To make an aggregate, you simply add components together. There are an
infinite variety of ways to group data items into aggregate classes. It is hard
to make any general statement about what an aggregate class is, until we
separate the three layers of the three-tier architecture.
Layer | Objects like | In domain of
UI | menus, windows, buttons etc. | user interface technology
Business services | business entities and rules | business users
Data services | tables, records, indexes, etc. | database technology
This three-way separation of concerns recurs throughout our work in
information analysis. It helps us to separate different kinds of problem, and
retain this separation from analysis and specification through to coding. It
enables us to change the data storage or UI layer with minimal disruption to
the essential business components.
In the Data services layer, aggregate classes appear in the form of persistent database tables or
record types. Database programs treat each table as a distinct object. Some
database management software expects you to specify larger aggregate classes
such as database blocks or pages.
The
size and scope of each aggregate class in the Data services
layer is physical. It is guided by considerations of efficiency and limited by
the database technology. Each internal class may roll up data from several entity
classes, or store only part of one entity class.
In the UI layer, aggregate classes appear in the form of transient input messages, output
messages, windows and dialogue boxes. The GUI management software will treat
the data structure in a window as a single object when, for example, it moves
it around the screen.
The
size and scope of each aggregate class in the UI layer is
physical. It is guided by considerations of usability and limited by the user
interface technology. Each UI layer class may roll up data from several entity
classes, or display only part of one entity class.
The Business services layer includes both persistent aggregate classes (objects) and transient aggregate classes (events and enquiries).
Events and enquiries are naturally limited in size
and scope by the rule that a transient event is a minimum
unit of consistent change. But what rules limit the size and scope of the
persistent object classes? In other words: How should you group the persistent
variables into aggregate classes in the entity model?
The
size and scope of each aggregate class in the business
services layer has nothing to do with technology, or with physical objects such
as database tables or GUI windows; it is a matter of logic.
Three logical views
You might define a class as an aggregate of variables around three different centres:
aggregate centred on | means the class is
a key | a third normal form relation
a type | a type within a class hierarchy
a state variable | a state machine (or ‘parallel aspect’ of a relation)
So far, authors have tended to gloss over the possibility of structure
clashes between these different views of a class. In trivial examples, where
there is no structure clash, the different definitions lead to the same set of
classes. You will draw the same data structure whichever definition you pick.
But for non-trivial enterprise applications, we can no longer pretend
that the different definitions give the same answer. The more complex the
system, the more the logical views diverge from each other. The data pictures
you draw will depend on the definition you pick.
There are clashes between logical views of a class, and then between
logical and physical classes. You may design the physical database tables and
GUI windows to match either the relations or the state machine, but you can’t
match both. We need a richer theory of system specification, a software
architecture that results in separate components handling each logical and
physical view.
Entity model notation is not the issue here. The notation we use is only
one of several possible notations. The same questions and choices arise whatever
notation you use.
Different ways to group variables into classes lead to different entity
models, that is different structures specified over the top of the variables.
Analysts often draw entity models that are a mixture of different styles without realising
it. They should understand the different possible approaches and their
implications.
At the earliest stage of system specification, you should be free to
draw any informal picture that helps you. For the sake of giving this kind of
data structure a name, we’ll call it an ‘informal entity model’. This is an
informal picture of the data in which an ‘entity’ is whatever level or size of
data group you want it to be.
Given that an enterprise application consumes data and produces information,
you may uncover the classes of interest by relational data analysis of the data
in forms, reports, screens and files. The result of such analysis is a
relational entity model.
Fig. 5c
An early step in relational data analysis is to spot the business
identifiers or keys. Given an object, the value of each of its attributes is
uniquely determined by the value of its key. In what is called a ‘third normal
form’ relation, the value of each attribute is determined by the key, the
whole key, and nothing but the key.
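The idea that every attribute must depend on the whole key can be illustrated with a small sketch. All the data values and names below are invented: Model Name depends on Model Code rather than on the key Reg Num, so relational analysis moves it into a separate Model relation keyed by Model Code.

```python
# An un-normalised record: model_name depends on model_code, not on
# the key reg_num, so it violates third normal form.
unnormalised = [
    {"reg_num": "A123", "model_code": "M1", "model_name": "Hawk", "owner": "Ann"},
    {"reg_num": "B456", "model_code": "M1", "model_name": "Hawk", "owner": "Bob"},
    {"reg_num": "C789", "model_code": "M2", "model_name": "Swift", "owner": "Cal"},
]

def normalise(rows):
    """Split out the attribute determined by a non-key attribute."""
    vehicles = {r["reg_num"]: {"model_code": r["model_code"], "owner": r["owner"]}
                for r in rows}
    models = {r["model_code"]: {"model_name": r["model_name"]} for r in rows}
    return vehicles, models

vehicles, models = normalise(unnormalised)
# 'Hawk' is now stored once, against its own key, not once per vehicle
```

After the split, each attribute in each relation is determined by that relation's own key, and the repeated Model Name values disappear.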
In the example above, each box is a relation, its
key is underlined, its attributes are listed, and its associations to other
relations are shown as lines connecting the boxes. The meaning of the different
styles of line doesn’t matter here.
There is little freedom of choice about what the relations are, given
that you know the users’ information requirements and you follow the idea that
object instances are uniquely identifiable from each other by a key. But this
is not the only logical view.
Another common logical view is that an object is something that
progresses through a defined series of states, from a beginning to an end. In
this state-transition view, a class is a state machine or behaviour, governed
by a state variable (SV in the illustrations below).
People who design systems with little or no persistent data, typically
embedded or process control systems, often view classes as state machines. They
discover the classes by analysing the states that objects pass through and the
events that trigger state-changes.
You can use similar techniques for enterprise applications. You can
transform a relational entity model via behaviour analysis techniques into a
specification of event models that maintain the data. These event models
include the required rules in the form of preconditions. It turns out that
behaviour analysis may lead you to decompose a relation into parallel aspects.
In the event models, each parallel aspect of a relation appears as a distinct
class.
People often assume you can draw one state machine for each relation.
However, you need to draw separate state machines for the parallel aspects of a
relation.
Fig. 5d shows the entity model with one box per state machine. This
differs from the relational view in that parallel aspects of a relation have
been divided.
Fig. 5d
There is little freedom of choice about what the classes are, given that
you know the rules and constraints and follow the idea that each class is a
state machine controlled by one state variable (or none if it can be optimised
away). Note that the behaviour analysis shows that most of the classes in this
example have one-state lives, and so require no state-variable.
See the footnotes for some remarks on classes as state machines.
Given a system that maintains persistent data, you
must design the record types or tables into which the database will be divided.
The physical database designer’s view is that a class is a record type or
database table, that is, the unit of input/output accessed by programs.
Database designers usually transform a relational entity model first
into a data storage structure (technology-independent) and then into a physical
database structure (technology-dependent). The data storage structure records
decisions about the physical database structure, with one box for each table or
record type.
There is much freedom of choice here. You might map logical to physical
by designing a separate table for each relation, or each subclass in a class
hierarchy, or each state machine. In complex systems you cannot do all of these
at once.
Fig. 5e avoids the structure clash between different logical views by
rolling up the logical subclasses and parallel aspects of Vehicle into one
table.
Fig. 5e
There is one more logical view, considered in the next section. The
inheritance-oriented view is that a class is something uniquely identifiable by
a type or subclass. In our example, a Vehicle may be either a Car or a Truck.
Should we show the subclasses as distinct classes in an entity model?
So by way of summary, you can distinguish four or five different ways to
define what a class is, and so draw different data structures:
Kind of entity model | The entity or class is
The informal entity model | whatever data group you like
The relational entity model | a normalised relation
The entity state machine model | a state machine
The data storage structure | a database table or record
Object-oriented purist’s entity model | a type in a type hierarchy
The main point of this chapter is that all five of
these views are reasonable and useful. What we need is an architecture that
enables us to consider each view during analysis, and maintain clashing views
separately in the modular construction of our software.
This and the following sections detail variations in ways to draw entity
models where a class might be divided into subclasses or into parallel aspects.
· For subclasses the options are called: aggregation, pseudo-inheritance and delegation
· For parallel aspects, the options are called: aggregation, partition and delegation.
Given a class hierarchy containing a superclass (say Vehicle) with two subclasses (say Car and Truck), people have proposed three different ways to specify the subclasses: aggregation, pseudo-inheritance and delegation.
Aggregation means specifying the entire class hierarchy as one class. You may draw an exclusion arc across relationships specific to different subclasses.
Fig. 5f
Pseudo-inheritance means specifying only the subclasses as
classes, copying the properties of the superclass into each of them.
Fig. 5g
The pseudo-inheritance approach means repeating data specification
(above, the relationships to Owner and Model) in an anti-reuse,
anti-maintenance kind of way. Nevertheless, at the expense of some
duplication, a handy rule-of-thumb is to do this if the users employ a
different range of identifiers for each subclass.
Since we have assumed that all kinds of Vehicle are identified by the
same primary key, Reg Num, we would not divide in this example. By the way, if
we did divide, then any process given only Reg Num as input data must perform a
preliminary enquiry to find out which subclass to access.
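That preliminary enquiry can be sketched in code. This is a hypothetical illustration of the cost of dividing: given only Reg Num, a process must probe each subclass store in turn before it can access the object. The data and names are invented.

```python
# Under pseudo-inheritance, Car and Truck are held as separate classes,
# so an access keyed only by reg_num must first discover the subclass.
cars = {"A123": {"doors": 4}}
trucks = {"T999": {"axles": 3}}

def find_vehicle(reg_num):
    """Preliminary enquiry: probe each subclass store in turn."""
    for subclass, store in (("Car", cars), ("Truck", trucks)):
        if reg_num in store:
            return subclass, store[reg_num]
    return None, None

subclass, data = find_vehicle("T999")   # -> ("Truck", {"axles": 3})
```

With aggregation there would be a single Vehicle store, and no such enquiry would be needed.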
Delegation means specifying the superclass and each subclass as distinct
classes, connected by ‘is a’ relationships. In this entity model the boxes are
things that are uniquely identifiable one from another by a combination of key
and ‘type’.
Fig. 5h
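A hedged sketch of the delegation option: one superclass record carries the common properties and a ‘type’ discriminator, and each subclass detail is a distinct record keyed by the same Reg Num, connected to it by an ‘is a’ relationship. All names and data are illustrative.

```python
# Superclass: common attributes plus a 'type' discriminator.
vehicles = {
    "A123": {"type": "Car", "owner": "Ann"},
    "T999": {"type": "Truck", "owner": "Tom"},
}
# Subclass details, each 'is a' Vehicle, keyed by the same Reg Num.
car_details = {"A123": {"doors": 4}}
truck_details = {"T999": {"axles": 3}}

def read_vehicle(reg_num):
    """Assemble superclass and subclass properties for one object."""
    base = dict(vehicles[reg_num])
    detail = {"Car": car_details, "Truck": truck_details}[base["type"]]
    base.update(detail[reg_num])
    return base
```

Because the superclass record names the subclass, no preliminary enquiry across subclass stores is needed: the object is uniquely identifiable by the combination of key and type.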
Representing the subclasses of a class
Given four kinds of entity model and three ways to
specify a class hierarchy as classes (aggregation, pseudo-inheritance and
delegation), which way suits which kind?
Drawing class hierarchies in an informal entity model
Since the informal entity model is only an informal
picture, it doesn’t matter which way you choose to draw it. However, delegation
is common. People like to draw the subclasses as distinct boxes in an informal
entity model because it helps them analyse the problem domain, even if they
later decide to aggregate the subclasses, or use pseudo-inheritance.
Drawing class hierarchies in a relational entity model
Rule-of-thumb: if the users recognise objects of two
subclasses by the same primary key use aggregation; if not use pseudo-inheritance.
We aggregate in our example, because all kinds of Vehicle are identified by the
same primary key, Reg Num.
Drawing class hierarchies in an entity state machine model
The structure clash between subclasses and parallel aspects needs further
exploration. The difficulty is outlined here.
Before you can draw a box in an entity model for
each state machine, you have to understand the right way to build state
machines. You must somehow document the constraint that the state machines of
the subclasses are mutually exclusive. The way to do this is by drawing a
high-level selection between options, where each option represents a subclass.
Aggregation is wrong. It means drawing only one state machine for the
class hierarchy, including the behaviour of all subclasses in it, duplicating
the superclass behaviour under each subclass option.
Pseudo-inheritance is wrong. It means drawing state machines only for
the subclasses, duplicating the behaviour of the superclass in each. This means
there is no specification in the state machines of the mutual exclusion between
subclasses. You must prefix some events by an enquiry to work out which class
they affect. Given an event (Scrap Vehicle) that carries only the primary key
of the object (not its subclass) there is no way of knowing to which class the
event must be directed. The corollary is you cannot rely on generating all the
event preconditions from the state machine documentation.
Drawing class hierarchies in a data storage structure
Aggregation
of subclasses into one relation is normal. The subclass’s data attributes are
contenders for the same data storage space. Aggregation also sits happily with
the view taken in behaviour analysis.
A parallel aspect of a class (or as some object-oriented authors say ‘non-disjoint subclass’) is an independent group of attributes and relationships, whose behaviour is governed by a single state variable.
Behaviour analysis (and structure clash resolution) tends to lead you to
divide an aggregate relation into its component parallel aspects. You may go as
far as to decompose the behaviour of a relation into one parallel state machine
for each attribute and relationship. However, a group of attributes and
relationships that share the same state-transitions are normally lumped
together in one state machine, making it a very low-level aggregate class.
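The decomposition described above can be sketched as follows. This is a hypothetical illustration: one relation is divided into two parallel aspects, each an independent group of attributes governed by its own state variable; the aspect names are invented.

```python
class Aspect:
    """One parallel aspect: a state machine governed by one state variable."""
    def __init__(self, initial_state, transitions):
        self.sv = initial_state          # the aspect's state variable (SV)
        self.transitions = transitions   # (state, event) -> new state

    def handle(self, event):
        key = (self.sv, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} invalid in state {self.sv!r}")
        self.sv = self.transitions[key]

class Vehicle:
    """One relation, two parallel aspects updated independently."""
    def __init__(self):
        self.ownership = Aspect("unowned", {("unowned", "buy"): "owned",
                                            ("owned", "sell"): "unowned"})
        self.maintenance = Aspect("ok", {("ok", "fault"): "in_repair",
                                         ("in_repair", "repair"): "ok"})

v = Vehicle()
v.ownership.handle("buy")      # changes one aspect...
v.maintenance.handle("fault")  # ...without touching the other
```

Because the two groups of attributes do not share the same state-transitions, neither could be governed by the other's state variable, which is why each becomes a distinct class in the entity state machine model.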
Since aggregation rolls up all parallel aspects into one class, the picture of the case study here would be the same as that for aggregation of a class hierarchy shown earlier.
The picture here is different from the pseudo-inheritance division of a class
hierarchy shown earlier. There is a structure clash between subclasses and parallel aspects.
Dividing parallel aspects leads to this entity model.
Fig. 5i
There is only a subtle difference between delegation and partition of
parallel aspects.
In delegation, one of the parallel aspects is appointed as the ‘basic
aspect’. E.g. the basic aspects in the example are Owner-basic and
Vehicle-general. The ‘basic’ aspect can be thought of as being at a higher level
and owning all the others. This means we can connect all the other aspects to
the basic aspect by association relationships.
Given four kinds of entity model (informal, relational, entity state machine and data storage) and three ways to specify parallel aspects as classes (aggregation, partition and delegation), which way suits which kind?
Drawing parallel aspects in an informal entity model
Aggregation
is normal. People rarely draw one box for each parallel aspect in drawing an
informal entity model. As far as code specification is concerned, it doesn’t
matter which way you choose to draw the informal entity model.
Drawing parallel aspects in a relational entity model
Aggregation
is normal. Relational theory doesn’t really account for parallel aspects, but
one might say it assumes aggregation of parallel aspects into one relation.
Drawing parallel aspects in an entity state machine model
Partition
seems the natural thing, since each box is supported by a distinct state
machine. It enables the entity state machine model to be used as a map or
graphical menu for navigating to the state machines.
Strictly, delegation is the right approach. One
of the parallel aspects must be appointed as the ‘basic’ aspect, which can be
thought of as being at a higher level and owning all the others. The state machine of this
basic aspect will normally be responsible for all different varieties of birth
and death events, but may pass these on in the form of a ‘superevent’ to the
other parallel aspects.
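A hedged sketch of that delegation arrangement: a ‘basic’ aspect creates and destroys the object identity, handling birth and death events itself and passing them on to the other parallel aspects as a single ‘superevent’. The class and event names are invented for illustration.

```python
class Aspect:
    """A subordinate parallel aspect, driven by superevents from the basic aspect."""
    def __init__(self):
        self.alive = False

    def on_superevent(self, name):
        if name == "birth":
            self.alive = True
        elif name == "death":
            self.alive = False

class BasicAspect:
    """Owns the other aspects; broadcasts birth and death as superevents."""
    def __init__(self, *others):
        self.others = others
        self.alive = False

    def handle(self, event):
        if event in ("birth", "death"):
            self.alive = (event == "birth")
            for aspect in self.others:     # pass the event on as a superevent
                aspect.on_superevent(event)

general, duty = Aspect(), Aspect()
basic = BasicAspect(general, duty)
basic.handle("birth")   # all aspects come into existence together
```

The point of the arrangement is that the basic aspect alone is responsible for the varieties of birth and death, so creation and destruction of all related aspects stay consistent.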
After aggregating the class hierarchy and delegating the parallel aspects in our example, the result is an entity state machine model that matches the event models. The boxes in it are the classes for which state machines are constructed and that appear as distinct components in event models.
Drawing parallel aspects in a data storage structure
Partition
is simplest. This sits happily with the view taken in behaviour analysis and
facilitates the distribution of parallel aspects to different physical
locations.
Aggregation
of parallel aspects into one relation should reduce access times, but requires
a little extra work in the data abstraction layer (components that retrieve logical
application objects from the Data services layer, and restore them).
What is an entity or a class? What does a box in an entity model represent? Different answers lead to different entity models (ignoring differences between notations). The picture is further complicated by different ways (aggregation, partition and delegation) to specify the ‘subclasses’ and ‘parallel aspects’ of classes.
Given there are many ways to draw an entity model, a methodology should
help us to decide which way suits which purpose. Which way of drawing an entity
model best suits database design? Which way suits object-oriented modelling? We
need a methodology which disentangles the various questions involved and
provides some answers.
The methodology implied by clashing entity models can be organised into
a handful of major activities, where a fair amount of parallel activity is
possible. Fig. 5j gives an idea of what we mean.
Fig. 5j
There is a progression from informal to formal, but only two models appear
in the implemented code: the entity state machine model and the data storage
structure. In simple systems these will be the same. In very simple systems
they will both be the same as the relational entity model. See ‘The OPEN book’
for further discussion of the methodology in Fig. 5j.
Coordinating software architecture
In non-trivial enterprise applications, rather than select one or other
logical view as the basis of encapsulation, we want to have it all ways.
Current object-oriented ideas are inadequate; we need a richer theory of system
specification. We need a software architecture that results in separate
components handling each logical view, and separate components addressing the
physical concern of designing efficient database tables. Our 3-tier software
architecture is designed for this purpose.
You can specify the data attributes of any entity model box using an
underlying data dictionary. You can specify one-to-one correspondences between
models by matching names. A CASE tool can help, as discussed below. The earlier sections
of this chapter show how you can make things easier by taking design decisions
that align different versions of the entity model.
Using object-oriented ideas to specify the Business services layer does
not mean you have to use object-oriented software tools to implement the
application. We do need some kind of:
· GUI management software to implement the UI layer
· Common programming language to implement the business services layer
· Database management system to implement the data services layer
We need a database management system that supports the notion of a
commit unit, handles the back-out of updates to objects within a commit unit,
and helps with locking and logging of transactions. It must have an efficient
way of reading and writing the instance variables not only of a single object
but of aggregates of related objects (say, all the details of a master).
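The commit-unit behaviour described above can be sketched with the standard-library sqlite3 module, chosen here only as a convenient stand-in for whatever database management system is used: updates inside one commit unit are backed out together if any of them fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (reg_num TEXT PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO vehicle VALUES ('A123', 'Ann')")
conn.commit()

try:
    with conn:  # one commit unit: commits on success, rolls back on error
        conn.execute("UPDATE vehicle SET owner = 'Bob' WHERE reg_num = 'A123'")
        conn.execute("INSERT INTO vehicle VALUES ('A123', 'dup')")  # key clash
except sqlite3.IntegrityError:
    pass  # the whole unit is backed out, including the UPDATE

owner = conn.execute(
    "SELECT owner FROM vehicle WHERE reg_num = 'A123'").fetchone()[0]
# owner is still 'Ann': the partial update was rolled back with the failure
```

The same discipline applies at the level of an event: an event is a minimum unit of consistent change, so either all of its updates are applied or none is.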
CASE tool implications
An upper CASE tool could allow us to draw four models:
Kind of data structure | Each box connected to a
informal entity model | -
relational entity model | data group
entity state machine model | state machine and data group
data storage structure | physical database table
Copy and paste functions will enable us to copy all or part of one
model into another. Ideally we’d like to draw only one ‘master’ relational
model, then draw parallel versions only of the components that differ in the
‘subordinate’ entity state machine model (where parallel aspects are separated)
or data storage structure (say, where subclasses might be divided). Working out
a pleasing way to cross-refer between the master and subordinate diagrams would
be a challenge.
Copy and paste
The tool should allow you to draw multiple versions of a data structure.
It should help you to duplicate whole entity models and copy and paste partial
entity models between them.
Cross-reference by name
The tool should connect any entity model box to related documentation
items by recognising the names (or possibly synonyms) of corresponding classes.
The name of a box in an entity state machine model must match the name of a
state machine. The name of a box in the data storage structure must match the
name of a database table in the lower CASE tool. The name of an attribute must
match the name of a domain in the dictionary.
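The name-matching rule could be checked mechanically. A hypothetical sketch, with invented box and machine names: every box in the entity state machine model must correspond, by name, to a specified state machine, and any box without a match is flagged for the analyst.

```python
# Boxes drawn in the entity state machine model, and the state machines
# actually specified; names must correspond one-to-one.
esm_boxes = {"Vehicle-general", "Owner-basic", "Registration"}
state_machines = {"Vehicle-general", "Owner-basic"}

def unmatched_boxes(boxes, machines):
    """Boxes whose names have no corresponding state machine."""
    return sorted(boxes - machines)

# 'Registration' would be reported for the analyst to resolve
```

The same simple check applies to the other correspondences listed above: data storage boxes against database table names, and attribute names against the domain dictionary.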
Domain dictionary
The tool should help you to specify the data group behind any box in any
entity model as a list of domains drawn from an underlying domain dictionary.
This central repository should be independent of the internal data storage
structure or the external user interface. Most current implementation tools tie
variable definition too closely to one or the other.
Cross-layer class specification
A single class may have properties in each layer, a data storage format,
a presentation or display format, and some application rules. Suppose the user
asks for a class from one system to be added to another. We’d like to reuse the
class without defining new data storage and presentation formats.
We need a way of defining an entity class and attaching to it the
baggage of its data storage and presentation views, so that these can travel
with the entity class. We can envisage this working at the atomic or data item
level of system specification. How to handle larger classes is an open issue.
On domains in enterprise applications
James Odell (1994) says: ‘Existing entity modelling
techniques have deficiencies for object-oriented analysis. To assist the
object-oriented designer, the object-oriented analyst must specify all
object types and associations clearly.’ By ‘all’, he means to nag the analysts
into adding the basic data types and domains as classes into their entity
models. But they don’t want to do this!
Analysts don’t usually have to worry much about
domains. True, specifying the individual variables in a system specification is
important, often difficult and always time-consuming. But specifying the domain
of each variable is not that difficult. It can be postponed until near the end
of design.
Every database builder knows they must define the domain of each variable. Most are happy to leave it until the analyst has completed something like a relational entity model. You need to get an overview of the system structure before defining its details.
When it comes to specifying the domain classes in enterprise
application, a simple two-level structure, generic and application-specific, is
usually enough.
Generic domain classes
At the bottom level of specification you may use a
few generic domain classes, such as text, number and date formats. These are
only a minor concern; you probably need only the half-dozen or so data types
provided by your chosen implementation technology.
Application-specific domains
At the next higher level above generic domain classes, there may be tens
or hundreds of application-specific domains, such as ‘Phone-Num’ and ‘Country’.
There are two ways to specify these application-specific domains: at the bottom
and at the top of the specification.
You may specify the domain in a dictionary, to be reused in defining the
attributes of classes. You might define a dictionary entry Phone-Num as being
of the generic domain class Number and always beginning with 0, then specify
the classes Supplier and Customer as having attributes called
Supplier-Telephone and Customer-Telephone, both with the properties of the
application-specific domain called Phone Num.
Note that naming conventions must be agreed. If you name two or more
attributes directly after their domain, then wherever the context is ambiguous,
say in an input message, you will need to declare the context somehow, say
Supplier/Phone-Num or perhaps Phone-Num (of Customer). For this reason, people
tend to compose an attribute name by combining the class name and domain name.
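The dictionary approach can be sketched in code. This is a hypothetical illustration following the example above: Phone-Num is defined once, as a Number always beginning with 0, and reused by two attributes whose names combine class name and domain name.

```python
# A minimal domain dictionary: each entry maps a domain name to its
# validation rule, defined once and reused by many attributes.
domain_dictionary = {
    "Phone-Num": lambda v: v.isdigit() and v.startswith("0"),
    "Country-Name": lambda v: v.istitle(),
}

# Attribute names composed of class name + domain name, each drawing
# its properties from the shared dictionary entry.
attributes = {
    "Supplier-Phone-Num": "Phone-Num",
    "Customer-Phone-Num": "Phone-Num",
}

def valid(attribute, value):
    """Validate a value against the domain behind the attribute."""
    return domain_dictionary[attributes[attribute]](value)
```

A change to the Phone-Num rule then propagates to every attribute that draws on it, which is the reuse benefit of specifying the domain at the bottom of the specification.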
Or else you can turn the specification on its head
by declaring the domain as a high-level master class (say Country) in the
entity model, and then specify different classes owning this attribute (say
Customer and Supplier) as detail instances of the master class. Thus, you can
invert any
attribute variable, or rather its domain, to become a class whose key is one
valid value of the domain.
An age-old question of database design is: should we do this? Should we
show the application-specific domains as classes in an entity model? We
addressed this question in ‘Introduction to rules and patterns’.
On classes as state machines
The input data structure of a class
The state vector of a class
To resolve an ‘interleaving clash’, you have to separate the
‘multi-threaded’ process from its state vector. You keep one copy of the
process (one for the class), but many copies of the state vector (one for each
object).
The state variable of a class
In object-oriented terms, the importance of the object’s location
counter or state variable is that you can use it in evaluating rules. If an
event finds an object in the wrong state, then that event must fail. Following
our application modelling techniques, rules are coded in the form of event
preconditions.
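The rule that an event finding an object in the wrong state must fail can be coded as an event precondition. A minimal sketch, with invented states and using the Scrap Vehicle event mentioned earlier:

```python
class PreconditionError(Exception):
    """Raised when an event finds an object in the wrong state."""

def scrap_vehicle(vehicle):
    # Precondition: the state variable must show the vehicle is in use.
    if vehicle["sv"] != "in_use":
        raise PreconditionError(
            "Scrap Vehicle invalid in state " + vehicle["sv"])
    vehicle["sv"] = "scrapped"   # the event's effect on the state variable

v = {"reg_num": "A123", "sv": "in_use"}
scrap_vehicle(v)   # succeeds: the precondition holds
```

Repeating the event against the same object now fails, because the state variable shows the object is already scrapped.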
Object-based versus object-oriented
Class hierarchies and aggregates
Fig. 5k shows School as an aggregate in which the
basic aspect of the class has parallel aspects that are mutually exclusive. The
relationships from the basic aspect to the parallel aspects are crossed by an
exclusion arc. The entity model then looks like a class hierarchy, but
the lines between boxes are ‘association’ relationships rather than ‘is a’
relationships.
Fig. 5k
Head Teacher and Principal Governor are disjoint ‘aspects’ rather than
disjoint ‘subclasses’. Is this merely sophistry? Does it make a difference to
the state machines?
Berrisford and Burrows (1994) showed that to express the disjointness of
subclasses in state machines, you have to draw the state machine of each
subclass as an option within a state machine of the superclass. So the life of
a subclass may be completely rolled up into the life of its superclass.
At first sight (we haven’t explored enough examples yet), it seems like
the same principle applies to an aggregate of disjoint aspects as to a class
hierarchy of disjoint subclasses. If so, then we would say the distinction is
sophistry.
Glossary
Aggregation:
grouping the properties of subclasses or aspects into one high-level superclass
or aggregate class.
Aspect:
an independent role of a class, a group of attributes whose updating is
constrained by one state variable, a class whose behaviour is representable as
a single finite-state machine.
Class:
a set of object instances that share the same properties.
Class
hierarchy: a structure dividing a superclass into subclasses.
Delegation
*: dividing the properties of a class between a high-level class and low-level
classes connected to it.
Event:
an atomic transaction, a minimum unit of consistent change, transient but
leaving a mark on persistent objects.
Event
model: a specification that shows how one or more objects
are affected by a single event, and the constraints that must be tested.
Object:
something that persists and must be remembered.
Partition
*: dividing the properties of one class between smaller roles or aspects.
Pseudo-inheritance:
copying the properties of a superclass into its subclasses.
* The difference between Partition of parallel
aspects and Delegation of parallel aspects is not obvious. Both mean
separating a class into parallel aspects. But Delegation implies one of
the parallel aspects is appointed as the ‘basic aspect’ that is at a higher
level and owns all the others. You can think of the ‘basic aspect’ as the creator
and destroyer of the object identity, simultaneously creating and destroying
all related aspects.
This chapter discusses design issues and tradeoffs. It shows how separating
the application and Data services layers of the 3-tier architecture can help you to
hide data replication and aggregation from the Business services layer of code,
and minimise data migration difficulties.
Redundant data makes one object dependent on another, so if you update one object, you are obliged to update another at the same time. But redundant data is not necessarily a bad thing. You have to consider a design tradeoff, and the kind of redundancy that is involved.
Tradeoff:
enquiry process v. update process
The speed of a process is largely determined by the
number of discrete objects it accesses. Reducing the objects accessed on update
may increase the objects accessed on enquiry, and vice-versa, so you cannot
optimise both updates and enquiries.
Similarly, the simplicity
of a process is largely determined by the number of classes it accesses.
Reducing the classes accessed on update may increase the classes accessed on
enquiry, and vice-versa, so you cannot simplify both updates and enquiries.
An aim of relational data analysis is to simplify programming, to
prevent programmers from writing unnecessary code. It achieves this by reducing
data replication. E.g. you would normalise the structure shown in
Fig. 6a
Thus, you prevent programmers from having to locate and update all the
Sales of a Stock, in order to update a Stock Description. This has the added
benefit of reducing the danger of inconsistent Stock Descriptions being stored,
through the update process not being completed properly (whether this is a
failure of the programmer or the technology).
People sometimes teach relational data analysis as though its aim is to
remove all redundant data. First, this is a means not an end. Second, there are
two kinds of redundant data - replicated data and derived data - and they have
different implications.
Replicated
data
Replicated data occurs where one piece of information is repeated. This
is not necessarily a bad thing. You may choose to replicate data to speed up or
simplify enquiry processes. Typically, you might repeat an attribute of a
master object in every one of its detail objects.
E.g. you might store Stock Description as an attribute of each Sale.
If you do replicate data, it is wise to maintain the original data as
well as its copies. So you should maintain the Stock Description in Stock as well as in each Sale.
Derived
data
Derived data occurs where several pieces of information are summarised
in one place as the result of a calculation or procedure. The usual example is
a total stored in a master object of detail objects. E.g. you might store a
summary total of Sales in a Stock object, to save adding up this total on each
enquiry.
Another kind of derived data is a derivable sorting class. E.g.
‘Customer Interest in Stock’ is a derivable sorting class that clusters all the
Sales for a given combination of Customer and Stock.
Fig. 6b
Removing derived data can frustrate the aim to simplify programming.
Suppose you omit the sorting class from the data structure. Programmers will
have to sort Sales by Customer within Stock, or Stock within Customer, every
time they want to display them in a structured list. In effect, they
manufacture a ‘soft’ instance of the sorting class every time they need one.
In general, analysts can make extra work for programmers. Missing
derivable classes and relationships from the data structure can make
programming unnecessarily complex. Programmers end up defining the missing
classes and relationships in program code. And they may have to do this lots of
times, in many different programs.
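The 'soft' sorting class can be pictured as follows: every program needing the Customer-Interest-in-Stock grouping manufactures it on the fly (the sample data is invented):

```python
from collections import defaultdict

# Hypothetical Sale records: (customer, stock, quantity)
sales = [
    ("Smith", "Widget", 5),
    ("Jones", "Widget", 2),
    ("Smith", "Widget", 3),
    ("Smith", "Gadget", 1),
]

def customer_interest_in_stock(sales):
    """Manufacture a 'soft' instance of the sorting class: cluster all
    the Sales for each (Customer, Stock) combination."""
    clusters = defaultdict(list)
    for customer, stock, qty in sales:
        clusters[(customer, stock)].append(qty)
    return dict(clusters)
```

If the sorting class were a 'hard' class in the data structure, this grouping code would not have to be repeated in every enquiry program.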
Given the tradeoff between defining a ‘hard class’ in the structure of
the persistent data, and defining a ‘soft class’ in one or more transient
processes that operate on the data structure, the balance lies in favour of the
former. As a rule:
Specify
classes and relationships in the data structure, rather than leave them to be
constructed by programs.
Benefits: simpler enquiry programming and easier program maintenance.
True, whenever a class is amended you will have to amend all the programs which
refer to that class, but this is the case whether the class is hard or
soft. And there will simply be less program code to maintain if the class is a
hard one.
Costs: some extra update processing, extra data structure maintenance
and data migration costs. If you map every entity class onto a database table,
then you will have to ‘migrate’ persistent data from one structure to another
whenever a class is amended.
Benefits
without costs?
How to get the benefit of an application-specific entity model that makes
application programming easy, while at the same time using a data storage
structure that speeds up enquiry processes, increases the robustness of
distributed operations, and facilitates maintenance without data migration? The
3-tier architecture opens up the interesting possibility that you might define
different data structures for:
• Business services layer - entity state machine model designed to
simplify processing
• Data services layer - data storage structure designed for performance
and flexibility.
Most database designers reproduce the ‘logical’ entity state machine
model as closely as possible in the data storage structure. This is how most
systems are built. But you might take a very different approach in designing a
large enterprise application. You can write application programs to operate on
the entity state machine model, while storing instance data in a
different data storage structure.
Replicated
data belongs in the Data services layer
We propose that replicated data belongs in the Data services layer, not
in the Business services layer.
The idea is that you can specify and code the Business services layer as
though no data is replicated, hiding all replication in the Data services
layer.
E.g. the application program that updates a Stock Description will
assume it is stored only in a Stock object; it will call the data abstraction
layer; this will find all the places where the Stock description has been
replicated and update all of them. So the application program is entirely
unaware of how far data has been replicated. The data abstraction layer handles
the extra complexity. You may even be able to buy a distributed database
management system that does the job of the data abstraction layer for you.
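The fan-out of one logical update to all stored copies can be sketched like this (the in-memory tables and field names are invented stand-ins for the stored data):

```python
class DataAbstractionLayer:
    """Hides replication: the caller updates a Stock Description once;
    the layer updates the master Stock row and every replicated copy."""
    def __init__(self):
        self.stock = {"S1": {"description": "old"}}
        # Stock Description replicated into each Sale for fast enquiry
        self.sales = [
            {"stock_id": "S1", "stock_description": "old"},
            {"stock_id": "S1", "stock_description": "old"},
        ]

    def update_stock_description(self, stock_id, description):
        self.stock[stock_id]["description"] = description   # the original
        for sale in self.sales:                              # every copy
            if sale["stock_id"] == stock_id:
                sale["stock_description"] = description
```

The application program calls `update_stock_description` and never learns how many copies exist.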
Derived
data belongs in the Business services layer
Perverse though it may seem, we propose that derived data belongs in the
Business services layer, not in the Data services layer.
It is nonsensical to hide derived data in the Data services layer only.
It would be foolish to code an enquiry in the Business services layer to report
the total Sales of a Stock by adding up the individual Sales, if the total has already
been calculated and stored in the database. Likewise, it would be foolish to
code complex enquiry processes in the Business services layer as though a
sorting class does not exist, if it does exist in the Data services layer.
Doing it the other way around is far more reasonable. You can specify and code simple enquiry processes in the Business services layer as though a derivable total or sorting object has been stored. You may then choose to store the derived object in the Data services layer, or else the data abstraction layer can derive it and present it to the Business services layer whenever it is required.
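The point can be sketched as follows: the Business services layer asks for the total as though it were stored, and cannot tell a stored total from one derived on demand (class and flag names invented):

```python
class SalesStore:
    """Data services layer: may or may not physically store the derived total."""
    def __init__(self, sales, store_total=False):
        self._sales = sales
        self._total = sum(sales) if store_total else None  # optionally stored

    def total_sales(self):
        # The caller cannot tell a stored total from a derived one.
        if self._total is not None:
            return self._total
        return sum(self._sales)
```

Either way, the enquiry in the Business services layer is a single call.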
Conclusions:
Don’t store redundant data until you have established a clear business case in
terms of speeding up enquiries or increasing the robustness of local operation.
Specify ‘replicated data’ in the Data services layer of code. Specify ‘derived
data’ in the Business services layer of code. Yes, this does mean the
conceptual model of the Business services layer is influenced by physical
design considerations, but the alternative is ludicrous.
A single central database is the simplest option from the design point
of view. The motivation for distributing subsets of a database around the nodes
of a network is to enhance the performance or robustness of local processing at
a node. This may involve replicating data at different locations.
If you define different data structures for the
Business services layer and Data services layer, then you can hide all data
distribution decisions and complications in the Data services layer. You may
then annotate the data storage structure with distribution details, leaving the
entity state machine model untouched.
When it comes to distributing objects, the classes in the data storage
structure might be divided into three kinds.
Objects
that sit naturally at location
Locations where a business wants to store data often appear in the model
as classes (department, warehouse, local office, or whatever). The natural
scheme is to store an object of such a class at its real-world business
location. Some details fall naturally under these locations.
Fig. 6c
Not every object is naturally related to only one location. A Customer
may be the recipient of Sales from several Warehouses. You might begin by
assuming that all multi-location objects are stored at a central server
location.
Detail
objects that link objects in different locations
You might choose to store a Sale at the location of one of its masters, say the Customer.
Fig. 6d
Or you might choose to store Sales in a distinct storage location,
separate from both Customer and Stock. Either way, distributed locations are
connected along a one-to-many relationship. Managing a one-to-many association
between distributed objects can be difficult. So you might instead choose to
replicate a Sale, storing a copy at each location.
Fig. 6e
This has the advantage of connecting locations along a one-to-one
relationship, which is simpler to manage. Of course, if there are further
detail classes connected to a Sale, the same difficulty reappears one level down.
Master
objects that are used in several business locations
Some master objects (like ‘Currency Conversion Rate’ or ‘Customer’ in
our example) can appear at several business locations. You might choose to
store these objects only once, in a central server or head office storage
location. The problem is that local processing may be too slow, or if the
network goes down, people cannot carry on working on their local office database.
A way around this is to unnormalise data and copy the master object into
several locations, so users have all the information they need close at hand.
You might repeat the Customer name in every one of their Sales records.
Or more openly, you might copy each Customer object into all business locations.
Fig. 6f
You should not eliminate the
original master object. One object (in a master location or a distinct server
location) has to keep track of all the places where the object has been copied,
for the purpose of broadcasting updates.
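The master-and-copies scheme can be sketched like this (location names and the broadcast mechanism are illustrative):

```python
class MasterCustomer:
    """The master copy keeps track of every place the Customer has been
    copied, so it can broadcast updates to all of them."""
    def __init__(self, name):
        self.name = name
        self.copies = {}  # location -> copied Customer data

    def copy_to(self, location):
        self.copies[location] = {"name": self.name}

    def update_name(self, name):
        self.name = name
        for data in self.copies.values():  # broadcast to every location
            data["name"] = name
```

If the master were eliminated, no one object would know where all the copies live.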
So in short, distribution means you may have to:
• select one business location for objects that naturally relate to more than one business location
• define distinct storage locations other than natural business locations
• divide one class into two parallel aspects connected by a one-to-one relationship
• divide one object into one master object owning many copies.
Tradeoff:
robustness v. inconsistency
Where a single database is partitioned and stored at several locations,
the issue of robustness arises. If the network fails, you want to carry on
working at one database location while not connected to the others.
To increase robustness, you will tend to replicate data at different
locations. But this means there is the danger of data in different
locations getting out of step, whether due to sloppy design or to failure of
the network technology. What if somebody updates, or worse deletes, a Customer
object on one of the databases while the network is down? The various databases
will get out of step.
Getting the databases back in step can take a great deal of effort. It
is not just a question of running automatic update programs. While the network
is down, you might accept Orders at a Warehouse for a Customer that has been
deleted or black-listed at head office.
When you find out later that the Customer has been black-listed: Should
you now reject these Orders? Or should you find some other Customer to take
them? These are questions that the business analyst must address rather than
the database designer.
Data replication and data distribution are two good reasons to design a data structure for the Data services layer that is different from the data structure of the Business services layer. Data migration may be another reason.
Programs are transient. Data is persistent. So changing a data storage
structure involves an extra step, called ‘data migration’, that changing a
program does not. You have to reorganise already-stored data, shifting it from
one version of the data storage structure to the next.
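A data migration step in miniature might look like this (the two structure versions are invented for illustration):

```python
def migrate(old_rows):
    """Shift stored data from version 1 of a structure (a single 'name'
    column) to version 2 (separate 'first' and 'last' columns)."""
    new_rows = []
    for row in old_rows:
        first, _, last = row["name"].partition(" ")
        new_rows.append({"first": first, "last": last})
    return new_rows
```

Unlike a program change, this step has to be run once over every row already stored, which is what makes it expensive on large databases.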
The more you specify application-specific classes and relationships in
the data storage structure, the greater the data migration cost whenever these
classes or relationships change.
This is not necessarily a bad thing. Remember, the rule to specify
classes and relationships in the data structure rather than leave them to be
constructed by programs. What the database designer misses out, the programmers
will have to put in, tenfold. And if data migration is needed because you are
correcting a poor data storage structure, inserting classes or relationships
you overlooked, then you have only yourself to blame.
Conclusion: expect data migration and include it in your plans.
Nevertheless, there are some very large databases where data migration
is just too expensive. Is there an alternative design-for-maintenance strategy
that will reduce or eliminate data migration?
Avoiding
the cost of data migration
Can you have it both ways? Can you have both the specificity of the entity
state machine model, and the flexibility of a data storage structure that does
not require amendment when the entity state machine model is altered?
Again, yes you can. You can write application programs for the classes
in the entity state machine model, and store instance data in a different,
more generic, data storage structure.
Fig. 6g shows an extreme example. The structure on the right is
generalised so far that no conceivable application amendment would require it
to change.
Fig. 6g
How does this work? You code the entity classes and relationships in the
Business services layer. You code the data storage structure in the Data
services layer. You design a data abstraction layer to translate between the
entity classes and relationships and the data storage classes and
relationships.
When your application program wants the instance data of a specific
Customer, it does not read the data storage structure but calls the data
abstraction layer. How the data abstraction layer assembles the Customer object
from the data in the data storage structure is a matter only for the data
abstraction layer.
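As a sketch, the generic storage structure might be a set of (object, attribute, value) triples, and the data abstraction layer assembles a Customer from them (the triple representation is our illustrative choice, not a prescription):

```python
class DataAbstractionLayer:
    """Assembles a Customer object from a generic (object, attribute,
    value) storage structure; the application never reads the store
    directly."""
    def __init__(self, triples):
        self._triples = triples  # e.g. ("C1", "name", "Smith")

    def get_customer(self, customer_id):
        return {attr: value
                for obj, attr, value in self._triples
                if obj == customer_id}
```

Adding a new Customer attribute changes the stored triples and this layer, but never the shape of the storage structure.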
When the entity state machine model is altered, you have to amend the
application programs and the data abstraction layer, but you do
not have to restructure the data storage structure or carry out a data
migration exercise.
Conclusion: where it is justified (by data migration or performance
costs) introduce a data abstraction layer to separate the entity state machine
model from the data storage structure (designed for flexibility and
performance).
The art of system design is to find the best balance between conflicting
objectives. Many authors have listed general objectives for system design. Some
have suggested ways of measuring how far these objectives are achieved.
Relatively few have focussed on the tradeoffs between objectives.
The optimum balance between conflicting objectives will differ from
system to system. We have been making generalisations about tradeoffs in the
kind of system we are most interested in - enterprise applications. Here are
some more tradeoffs to finish with.
Efficiency:
size v. speed
You might reduce the amount of code in a monolithic program by removing
a repeated block of code into a reusable subroutine. But this will tend to slow
the program down.
You might provide a faster alternative algorithm for a given process.
For example, you might design a faster text printing algorithm that produces
only rough or draft quality print. But this will increase the amount of code in
the system.
Object-oriented programmers often do provide
alternative algorithms for a single process. The substitution of one whole algorithm
by another is recognised by Gamma et al in the form of a design pattern called
‘Strategy’.
The substitution of one step in an algorithm by another is recognised by Gamma
et al in the form of a design pattern called ‘Template Method’.
Yet in the Business services layer of an enterprise application, you
virtually never provide alternative algorithms for one process. In fact, it is
not worth worrying about processing speed at all. The speed of an enterprise
application is completely dominated by the time taken to store and retrieve
data. Efficiency lies in the hands of the database designer.
In speeding up data access, a database designer will tend to increase
the backing store needed to hold the database. The designer will allow more
space for a data group to fit on the page of the database it is placed on, so
it doesn’t overflow that page. The designer will allow more space for storing
relationships, space for extra pointers and extra indexes.
Conclusion: buy much more database space than you think you will need.
Database
accessibility: crude locking v. concurrent usage
While it is running, a database update process has to lock the entities
it is working on so that no other process can alter them. A crude locking
mechanism will lock the whole database, or a large area of it. The ideal
locking mechanism will lock only the objects actually updated by the process.
If there are many concurrent users of the system, a crude locking
mechanism can dramatically degrade the system’s performance. To speed up the
system, you will need a more sophisticated locking mechanism that works at a
lower level of granularity.
Conclusion: refine the locking mechanism in proportion to the number of
concurrent users.
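The difference in granularity can be sketched with one lock for the whole store versus one lock per object (a toy illustration, not a real DBMS mechanism):

```python
import threading

class Database:
    """Sketch of locking granularity: one coarse lock for the whole
    store, or one fine-grained lock per object."""
    def __init__(self, objects, fine_grained=False):
        self.objects = objects
        self.fine = fine_grained
        self.db_lock = threading.Lock()
        self.locks = {key: threading.Lock() for key in objects}

    def update(self, key, value):
        # Fine-grained locking blocks only writers of the same object;
        # the coarse lock blocks every concurrent writer.
        lock = self.locks[key] if self.fine else self.db_lock
        with lock:
            self.objects[key] = value
```

With the coarse lock, an update to one object stalls updates to every other; with per-object locks, only true conflicts wait.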
Database
enquiry speed: aggregation v. flexibility
To speed up a specific enquiry or display you may store all the data you
want for that enquiry in one large object. The price you pay is inflexibility
and disoptimisation from another enquiry perspective.
For example, if you store all of a Customer’s Orders within the Customer
object, then you can easily and swiftly assemble the list of Customer’s Orders
for display.
You might call this an aggregate entity state record, or an unnormalised
object. Calling it a ‘real-world’ object is nonsense. An aggregate entity state
record is no more a real-world object than a third normal form relation is a
real-world object, it’s just data storage that’s optimised from one
perspective, usually for output display.
Such optimisation makes the system less flexible, less suited to
processing from another perspective. For example, you cannot so easily list all
the Orders placed for a specific Stock Type.
(By the way, some of the features people praise in an object-oriented
database, as against a relational database, are the same things
network database designers have been doing for twenty years to optimise
performance. To speed up access - store pointers to the detail objects along
with the master object. To save space - roll up detail objects into one or
other master object, making an aggregate entity state record. These are matters
for the Data services layer, nothing to do with defining the Business services
layer.)
Conclusion: don’t unnormalise stored data into an aggregate entity state
record until you have established a clear business case in terms of enquiry
speed, and define aggregate tables in the data storage structure rather than in
the entity state machine model.
Cost
of usage v. cost of design
Making the users’ work at the user interface easier takes more design
effort. Conclusion: spend money on usability in proportion to the number of
end-users who will benefit from your design efforts.
Breadth
v. focus
Users want a system that does the job, no more, and operates
efficiently. If you give users more than they ask for, you may end up obscuring
the main functions behind features people never use, making the system harder
to use, and slowing it down.
(Perhaps you discovered this from a user’s perspective when you last
upgraded your word processor to the latest version.)
Worse, features that are never used tend to fall into a state of
disrepair and decay. Since nobody cares about them, you can be pretty sure that
they won’t work very well if somebody wants to use them in the future.
Conclusion: don’t implement more features than you are asked to, but
don’t let this stop you thinking ahead and designing for maintenance.
Complexity:
component size v. component interaction
Designing a large component or module takes a long time. A large
component is harder to understand, test and maintain. Most people recommend you
decompose a system into small self-contained components. Indeed, this is a
mantra of object-orientation.
The trouble with replacing a large component by smaller ones is that
they must talk to each other. There is more interaction between components than
before. You have to concentrate more on the interfaces between components.
Message-passing becomes more of a design issue. You replace one kind of
complexity (proportional to component size), by another kind of complexity
(proportional to component interactions).
(There
is a more obscure difficulty with defining many small object-oriented
components or classes. Where not all the effects of one event type appear in
one class, you may have to add an extra ‘gatekeeper’ class to sit on the path
of an event type, whose only job is to decide whether to let an event instance
through to a related object or not.)
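The gatekeeper idea can be sketched like this (Account and the event shape are invented; the point is only the class whose sole job is to admit or reject events):

```python
class Account:
    """Hypothetical target object affected by the event."""
    def __init__(self, balance):
        self.balance = balance

    def can_accept(self, event):
        return event["amount"] <= self.balance

    def apply(self, event):
        self.balance -= event["amount"]

class Gatekeeper:
    """Sits on the path of an event type whose effects are spread across
    classes; its only job is to decide whether to let the event through."""
    def __init__(self, target):
        self.target = target

    def handle(self, event):
        if self.target.can_accept(event):
            self.target.apply(event)
            return True
        return False  # event rejected, never reaches the object
```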
Conclusion: when you partition a system into smaller classes, expect to
increase the effort you apply to Event Modelling.
We’ve discussed design issues and tradeoffs. We’ve shown how the 3-tier architecture can be used to minimise data migration costs, and hide data replication and aggregation from the Business services layer of code.
In summarising the conclusions of this chapter, we can list a dozen
or so design principles for large systems.
• specify classes and relationships in the data structure rather than
leave them to be constructed by programmers
• where it is justified (by data migration or performance costs)
introduce a data abstraction layer to separate the entity model from the data
storage structure (designed for flexibility and performance)
• don’t store redundant data until you have established a clear business
case in terms of speeding up enquiries or increasing the robustness of local
operation
• specify ‘replicated data’ in the data services layer of code
• specify ‘derived data’ in the business services layer of code
• expect data migration and include it in your plans
• buy much more database space than you think you will need
• refine the locking mechanism in proportion to the number of concurrent
users
• don’t unnormalise stored data into an aggregate table until you have
established a clear business case in terms of enquiry speed
• define aggregate tables in the data storage structure rather than the
entity model
• spend money on usability in proportion to the number of end-users who
will benefit from your design efforts
• don’t implement more features than you are asked to, but don’t let
this stop you thinking ahead and designing for maintenance
• when you partition a system into smaller classes,
expect to increase the effort you apply to Event Modelling.
Analysts need what might be called ‘analysis patterns’.
These will be similar to design patterns for object-oriented programming in
some ways, but different in other ways. This chapter focuses on a design pattern
called State.
The footnotes also mention Composition, Decorator, Facade, Adapter, Bridge and Proxy.
Design patterns have been developed by and for object-oriented programmers. The usual reference is ‘Design Patterns: Elements of Reusable Object-Oriented Software’ by Gamma et al.
Gamma et al. are widely and affectionately known as the Gang of Four.
Their work is rightly acclaimed; it is an example to those teaching analysis of
how to teach expertise (not just notations) via patterns.
Are design patterns relevant to Analysts and designers?
Analysts
need patterns for processing persistent data
Most object-oriented designers work on systems that process transient
objects; for example, compilers, graphical interfaces and financial modelling
systems. So naturally, design patterns are mainly concerned with transient
objects.
The data in a business database is composed of entity state records that
represent real-world entities, long-lived entities that the business seeks to
monitor and perhaps control. So analysis patterns must apply to persistent
entities.
Fig. a repeats from chapter 1 a scale from transient objects to
persistent objects. This is very closely related to the scale from type to
state. The longer objects persist, the more that apparently fixed types become
variable attributes or transient states.
Fig. a
It turns out that the persistence of data has a big influence on
patterns for software design, as you shall see. You need a theory for how to
manage states as well as types. Traditionally, different modelling theories
have been applied to modelling types and states. One of our aims is to combine
these theories.
Analysts
need patterns that prompt questions
The Gang of Four say ‘Design patterns solve many of the day-to-day
problems object-oriented designers face.’ Each design pattern fits to a given
problem. You use a design pattern to solve a problem you already know you have.
Analysts need help with analysis, to discover what the problem is. An
analysis pattern should help analysts to ask questions and find things out. It
should help you to test and uncover problems in an existing specification. The
most cost-effective training involves teaching bad patterns as well as good
ones.
Analysts
need patterns to do with real-world objects
Design patterns help designers to sort out computer-world objects.
Analysts need to sort out what things in the real world have to be represented in
the system. Analysis patterns must help analysts to investigate the rules and
practices of an enterprise in the real world, the one that is to be supported
by the enterprise application.
Analysis patterns must be concerned with eternal verities in the way
that real people and real businesses behave. At least, those eternal verities
that can be captured in a ‘conceptual model’ of business objects and coded in
the ‘business services layer’ of a system. Analysis patterns will be used
mostly in defining the business services layer rather than the UI layer.
Analysts
need patterns that are logical
Design patterns are expressed in physical terms, in terms of
implementation mechanisms, and more specifically in terms of object-oriented
programming mechanisms.
Analysis patterns should be expressed in logical terms. They must define
characteristics of the problem domain rather than the implementation domain.
Analysts should be able to use them without knowing what technology will be
used to implement their design, be it C++, Java, COBOL or ORACLE.
For example, OO-style class diagrams specify where objects hold
references to other objects. Fig. b shows two class diagrams on the left that
are implementations of the same logical entity model on the right.
Fig. b
The logical notation above for modelling the
cardinality of a relationship between classes is well known. See the chapter
‘Rules and relationships’ in Analysis patterns.
Since the Gang of Four say ‘Almost all the [design patterns] use
inheritance to some extent’ let us begin by reviewing the idea of inheritance.
A class hierarchy or inheritance tree is a structure composed of superclasses and
subclasses, wherein a subclass can inherit or override the properties of a
superclass above it in the hierarchy.
Object-oriented technologies help you achieve reuse
by applying inheritance and polymorphism to a class hierarchy. See the chapter
‘Class hierarchies and aggregates’ for more about inheritance and polymorphism.
The
general shape of a design pattern
Many of the Gang of Four’s design patterns are rather similar, based on a common template involving an abstract class, shown in Fig. c.
Fig. c
The idea of patterns like this is to separate the interface of an
object or a process from various possible implementations of it. Thus, design
patterns of this shape capture expert knowledge about good uses for
polymorphism and abstract classes.
The Gang of Four again: ‘When inheritance is used
carefully (some will say properly), all classes derived from
an abstract class will share its interface. All subclasses will be subtypes of
the abstract class.’ See for example their design patterns: Iterator, Observer and Abstract Factory.
Analysts
need few patterns that feature class hierarchies
There may be a few over-enthusiastic object-oriented designers who believe that good design means explicitly spelling out all the class hierarchies you can find in the entity model of a system.
Fig. d
Class hierarchies are common in some kinds of software design. But
chapters 5 and 6 have explained why you are unlikely to find so many in the
persistent data structures of enterprise applications. Even the Gang of Four
say ‘Designers overuse inheritance. Designs are often made more reusable and
simpler by depending more on object composition.’
Good analysts do not specify many class hierarchies in the entity model
that specifies the persistent data structure of an enterprise application.
Where the list of subclasses is very long, or variable, or there are complex
overlapping hierarchies, defining class hierarchies creates schema
evolution problems.
Since analysts normally specify class hierarchies in other ways and
places, few analysis patterns will involve inheritance, and very few will
feature polymorphism, at least, not in the way that object-oriented designers
think of these things.
You might suppose then that analysts will find little use for design
patterns. But it turns out you can identify where some design patterns apply to
enterprise application design. And you can reshape some design patterns into
analysis patterns. We go on to reshape one design pattern for use by analysts,
replacing the class hierarchy with classes connected by one-to-many
relationships.
Design patterns might be divided into three groups:
• not very useful in enterprise applications
• useful to analysts in the business services layer
useful to designers in the other layers or the interface between
layers.
The second group is the most interesting. The Gang
of Four define a pattern called State that is designed to ‘Allow
an object to alter its behaviour when its internal state changes. The object
will appear to change class.’ Let us look at how this design pattern can be
reshaped for analysts.
Our tiny case study features one object class, Person, and two event
classes, Employment and Death. The Employment event can only happen if the
Person is unemployed. The Death event has two effects depending on whether the
Person is employed or unemployed. Let us say Death (employed) goes on to affect
Employer.
Fig. e shows the State design pattern in the entity model. Person and
Person-Employment-Status are parallel associated objects. There is a class
hierarchy under Person-Employment-Status of subclasses Employed and Unemployed.
Fig. e
Messages to Person are delegated (by the implementations of the methods defined
in its interface) to Person-Employment-Status where appropriate, i.e. where the
response depends on the state.
You can use the State design pattern to implement one event that has
different effects on an object in different states. You code each event effect
as a distinct (polymorphic) method in a subclass of the status object. The
status object divides the event between event effects.
Fig. f illustrates that you code the Death event in the Employed class
and Unemployed class as two distinct methods. Person-Employment-Status passes
the Death event down to the appropriate subclass.
Fig. f
You have to code the selection between subclasses somewhere - in a data
structure or a process structure. If you code it in the data structure, then an
object-oriented programming environment can make the selection between
subclasses ‘under the covers’ in any process that hits the status object. So
you don’t have to make the selection between types explicit in any process.
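A minimal Python sketch of this delegation might look as follows. The attribute names, return strings and the way the Employer effect is signalled are all invented for illustration:

```python
class Employed:
    """Subclass of Person-Employment-Status holding one event effect."""
    def employment(self, person):
        # precondition: the Employment event can only happen if unemployed
        raise ValueError("Employment event rejected: already employed")
    def death(self, person):
        person.alive = False
        return "death (employed): employer notified"  # goes on to affect Employer

class Unemployed:
    def employment(self, person):
        person.status = Employed()
    def death(self, person):
        person.alive = False
        return "death (unemployed)"

class Person:
    """Messages are delegated to the parallel status object wherever the
    response depends on the state."""
    def __init__(self):
        self.alive = True
        self.status = Unemployed()  # Person-Employment-Status aspect
    def employment(self):
        self.status.employment(self)
    def death(self):
        # the selection between event effects is made 'under the covers'
        # by which subclass the status object currently is
        return self.status.death(self)
```

Nowhere in the Person class is there an explicit test of the state; the selection between the two Death effects is made by the data structure.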
The
State design pattern as a way to avoid selections in processes?
You might use the State design pattern as a device to avoid coding a
selection or case statement within a method. You place the case statement in
the data structure and code each option as a distinct method in a distinct
class.
If the aim is to make code more maintainable, beware. First, what you
gain in one way you lose in another; it becomes harder to see which methods are
in fact related by mutual exclusion when an event is processed. Second, where
data persists, it is easier to change the structure of a transient process than
the structure of persistent data.
In enterprise applications, it is not reasonable or practical to remove
all case statements from methods. It is like trying to define all constraints
as state-transitions in state machines. This way of thinking, of trying to
design everything using only one tool, is a trap to be avoided.
There is one element of the State design pattern that is not so helpful to analysts - the class hierarchy showing each state as a subclass under each parallel aspect. Given that fixed class hierarchies do not abound in the data structures of enterprise applications, inheritance and polymorphism cannot be so useful as you might hope, and design patterns have to be reshaped for this kind of system.
However, there is another element of the State design pattern that
analysts can use. We have argued from around about 1980, and most recently in
the Computer Journal (1994), that a class is best divided into parallel aspects
along the lines of its need to maintain state variables.
A state variable is an attribute with a short range of values that is
tested as part of the precondition for one or more events. E.g. if a Person’s
Employment Status = employed, then an Employment event cannot happen. And if a
Person’s Employment Status = unemployed, then a Redundancy event cannot happen.
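The definition can be made concrete with a small sketch, using the chapter's example; the attribute and method names are assumptions for illustration:

```python
class Person:
    def __init__(self):
        # the state variable: a short range of values,
        # tested as part of the precondition for events
        self.employment_status = "unemployed"

    def employment(self):
        # precondition: an Employment event cannot happen if already employed
        if self.employment_status == "employed":
            raise ValueError("Employment event rejected: already employed")
        self.employment_status = "employed"

    def redundancy(self):
        # precondition: a Redundancy event cannot happen if unemployed
        if self.employment_status == "unemployed":
            raise ValueError("Redundancy event rejected: not employed")
        self.employment_status = "unemployed"
```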
Ask of a class: Does it maintain a state variable? If yes, create a parallel class to
maintain it. Motivations include: keeping each class smaller and easier to
comprehend on its own; suiting the paradigm of object-oriented programming; and
tightly encapsulating the maintenance of a state variable.
This last means that the state machine for each
class can be described elegantly using a regular expression notation, and this
has further advantages in pattern recognition.
Where a class maintains several state variables, you should appoint a ‘basic aspect’ that is the master of all parallel aspects. Fig. g shows the basic class as the master of all the parallel status classes.
Fig. g
Fig. h shows a possible example. It has three cyclical states, each
varying independently. There is a ‘boundary clash’ between the cycles.
Fig. h
The basic class is responsible for maintaining object identity, and any
attributes that can change in an unconstrained way as long as the object
exists. The basic class is so trivial that it requires no state variable and there is
little value in modelling its behaviour in the form of a state machine; it
would be simply a sequence of creation, random updates, then deletion.
Some simple enterprise applications are composed of classes with only
basic aspects.
Rolling
up parallel aspects
In general, you should create a parallel class for each state variable
that has to be maintained. But Fig. i shows that in simple cases, you might
roll up one of the parallel aspects into the basic class. You don’t have to do
this, but it is a harmless way to condense the specification and code in simple
cases.
Fig. i
We have been talking about specifying the business services layer
of a system. You need not separate parallel classes in the data services layer.
You can easily roll up all parallel aspects into one database table. One
benefit: it improves performance, since each process will have fewer data
objects to retrieve and restore. One cost: it makes the interface between the
business services and data services layers more complex.
Applying the pattern in section 7.4 to the case study, you would specify
a Person class that is careless of the state, and a Person-Employment-Status
class that flip-flops between employed and unemployed. All the processing that
depends on the state belongs in the Person-Employment-Status class.
Fig. j
You can specify the subclasses not in the data structure but in the
process structure of a Death event. Fig. k shows that you specify the effect of
Death on Person Employment Status as a selection between options Death
(unemployed) and Death (employed).
Fig. k
The event model is an abstract specification. When you come to code it,
you might well code the selection between event effects as a case statement
within the transient method for the Death event, rather than in the persistent
data structure. This has advantages. In large enterprise applications, this
will help to reduce schema evolution problems, since you can change the
structure of a transient process more easily than the structure of persistent
data.
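Coded that way, the selection lives in the transient process. A sketch under the same assumptions as before (names invented for illustration):

```python
class Person:
    def __init__(self, employment_status="unemployed"):
        self.employment_status = employment_status  # state variable
        self.alive = True

def death(person):
    # The selection between event effects is a case statement inside the
    # transient Death process, not a subclass hierarchy in the stored data.
    if person.employment_status == "employed":
        effect = "death (employed)"   # this effect would go on to affect Employer
    else:
        effect = "death (unemployed)"
    person.alive = False
    return effect
```

Changing the set of states now means changing a transient function, not evolving the persistent schema.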
Some variations on this theme are shown below.
Status
cycle as a historical record
Given a cyclical state, do users want to remember
the history of past cycles?
If yes, you can introduce a one-to-many detail class.
Fig. l introduces a detail class called Job.
Fig. l
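The one-to-many detail class might be sketched as follows; the Job attributes shown are assumptions, not from the text:

```python
class Job:
    """Detail class: one instance per employment cycle, past or current."""
    def __init__(self, employer):
        self.employer = employer
        self.ended = False

class Person:
    """Master class: the one-to-many relationship retains the history."""
    def __init__(self):
        self.jobs = []
    def employment(self, employer):
        if self.jobs and not self.jobs[-1].ended:
            raise ValueError("Employment event rejected: already employed")
        self.jobs.append(Job(employer))
    def redundancy(self):
        if not self.jobs or self.jobs[-1].ended:
            raise ValueError("Redundancy event rejected: not employed")
        self.jobs[-1].ended = True
```

The current employment status is now derivable from the latest Job, and every past cycle is remembered.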
Status
as an optional detail
Fig. m shows that if you don’t want to remember the history of past
cycles, only the current one, you might remove the fork from the relationship
in Fig. l.
Fig. m
You don’t normally see this shape however, because designers normally
roll an optional aspect like this into its master class.
State
variable as a domain class
Fig. n shows that you might add a domain class for the state variable
attribute, called Employment Status.
Fig. n
It is helpful to distinguish domain classes defined in the business
services layer (under end-user control) from classes defined in the UI or data
services layer (under designer control).
If designers want to define the values of a state variable in some kind
of table, perhaps along with an expanded description of the state that is
useful in error messages, you should define the domain class for the state
variable in either the UI or data services layer.
If users want to be able to change the description of a state
(‘unemployed’ to ‘redundant’), you may define the state variable as a state
class in the business services layer. But be careful not to expose the class’s
specification too far to manipulation by users; you surely don’t want users
creating or deleting states, and thus changing the rules of the application.
Domain classes are discussed further
in other volumes in this series.
Composition
The Composite design pattern defines an abstract class that provides a common interface for every level of a
hierarchical structure. It specifies the bottom ends of the hierarchy as a
special case. Curiously, it does not specify the top end as a special case,
though this is sometimes necessary.
Recursive composition is familiar to most database designers. When database designers specify fixed-depth recursion, a different pattern emerges, in which the top and bottom ends of the structure appear under parallel classes.
However, the recursive structures found in enterprise applications are
normally of variable depth; three varieties are possible.
The volume ‘Patterns in entity modelling’ says more about such recursive
patterns.
You can specify attributes as classes, then specify
a new thing as a subclass of each relevant attribute class, using inheritance
to obtain the attributes. But multiple inheritance can lead to complex
structures, difficult to manage. You cannot make schema changes, alter the
attributes of a class, without changing the data structure and losing the
instance data. To avoid these problems you can use the Wrapper
pattern to add properties to a basic thing, one on top of another.
You store new attributes as object instances without changing the data structure and thereby losing all the instance data. The price is that you hide the basic object beneath layers of attributes. When a wrapper is added, the object identifier appears to change.
Each wrapper completely encapsulates the original object and any
previously created wrapper. Each wrapper has a different object identifier. The
identity of the original object remains the same, but since an external client
can only call the outermost wrapper, the identifier appears to be that of the
last wrapper.
Client --> Wrapper3 --> Wrapper2 --> Wrapper1 --> Object
In fact, some calls are dealt with in the wrapper without forwarding, or
are supplemented before forwarding. This is the way the wrapper is able to add
extra functionality.
This kind of data structure is too inefficient for database designers.
Both multiple inheritance and recursive decoration are devices for
designer-maintained data rather than user-maintained data.
With a conventional relational database, by contrast, you can make schema changes, add new attributes to a relation, without
losing all the instance data; you can preserve the identity of objects stored
so far. But you do have to recompile the data structure, and probably some of
the programs, and retest the system.
The Gang of Four say ‘Each design pattern lets some aspect of the system
vary independently of other aspects, thereby making the system more robust to a
particular kind of change.’
Many design patterns are about decoupling servers from clients. They
help you to separate concerns for ease of maintenance, to keep distinct
subsystems apart yet also connect them.
Experts advise keeping the bridges between
subsystems as narrow as possible, keeping interfaces simple and economical.
This is very much the idea behind one of the Gang of Four’s design patterns
called ‘Facade’.
This and other design patterns can be useful in bridges between subsystems of
the 3-tier architecture.
Below, we’ve slightly edited and rearranged a contribution by Patrick
Logan to the patterns group on the internet, in which he suggests the
application of other design patterns to the 3-tier architecture:
‘Constraints
‘Despite the variation of user interfaces and databases, the system as a whole must maintain its integrity (adherence to system requirements). The logic and the system integrity checks represent most of the new development required.
‘The user interface tier should interact with the user, but refer to the middle tier (business logic and integrity) for the computation. The middle tier should be implemented in terms of abstract objects, hiding the business logic from the user interface, and from the details of the databases.
‘You can
separate the three tiers using the structural patterns described in Design
Patterns, such as Adapter, Bridge and Proxy.’
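The narrow bridge between the UI tier and the middle tier might be sketched like this. All class, method and attribute names here are invented; the sketch only illustrates a Facade-style interface that keeps integrity checks in the middle tier:

```python
class PersonStore:
    """Data services layer (hypothetical): raw storage, no business rules."""
    def __init__(self):
        self._rows = {}
    def read(self, pid):
        return self._rows[pid]
    def write(self, pid, row):
        self._rows[pid] = row

class BusinessServices:
    """Facade over the middle tier: the UI calls this narrow interface and
    never touches the store or the integrity checks directly."""
    def __init__(self, store):
        self._store = store
    def hire(self, pid):
        row = self._store.read(pid)
        if row["status"] == "employed":   # system integrity check
            raise ValueError("already employed")
        row["status"] = "employed"
        self._store.write(pid, row)
```

The UI tier depends only on BusinessServices; the database behind PersonStore could be swapped without the UI noticing.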
Analysis patterns, by contrast, will be about coherence and constraint; they
apply within a coherent subsystem, within a layer of the 3-tier architecture,
rather than between layers. Analysis patterns must apply within the business
services layer of code and help you to get the functionality of a system right.
Analysis patterns must help you integrate concerns,
help you to specify the coupling between business entities, to tighten the
constraints as far as possible, so that these objects remain consistent one
with another.
So broadly, one might say: ‘Apply design patterns to loosen the interfaces
between subsystems. Apply analysis patterns to discover and specify the
constraints within a subsystem.’
We started by suggesting analysts need what we are calling analysis patterns.
These will be similar to design patterns for object-oriented programming, but
different in a number of specific ways. We’ve already suggested that analysts
need:
• patterns for processing persistent data
• patterns that prompt questions
• patterns to do with real-world objects
• patterns that are logical
• few patterns that feature class hierarchies.
Both design and analysis patterns are concerned with smallish structures
of relationships between elementary components of a system. Analysis patterns
tend to be simpler than design patterns, more abstract in the sense of
technology-independent, and they are perhaps more numerous.
There are a few more things to say about the use of patterns in
analysis. Pointing up differences between design and analysis patterns sheds a
new light on both fields of research.
Analysts
need only a few patterns that feature recursion
Several of the published design patterns for object-oriented software
construction feature recursive communication between instances of a class. The
few analysis patterns that do feature recursion are interesting, but perhaps
not so commonly used. See Footnotes.
Analysts
need patterns that model business rules
Design patterns can help you build enterprise applications that are more
robust in the face of changes, while analysis patterns will help you build
enterprise applications that are correct in terms of applying constraints. Both
can help you make the step from naive database use towards more complex
database use. See Footnotes.
Analysts
need patterns for object behaviour analysis
Design patterns appear in two dimensions of conceptual modelling -
entity modelling and event modelling. Confusingly, the Gang of Four refer to
patterns in object interactions as ‘behavioural’ patterns. We use the word
‘behaviour’ in a different dimension.
What we call the object behaviour analysis face of the conceptual
modelling cube is to do with specifying the long-term behaviour of persistent
objects in the form of life histories or state machines. There are many
analysis patterns in state machines. This is an area in which analysis patterns
work might contribute to design patterns work.
Analysts
need patterns that suit database processing
Design patterns help with object-oriented programming technologies.
Analysis patterns must help with systems that use database and transaction
processing technologies.
But the distinction between technologies is not as fundamental as it
looks. Analysis patterns may be implemented in object-oriented software. Design
patterns may appear in enterprise applications.
Further
reading
The volume ‘Introduction to rules and patterns’ takes up the theme of analysis patterns.
This
book is largely practical. There are a few abstract principles that underlie
the discussion of patterns in this book and its companion.
A
system is composed of many small elementary things (objects, facts, types,
states, events and rules) connected together in various ways. You have to get
down to the bottom level. You have to define all the elementary things and the
relationships between them, at a level of description that can be executed on a
computer. There is no way to avoid this. There is no way to avoid the pain.
Everything
in a system must be connected to everything else within that system; otherwise
there must be two or more distinct systems. Patterns are about connecting
things. There are recognisable and reusable patterns in how things are related.
Patterns that connect just two or three things are the most reusable, but
patterns that connect four or five things are more valuable.
The
‘how manyness’ of things in relation to each other is a fundamental kind of
rule that has to be specified in each view of a system. You define one-to-one,
one-to-one-or-zero, and one-to-many relationships not only between classes (in
an entity model), but also between the concurrent objects affected by an event
at a moment in time (in an event model), and between the events that affect an
object over a period of time (in a state machine).
Things
are naturally related to each other by association. E.g. A shoulder is related
to an arm. An arm is in turn related to a hand. A husband is related to a wife.
A divorce must relate to a previous wedding.
Longevity
turns composition relationships into associations. A composition relationship is
an association, but strengthened by the rule that all the related objects live
the same length of time. You can try to relate things by saying one is composed
of others. E.g. A hand is composed of a palm, four fingers and a thumb. A car
is composed of an engine, chassis, body, etc.
This
is OK over a short time, but longevity turns composition relationships into loose
associations. You might lose a finger from your hand, or replace the engine of
your car by another. You would better say a car is associated with a number of
parallel aspects, each of them potentially optional or replaceable.
Longevity
turns subtypes into states of parallel aspects. You can relate things by saying
one is a subtype of another. E.g. a Man is a Human; a Woman is also a Human.
Similarly, Leg, Arm, Wing, Flipper and Tentacle are all subtypes of Limb.
This
is OK over a short time, but longevity turns apparently fixed types into
variables or states. Under some legal systems, a Human can change Sex. You
would better say a Human is associated with a number of parallel aspects - Sex,
Job, etc.
A
caterpillar turns into a butterfly. Exactly when in evolutionary history the
forelegs of a monkey became the arms of an ape is an interesting question. You
would do better to say a Limb has a number of optional parallel roles -
Supporter, Hanger, Swimmer, Flyer, etc.
Events
coordinate interacting objects. Object-orientation and event-orientation are
not in competition. They are orthogonal views of the same phenomena; equally
valid and useful views.
An
event reflects a natural phenomenon. An event model specifies the interactions
between concurrent objects in a formal way. An event model is a directed graph;
the event travels along each relationship in a one-way direction. But an event
model does not commit you to any statement about communication.
Messages
are an implementation device. You may select between a number of viable
message-passing strategies. You can choose to send messages along the paths
specified in the event model, or another route. The interaction is more
fundamental, more objective, than the messages that make it work.
Nature
abhors perfect symmetry. Asymmetry tends to assert itself. If you discover two
perfectly symmetrical things, you will normally destroy the symmetry by placing
one over the other, or by inventing a third thing and relating both to it.
If you say my shoulder is related to my arm, which is in turn related to my hand, there is no need to say my shoulder is related to my hand - this is implied. But not all redundancy is bad, since introducing redundancy into one perspective may reduce redundancy in another.
Meyer discusses three technical advantages of the Smalltalk paradigm.
These benefits apply largely to designers working with visual programming
environments rather than business databases, and more to programmers than to
analysts.
Conceptual
consistency from using a single object-oriented paradigm
Only having to think in one object-oriented dimension is great for the
programmer, but teaching analysts that everything in a real-world enterprise is
an object doesn’t help them. We should teach and encourage analysts to consider
all dimensions of the problem they are studying. They need a framework that
clearly separates the different parts and orthogonal views of the problem
domain they have to analyse.
Manipulation
of classes at run time
This is great for the programmer, and perhaps for iterative prototyping,
but positively dangerous in full enterprise application development.
Soon after an enterprise application is set live,
analysts are faced with the need to change the database structure or the rules
that guarantee data integrity while the running system retains its stored data.
Where run-time manipulation of rules is required,
analysts should define the rules as attribute values of some kind of
classification or rule entity type.
Where the business entity model is to change more
fundamentally, beware that the stored data is a valuable company asset. The
necessary reprogramming, retesting, retraining and data conversion are
expensive. Analysts need help to tackle such ‘schema evolution’ in a strictly
controlled and methodical way.
Use
of class-level methods alongside instance-level methods
Meyer suggests that programmers may find this a mixed blessing. Part of
the art is to keep levels of abstraction apart. Analysts have two orthogonal
ways to separate levels of abstraction.
Instance
from type
Analysts do separate type from instance in the business services layer
by one-to-many relationships between persistent classes, say:
Road Type ---< Road ---< Road Use
Programmers may later introduce class-level ‘methods’ to process any
event that cascades down these one-to-many relationships.
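Such a cascade down the one-to-many relationships might be sketched as follows; the class names follow the Road example above, while the archive event and its attributes are assumptions for illustration:

```python
class RoadUse:
    def __init__(self):
        self.archived = False

class Road:
    def __init__(self):
        self.uses = []    # Road ---< Road Use

class RoadType:
    def __init__(self):
        self.roads = []   # Road Type ---< Road

    def archive(self):
        # an event arriving at the type cascades down the one-to-many
        # relationships to every instance and every detail beneath it
        for road in self.roads:
            for use in road.uses:
                use.archived = True
```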
Class
from metaclass
This is not a separation that analysts worry about, but there is a sense
in which the three-tier software specification architecture separates class
from metaclass. It keeps apart:
• business services layer classes, such as Road and Road Use
• UI layer classes: such as Window and Button
• data services layer classes: such as Table and Commit Unit
Might one view the data services layer classes Table and Commit Unit as
metaclasses representing business entity and business event?
A tribute to the late Keith Robinson.
It is almost certainly true that the longest
continuous object-oriented research and development programme in the world was
started by Keith Robinson in 1977 at Infotech. After Keith’s death in 1993, the
development was carried forward by John Hall of Model Systems and by me (Graham
Berrisford); I now work for Seer Technologies.
1977 Keith published a paper in
the Computer Journal proposing an object-oriented program design method for
database systems (not called that of course). Keith started from Michael
Jackson’s earlier suggestion that the variables and processes of each object
type could and should be encapsulated in a discrete processing module. An
additional idea was to use the state variable of an object in validation of
updates to that object.
1979 I helped Keith develop his
proposals into a 10-day course called 'Advanced System Design' based on three
techniques:
• Relational data analysis: Keith taught this as a technique to
decompose the required system inputs and outputs in what we might now call the
UI layer, into entity types for behaviour analysis in what we might now call
the business services or data services layer.
• Life history analysis: Keith taught this as a technique to discover
the behaviour of each entity type and document it in a state machine diagram.
He favoured using regular expressions as the notation and called them life
history diagrams after
• Object interaction diagrams: Keith invented and taught these to
document how objects exchange messages in order to complete the processing of
an event (one event may synchronously update several objects, and/or need to be
validated against the states of several objects).
Keith’s three-dimensional approach to conceptual modelling is now the
norm in modern development methods. But there was a lot more to his method than
notations, and some of the ideas he taught to do with schema evolution are
still ahead of the game.
By the way, many years before Yourdon abandoned data flow diagrams,
Keith advised against top-down decomposition.
1980 Keith’s course disappeared
when his employers went into liquidation. Not long after this, Keith helped
John Hall to develop an analysis and design method for the
Keith and John deemed object interaction diagrams impractical for use by
database programmers, but included life histories as an analysis tool for
discovering processes and business rules. They assumed it was obvious that each
life history or state machine could be transformed into a discrete program
module using
Unfortunately, version two of SSADM was developed by people who did not
understand that life histories were a program design technique. The ground that
was lost was not recovered for some years. And many still believe to this day
that the main program specification technique in SSADM is data flow diagrams!
1983 Keith invented 'effect
correspondence diagrams' (hereafter ‘event models’) to replace object
interaction diagrams. The former are simpler than the latter, but equally
formal. They suppress the detail of message-passing (which might be done in
various ways) but show the essential correspondence between ‘methods’ in
different objects affected by one event. The most wonderful feature of the
diagrams is that they transform equally well into either object-oriented or
procedural code.
1986 I tested event models with
Keith and John until all were confident they could be adopted by the
At the same time, Keith and I also proposed separating the business
services layer from the data services layer by means of a process-data
interface (perhaps coded as SQL views), so you can generate code directly from
the event models, careless of the database designer’s implementation decisions
or the database management system.
All these proposals were adopted by the
1991 Keith worked out a way to
detect and document reuse between events in state machine diagrams. The result
is a network in which events invoke superevents, which may invoke other
superevents and so on. This network can be generated by a CASE tool from the
state machines.
Keith knew then that SSADM had all the armoury required to be an
object-oriented method for database systems, save for two problems.
• To avoid the confusion that existed (and still exists) in
object-oriented methods between UI layer objects and business services layer
objects, designers needed to separate the layers of the 3-tier processing
architecture.
• The representation of inheritance in state machines needed further
research.
1993 Keith and I wrote the book
'Object-Oriented SSADM' (published after Keith’s death by Prentice Hall) mainly
to establish two ideas: the importance of separating the layers of the 3-tier
processing architecture, and the use of the superevent technique to maximise
economy and reuse of code within the business services layer.
1994 I published a paper in the
Computer Journal that showed how the benefits of inheritance (reuse and
extendibility) can be achieved through modelling state machines for the
'parallel aspects' of a class.
1995 John Hall did most of the
hard work necessary to test, demonstrate and establish the above ideas for
adoption by SSADM version 4.2.
1997 This
book has examined the practical application of inheritance and polymorphism in
enterprise applications. The companion volume ‘Event modelling for enterprise
applications’ introduces improvements in the teaching and usage of event
models, e.g. to include constraint discovery and specification.
References
Ref. 1: “Software is not Hardware” in the Library at http://avancier.co.uk
Footnote 1: Creative Commons
Attribution-No Derivative Works Licence 2.0
Attribution: You may copy,
distribute and display this copyrighted work only if you clearly credit
“Avancier Limited: http://avancier.co.uk”
before the start and include this footnote at the end.
No Derivative Works: You may
copy, distribute, display only complete and verbatim copies of this page, not
derivative works based upon it. For more information about the licence, see http://creativecommons.org