This booklet is published under
the terms of the licence summarized in footnote 1.
This part
catalogues patterns in the relationships between entities. It reveals the rules
of thumb and business analysis questions triggered by pattern recognition. It
continues from where Part one finishes, starting on ground that is relatively
firm and finishing with some more speculative suggestions.
Using patterns and model transformations
to get the constraints right.
This paper highlights the importance of getting the constraints right. It introduces a catalogue of standard structural shapes and the notion of entity model transformations. These ideas are developed in later chapters.
The success of a system depends on the relationships between entities being specified correctly. If they are not, then it becomes:
· easier for useless data to get into the system
· harder for programmers to locate the information they need.
To prevent the above difficulties from occurring, to sharpen up the act of analysts, and to save time and effort, you can apply some simple quality assurance techniques. One technique is enquiry access path analysis. This means defining the route by which required information is extracted from the model.
Another is pattern analysis, which has the advantage that it
can be applied with less detailed knowledge of the required outputs.
Fortunately, there are recognisable patterns and questions that lead you to
transform an intuitive or poor design into a well-engineered design. These
patterns help you to raise the quality of analysis and design work, and thereby
improve the quality of the resulting systems.
The figure below shows a pattern called the double-V shape.
Ask of a double V shape: Can you tie an object of one detail entity to only one object of the other detail entity? If yes, connect the two detail entities by a relationship, to capture the constraint.
The basic pattern can be obscured by intermediate entities.
The figure below includes a double-V shape, even though the
Ask of this double V shape: Can you tie a Holiday Booking to only one Client Requirement?
Yes, a Holiday Booking is made to meet the Client
Requirement for the Feature Type that classifies the
There is a quality benefit. It is now impossible for users
to create a Holiday Booking for a Client who has not expressed an interest in
the Feature Type of the
There is a productivity benefit. Programmers do not have to navigate around the model to find the relevant Client Requirement for a Holiday Booking, or sort Holiday Bookings by Client Requirement within Client.
“I wonder if normalisation could or should lead to this result? It seems somehow similar to the transitive dependency issue, but only at an intuitive level.” Michael Zimmer
Perhaps, but you cannot rely on any one technique to reveal everything. See the Chapters on <Model transformations> for further examples.
Patterns occur in various kinds of specification, but among the simplest and most widely useful are those that involve the specification of relationships between entities. The figure below is an attempt to summarise and name the entity model shapes that I am most interested in. The arrows show some of the possible transformations.
Our pattern names include: parent-child, V, level, bridge, relation, diamond, triangle, double-V and Y shapes, double and triple Y shapes, tramlines, X shapes and recursive shapes. The Chapters on <Model transformations> show that the transformations in the first row apply to both data models and process models. Other Chapters (not yet collected into a volume) detail the shapes and transformations indicated by arrows on the diagram above.
Many people teach the mechanics of how to document system specifications. Plenty of CASE tools help you with these mechanics; they ensure you get the syntax right; they constrain you to use the proper boxes, symbols and lines. But there are no tools that help you with the semantics - the difficult part - the thinking - the analysis.
The skill of professional analysts lies in recognising patterns in specifications, especially in object and event models. They use standard shapes constructively to build up a large and complex picture. They also use them destructively, to analyse an existing specification, to take it to pieces and question whether a better construction should be put upon these pieces.
The patterns catalogue above is a chart of simple shapes with memorable names. I have listed the questions you should ask about each shape, and the possible transformations that you might need to apply. So I can now teach students:
· the name, meaning and use of a pattern in constructing a specification
· the analysis questions which discovery of the pattern prompts
· the design or redesign work that is necessary depending on the answers.
This new approach means that for the first time I can envisage a CASE tool that helps us with the thinking part of analysis and specification. It will help us to build better quality systems, not just better documented systems.
A pattern on its own isn’t much help. What to do with the pattern that has been recognised? This is the expert knowledge I want to capture. A tool can highlight or report on patterns, and prompt its user to answer specific quality assurance questions.
If a CASE tool is to ask us questions about patterns, it must first have the appropriate pattern recognition functions. To recognise the named patterns, the entity at one end of each relationship must be declared as the ‘parent’, and the other must be the ‘child’. E.g. given a one-to-many relationship I always nominate the ‘one’ end to be the parent. It is the parent-child hierarchy inherent in each relationship that makes the shapes recognisable, whether by a person or by a tool.
For people, always drawing the parent above the child imposes a hierarchical structure that helps us to display the known patterns in an easily recognisable form, corresponding to the shapes in the analysis patterns catalogue.
“My preference is to draw the model this way up. Dave Hay prefers the ‘dead crow’ notation - really a matter of taste.” Michael Zimmer
Of course, patterns are careless of the diagram symbols or the presentation form. As long as each relationship has parent and child ends, a CASE tool can detect a pattern if the model is drawn upside-down, or using different symbols, or written down in the form of text or code.
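To illustrate that last point, here is a minimal sketch of how a shape might be recognised in a model written down as code. It assumes a hypothetical representation in which each relationship is a (parent, child) pair; the function name find_v_shapes and the sample relationships are inventions for this example, and the sketch makes no claim about how any real CASE tool works.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical model: each relationship is a (parent, child) pair.
RELATIONSHIPS = [
    ("School", "Contract"),
    ("Head Teacher", "Contract"),
    ("School", "Pupil"),
]


def find_v_shapes(relationships):
    """Return (parent_a, parent_b, child) for every child with two parents."""
    parents_of = defaultdict(set)
    for parent, child in relationships:
        parents_of[child].add(parent)

    shapes = []
    for child in sorted(parents_of):
        # Every pair of parents sharing this child forms one V shape.
        for a, b in combinations(sorted(parents_of[child]), 2):
            shapes.append((a, b, child))
    return shapes


if __name__ == "__main__":
    for a, b, child in find_v_shapes(RELATIONSHIPS):
        print(f"V shape: {child} links {a} and {b}")
```

Because the function reads only the parent and child ends, the way the diagram is drawn has no effect on the result.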
Mike Burrows has developed a CASE tool called Validator (see <www.asplake.demon.co.uk>) that detects and reports on most of the structural patterns I have catalogued. It asks you analysis questions, and suggests some transformations that may improve your model. See the Chapters on <Model transformations> for further details.
I earlier applied various transformations to a relational database
design from Halpin. Some of the same transformations appear in a classification
developed by Petia Wohed (née Assenova) and Paul Johannesson at
Petia and
Paul set out with the intention of making schemas more graphical, to make the
rules fully explicit for the purpose of schema integration. Schema integration
is a different job from schema design, so I will add comments and guidance from
the view point of somebody who designs schemas for enterprise applications.
Also, their modelling language is different from ours. Notable
differences are listed below:
Their term | Our term |
attribute | attribute or relationship (see below) |
single-value attribute | attribute or 1:1 relationship |
multi-value attribute | parent-child relationship (from one master object to many child objects) |
partial or total | optional or mandatory |
total in union | at least one must exist |
surjective attribute | parent-child relationship with at least one child |
Given an entity with an optional group of attributes, you may move the
optional attribute group into a subclass where it is mandatory.
Petia and Paul discuss this under ‘transforming partial attributes’.
Figure 5a illustrates their example.
Fig. 5a
Figure 5b shows an example from chapter 4. The entity Paper has
attributes that only apply to papers accepted for presentation. So you may move
the optional attributes (Total Pages, Total Figures, Total Tables) into a
subclass where they are all mandatory.
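The transformation can be sketched in code as follows. This is an illustration under stated assumptions rather than a recommended design: the class name AcceptedPaper and the title attribute are hypothetical, chosen only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperWithOptionalGroup:
    """Before: one entity in which the attribute group is optional."""
    title: str  # hypothetical attribute that applies to every paper
    total_pages: Optional[int] = None
    total_figures: Optional[int] = None
    total_tables: Optional[int] = None


@dataclass
class Paper:
    """After: the superclass keeps only what applies to every paper."""
    title: str


@dataclass
class AcceptedPaper(Paper):
    """After: in the subclass the whole attribute group is mandatory."""
    total_pages: int
    total_figures: int
    total_tables: int
```

Creating an AcceptedPaper without all three totals raises a TypeError, whereas the single-entity version accepts any combination of missing values.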
Fig. 5b
Chapter 6 suggests that people normally do the reverse in practical
system design. They roll up optional data groups into an aggregate entity,
partly to reduce the length and complexity of access paths by events and
enquiries, and partly for other reasons explored in Later chapters.
Given a parent-child relationship that is optional from the master’s
view point, you can make it mandatory by replacing the child by a subclass of
itself.
P&P call this ‘transforming non-surjective attributes’. An attribute
is ‘surjective’ when each instance of its range (the master entity) is
associated with at least one instance of its domain (the child entity). So a
surjective attribute is an attribute whose inverse is a mandatory relationship.
Figure 5c illustrates their example.
Fig. 5c
After the transformation, each object of the parent entity is associated
with at least one object of the child entity. In this example, the relationship
starts optional at both ends and becomes mandatory at both ends.
Again, people often do the reverse in practical system design. If a
superclass has only one subclass, they would roll the data of the subclass up
into the superclass, partly to reduce the length and complexity of access paths
by events and enquiries, and partly for other reasons explored in Later
chapters.
Given optional attributes that are mutually exclusive, and of which at least
one must exist, you can introduce a generalised attribute, a superclass of the
mutually exclusive attributes.
P&P call this ‘transforming partial attributes which are total in
union’. Figure 5d illustrates their example.
Fig. 5d
Figure 5e shows a different convention in database design - to turn the
mutually exclusive attributes into mutually exclusive relationships.
Fig. 5e
Later chapters explore the difference between a ‘class hierarchy’ as in
figure 5d and an ‘aggregate’ as in figure 5e.
Given several parent-child relationships that are optional from the
parent’s view point, but where at least one must exist, you can make them
mandatory by introducing a superclass of the various children.
P&P call this ‘transforming non-surjective attributes which are
total in union’. Figure 5f illustrates their example, where a Head Teacher is
obliged to take responsibility for at least one course.
Fig. 5f
This transformation is unusual in practical enterprise application
development. The requirement that a parent must have at least one child drawn
from different types is not very common.
Designers are likely to apply the reverse transformation, that is, relax an
‘at least one’ constraint after it has been defined, because where a business
monitors hundreds or thousands of objects, it is normally easy to come up with
counter examples, valid exceptions to the rule.
P&P call this ‘transforming m-m attributes’. Figure 5g illustrates
their example.
Fig. 5g
This
transformation is very common in practical enterprise application development.
So common that it is second nature to database system developers, not just
because it is required for implementation reasons, but because resolving
many-to-many relationships is a valuable step in analysis. See chapter 5 for
further discussion.
Given an entity with non-key attributes, you can raise any attribute
other than the primary key to become a parent entity connected by a 1:N
relationship.
P&P call this ‘transforming lexical attributes’. Figure 5h
illustrates their example.
Fig. 5h
This is very common in practical enterprise application development. But
why and when to do this?
‘the schema is more stable, since it is easier to add extra attributes for cities. Some of the queries after the transformation become more complex, because the derivations cover a larger network of objects.’ P&P
Let us focus on a tiny part of the model at the end of chapter 4. Figure
5i shows the non-key attributes of the Room entity raised to become key-only
parent entities.
Fig. 5i
The best kind
of analysis pattern prompts ‘Ask of this pattern…’
questions.
Ask
of a non-key attribute: Is the range of values constrained? If yes, define the
attribute as a parent entity.
E.g. Room Type is constrained (lab,lec,office) and Building is
constrained (1...5). You can prevent mistaken classification of a Room under an
invalid entity by defining these values as objects of a parent entity.
If no, nobody wants to control the range of values, then don’t make it a
parent entity. E.g. say nobody cares too much about what is recorded as an Area.
The Area entities are derivable from whatever values happen to be recorded.
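As a minimal sketch of this question in code, assume the objects of each key-only parent entity are held as a simple set of allowed values. The Room constructor, its number key and the error messages are hypothetical; only the value ranges come from the example above.

```python
# Objects of the hypothetical key-only parent entities.
ROOM_TYPES = {"lab", "lec", "office"}
BUILDINGS = {1, 2, 3, 4, 5}


class Room:
    def __init__(self, number, room_type, building, area):
        # Constrained attributes must match an existing parent object.
        if room_type not in ROOM_TYPES:
            raise ValueError(f"unknown Room Type: {room_type!r}")
        if building not in BUILDINGS:
            raise ValueError(f"unknown Building: {building!r}")
        self.number = number
        self.room_type = room_type
        self.building = building
        # Area is unconstrained: whatever value is recorded is accepted.
        self.area = area
```

A Room of type "lab" in Building 3 is accepted with any Area, while a Room of an unlisted type or in Building 6 is rejected.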
Ask
of a non-key attribute: Do users control the range of values? If yes, define
the parent entity in the Business services layer.
E.g. Users might want to control the range of Room Types
(lab,lec,office).
If no, or you want to stop users from changing the system’s rules by
adding or deleting objects of the class, then define the parent entity in a
layer of the design controlled by designers. E.g. you might define Building as
a class in the UI layer or a table in the data storage structure.
Ask
of a key-only parent entity: Does it have non-key attributes of its own? If
yes, make it an entity like any other in the Business services layer.
E.g. you might record the total number of Rooms as an attribute of the
Room Type entity. Even a derivable total like this turns the key-only parent
entity into an entity like any other.
If no, then you may later treat the key-only parent entity differently
from other entities in the data storage structure, perhaps define it as an index
rather than a table.
Ask
of any remaining non-key attributes: Do users regularly make enquiries that
select or classify objects by a single value of the attribute? You may signify any further
requirement for classification by drawing an entry-point arrow on the entity,
showing which attribute is used for selection. Again, the attribute may become
some kind of index in the data storage structure, rather than a table.
Given an entity with an attribute that has a small fixed range of
values, you may transform the fixed range into distinct sub entities.
P&P call this ‘transforming attributes with fixed ranges’. Figure 5j
illustrates their example.
Fig. 5j
‘It is easy to see the subclasses in the graphical notation. In contrast, it is more difficult to read and understand the definition of the lexical type Employee Type, which is not included in the graphical notation of the schema.’ P&P
Given the attribute Room Type (lab, lecture room, office) in the case
study in chapter 3, you may transform the fixed range into distinct subclasses.
Fig. 5k
This transformation is rare in practical enterprise application
development, for reasons explored in Later chapters.
Given a class hierarchy in which subclasses share properties in an
orthogonal dimension, you can create a class network.
P&P call this ‘transforming to lattice structures’. Figure 5l
illustrates their example.
Fig. 5l
‘It is easier to see that Scholar is specialisation of both Teacher and Researcher if they are drawn as boxes, compared to searching for and reading a rule that has the same meaning.’ P&P
Later chapters explore this transformation in more detail, but a little
of the discussion is repeated below.
Figure 5m shows that a data structure in which Class Teacher and Head
Teacher inherit from Teacher might be extended to include a subclass that
inherits from both Class Teacher and Head Teacher.
Fig. 5m
Figure 5m shows a diamond-shaped is-a tree in which a Dual Role Teacher
entity has been introduced to accommodate the few Teachers that are both Head
Teacher and Class Teacher:
· A Dual Role Teacher is a Head Teacher is a Teacher.
· A Dual Role Teacher is a Class Teacher is a Teacher.
Defining a diamond-shaped is-a tree may be a recognised practice in
object-oriented languages that support multiple inheritance, but one should be aware
that the meaning of the model is ambiguous, in the way described below.
The model does not specify the rule that a Dual Role Teacher is a single
Teacher. It might equally well be read to imply that two Teacher objects are
needed to instantiate one Dual Role Teacher object.
One way or another, an object-oriented programming environment that
allows multiple inheritance must work out that Dual Role Teacher inherits only
once from Teacher. But the semantics of the diagram notation don’t tell you
this, and we want the conceptual model to act as a specification for relational
database programmers as well as object-oriented programmers.
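Python is one environment that allows multiple inheritance, and it can serve to illustrate the point. The sketch below uses the class names from figure 5m; the name attribute is an assumption added so that the classes hold some data. It shows how one language resolves the diamond, not a rule that the diagram itself expresses.

```python
class Teacher:
    def __init__(self, name):
        self.name = name


class HeadTeacher(Teacher):
    pass


class ClassTeacher(Teacher):
    pass


class DualRoleTeacher(HeadTeacher, ClassTeacher):
    pass


# Python's method resolution order lists Teacher once, after both roles.
mro_names = [cls.__name__ for cls in DualRoleTeacher.__mro__]
print(mro_names)
# ['DualRoleTeacher', 'HeadTeacher', 'ClassTeacher', 'Teacher', 'object']
```

The single appearance of Teacher is the language's decision. A relational database programmer reading the same diagram has no such rule to fall back on.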
Diamond shaped structures are discussed further in Part Two.
Where an
entity has a list of similar attributes, you can generalise these attributes
into a relationship. P&P call this ‘transforming non-unary attributes’.
Figure 5n illustrates their example.
Fig. 5n
Once again, this transformation is not very common in practical system
design. Figure 5o shows two more common transformations discussed in Later
chapters.
Fig. 5o
Schema integration v. schema design
Petia has commented as follows.
‘All of our transformations increase the size of the schema (the diagram that is). Some don’t like this. But it is the price you pay for a clear and explicit model of rules and constraints.
‘People can specify constraints as rules attached to the attributes, hidden away in a data dictionary behind the model. But then it is harder to see concepts which may be important.
‘The idea of conceptual modelling is to model the universe of discourse, capture its important aspects, in a graphical picture. So, our nine transformations make the presentation more explicit, more visible. The aim is to specify rules and constraints as relationship lines in a graphical model.’
Petia and Paul are interested in these transformations for the purpose
of schema integration. Making things visible makes the process of schema
integration easier. If you plan to merge two schemas, you do need to make all
the current rules fully explicit.
But note that schema integration is a one-off exercise. You can be
confident that the range of a type, the instances of a class, the rules of the
business, will not change while you are working.
Building a conceptual entity model for long term use is a different
matter. The model has to hold object data for years. It has to survive while
objects are created, amended and destroyed, while the ranges of apparently
fixed values are altered, while the rules evolve.
This gives the modeller a different perspective. The modeller will try
to avoid fixing temporary rules (like a range of subclasses) into the data
structure. I tend to avoid creating class hierarchies for this and the other
reasons explored in Later chapters.
The important thing is to record the semantics of the problem domain, one
way or another. Different diagram drawing conventions lead you to draw
different-looking conceptual models.
Some people like to represent every term and every fact in a box of its
own. You might specify each fact about an object by drawing a rectangle. You
might place a rectangle on each and every line between one named term and
another named term. Figure 5p shows the kind of diagram that results from this.
Fig. 5p
There is no law saying you have to represent every attribute or
relationship in a rectangle. Doing this usually creates a diagram that is far
too large for practical use.
When you are building an enterprise application with perhaps 2,000 data
items, you cannot handle a picture that shows every data item in a box (let
alone every value of every data item, which is where some of the
transformations in this chapter lead).
It is more convenient to roll one entity up to become an attribute of
the other. You may do this where there is a 1:1 relationship, or where one
entity is a key-only entity, with no attributes of its own.
Figure 5q features both 1:1 relationships and a key-only entity. It can
be condensed by rolling the ‘key-only entity’ into the ‘state entity’.
Fig. 5q
Figures 5q and 5r show that whether a business term becomes an entity or an attribute depends on the perspective of the system’s users.
Fig. 5r
Colour might seem indisputably to be an attribute. But Colour might easily
be an important business entity in a company that manufactures paint.
By the way, figure 5s shows that the term
‘state entity’ comes from one way to classify different kinds of entity in an
entity relationship model.
Kind of object | Description |
Constraint object | Value that constrains business data |
Universal value object | Defined outside the business (colour, month) |
State object | Created and destroyed by the business (customer, application) |
Business object | Value that currently applies to objects in the business |
Derived object | Derived from values stored in other objects, not a constraint on them (month of birth) |
Fig. 5s
How relationships prompt the analyst to ask questions.
All the commonly used notations show entities as boxes and relationships
as lines between them. I use a diagram notation based on that developed (I
think) by Charles Bachman in the 1960s, from which a number of other variants
have been derived. It doesn’t matter if you prefer another notation (say, after
Chen, or OMT) that expresses the same semantics.
To show the dependence of one object on another, or its independence of
other objects, our notation uses a continuous line or a broken line:
symbol | shows that an object at that end of the relationship |
broken line | can exist without the relationship |
continuous line | cannot exist without the relationship |
Fig. 6a
Fig. 6b
A solid continuous line is a mandatory 1:1 relationship. The objects at
either end share the same identity, even though they might be given different
keys by a business.
Fig. 6c
Later chapters show that you may draw a mandatory 1:1 relationship to
connect the parallel aspects of an aggregate. But as a rule of thumb, you
should assume that nature abhors a symmetrical or non-hierarchical
relationship.
Ask of a mandatory 1:1 relationship: Are
the two objects created and destroyed at the same time? If yes, then the two objects share the same identity. They are what
I call an aggregate. For questions about aggregates, see Later chapters.
Ask of a mandatory 1:1 relationship:
Can an object of one entity exist without an object of the other entity?
In this case, you may discover that a School can exist without a Head
Teacher, but not vice-versa.
Fig. 6d
In a
semi-optional 1:1 relationship, the independent entity is called the parent entity of the relationship. The
dependent entity is called the child
entity of the relationship.
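A semi-optional 1:1 relationship can be sketched in code as follows. This is an illustration under assumptions: the constructor arguments, the name attributes and the error messages are hypothetical, and only the School and Head Teacher entities come from the example.

```python
class School:
    """Parent entity: can exist without the relationship."""

    def __init__(self, name):
        self.name = name
        self.head_teacher = None  # no child yet


class HeadTeacher:
    """Child entity: cannot exist without the relationship."""

    def __init__(self, name, school):
        # The child cannot be created without its parent.
        if school is None:
            raise ValueError("a Head Teacher must belong to a School")
        # 1:1: the parent may have no more than one child.
        if school.head_teacher is not None:
            raise ValueError(f"{school.name} already has a Head Teacher")
        self.name = name
        self.school = school
        school.head_teacher = self
```

The asymmetry in the two constructors mirrors the broken line at one end of the relationship and the continuous line at the other.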
The parent-child nature of relationships helps us to draw an entity
model in a structured way, with parents towards the top and children towards
the bottom.
Fig. 6e
This hierarchical structuring gives us opportunities for naming standard
shapes, recognising them in specification diagrams, using them to ask
questions, and teaching the analysis and design implications.
Ask of a semi-optional 1:1 relationship:
Might there be more child objects than parent objects? If
yes, the model is incorrect, since the population of objects contradicts the
cardinality specified by the relationship.
Ask of a semi-optional 1:1
relationship: Does it describe an aggregate or a class hierarchy?
Figure 6f
shows that you can test the meaning by trying to write either ‘belongs to’ or
‘is a’ on the child or subclass end of the relationship, and ‘may have’ or
‘may be’ at the top.
Fig. 6f
What I want is a graphical notation that combines the cardinality rules
specified by a database structure notation, with the semantics specified by
object-oriented notation. Figure 6g shows a notation you can use to express the
different semantics, while retaining the cardinality information.
Fig. 6g
Figure 6h shows a deep is-a tree.
Fig. 6h
Figure 6i shows notations you can use to show aggregates and is-a trees
with several overlapping children or subclasses. The fact that the lines are
dotted at the top means the children or subclasses may not apply.
Fig. 6i
Figure 6j shows notations you can use to show that the children or
subclasses of an aggregate or class hierarchy are mutually exclusive.
Fig. 6j
Both these diagrams say ‘either one case or the other case’. If you
wanted to allow ‘neither case’ as well, then you would draw the top half of the
relationships with a dotted line.
In this short section I have entered the territory of object-oriented
modelling; see later chapters for much more discussion of aggregates and is-a
relationships.
Ask
of a semi-optional 1:1 relationship: Can a parent object be related to many
child objects over time? And do we want to record past children? If so, then
the relationship becomes 1:N, as shown later. You should add historical
entities and relationships to the model wherever they are needed in order to
support users’ requirements for information.
It is usually easiest to start by drawing the entity model without
history. The model above will record for a School only the currently employed
Head Teacher; it won’t keep a history of past Head Teachers. Let us say we are
not interested in this history.
Ask of a semi-optional 1:1 relationship: Is the child object a
singular optional attribute of the parent object? If
yes, you might make the optional attribute mandatory and roll it up. (See the
questions about aggregates in Later chapters.)
However, if the
child object is a group
of attributes in 1:1 correspondence (so if one is present then all are) then
the semi-optional 1:1 relationship saves you from specifying the rule of 1:1
correspondence between attributes of the group within a larger entity.
Ask of a semi-optional 1:1 relationship: Is the child object merely
a later stage in the life of the parent object?
If so, then you normally roll up the child entity into the parent entity, with
the same benefit/cost tradeoff as above.
Fig. 6k
An object in this kind of model becomes divided between entities as it
progresses through its life. Somebody who starts as a Pupil may later become a
Senior Pupil as well. This is an unnecessary elaboration, leading to some
redundant design and coding effort. Where the child entity represents merely a
later stage in the life of the parent entity, you may roll the two entities
into one.
The convention I favour is that objects don’t normally change class.
Some people propose the reverse, that you should create a subclass for
each state an object may pass through, showing them as mutually exclusive
subclasses in a class hierarchy. But this adds to the design and coding effort.
It increases the number of entities in the design and the complexity of coordinating
separate objects during an enquiry or update process. If each entity becomes a
database table, it slows down performance, since more objects must be retrieved
and stored.
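The convention I favour, one entity whose state is recorded as an attribute, can be sketched as follows. The Stage enumeration, the promote method and the name attribute are assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    PUPIL = "pupil"
    SENIOR_PUPIL = "senior pupil"


class Pupil:
    def __init__(self, name):
        self.name = name
        self.stage = Stage.PUPIL

    def promote(self):
        # The object stays a Pupil; only its state attribute changes.
        self.stage = Stage.SENIOR_PUPIL
```

After promote is called there is still one object of one class, so an enquiry or update has nothing extra to coordinate.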
Later chapters say a great deal more about types and states.
Returning to the first example and first question, you should ask about
the objects at both ends of the relationship: Can they exist without it? If
yes, you should define the relationship as being optional at one or both ends.
In this case, you may discover that the system has to record both
headless Schools and unemployed Head Teachers.
Fig. 6l
Again, nature abhors a symmetrical or non-hierarchical relationship.
There are two ways to introduce a third entity into the picture.
Fig. 6m
Ask
of a wholly optional 1:1 relationship: Does the relationship hide data
describing the reason why the objects are linked? If yes, you should create a
link entity.
E.g. you may discover
that a School and a Head Teacher only become linked via a Contract. You can
redraw the entity model in a hierarchical V-shaped structure.
Fig. 6n
The link entity at the bottom of a V shape acts to constrain the
relationship between the two higher entities. It gives the relationship a
meaning. It restricts the possible links between objects of the two higher
entities; you can only connect objects which are in reality connected by this
meaningful relationship.
Remember: the identifier or key of an entity state record is not just a
database concept, it is a necessary business concept. It enables you to:
a) distinguish that object from another of the same class and
b) map the entity state record onto a real-world object in the business
environment.
Later chapters include analysis questions that are relevant to a 1:1
link entity.
The 1:1
bridge shape gives two child entities a common parent. Use it where entities
are additive roles rather than mutually exclusive subclasses.
For example, suppose that you wish to combine two legacy systems, one
from Europe and one from the US, that maintain information about an overlapping
range of stock types. The two systems identify their range of stocks by different
numbering systems. Some European stock types are the same as those in the
Fig. 6o
Ask
of a wholly optional 1:1 relationship: Is there an aggregate of which the two entities
are partial realisations?
In this case you might create an entity that sits over and between the
two systems.
Fig. 6p
The entity model above says ‘either, both or neither’. It says you can
instantiate a superobject that has no related subobject. Specifying an ‘either
or both’ constraint to exclude ‘neither’ is beyond us here.
Both the 1:1 V shape and the 1:1 bridge shape are aggregates, and they
prompt analysis questions. Aggregates and semi-optional relationships often
transform in one of the ways described in Later chapters.
The Bridge shape is akin to one of the Gamma et al. patterns, called Adapter, which is designed to ‘convert the interface of a class into another interface clients expect. Adapter lets classes work together that couldn’t otherwise because of incompatible interfaces.’
A typical object-oriented system records only the current state of transient
objects. A typical enterprise application records historical data about
long-lived real-world objects. Both the real-world objects and the entity state
records are persistent. History and persistence make for 1:N relationships.
Ask
of any kind of 1:1 relationship: Can an object of one entity relate to more
than one object of the other entity over time? And do we want to record past
children?
If yes, you should show the manyness of one or both ends of the relationship.
The result of asking this question is that the majority of associative
relationships in enterprise applications turn out to be 1:N, shown in our
notation using a fork:
symbol | shows that at that end of the relationship |
fork on line | there may be several objects |
no fork on line | there may be no more than one object |
Combining the continuous or broken line with the fork, the notation can
show four kinds of 1:N relationship, as illustrated below.
Fig. 6q
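The four kinds can be enumerated with a small sketch. It assumes a hypothetical OneToMany record with one flag per end of the relationship, where a true flag stands for a continuous line and a false flag for a broken line; the names are inventions for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class OneToMany:
    parent_needs_child: bool  # continuous line at the parent end
    child_needs_parent: bool  # continuous line at the child end

    def describe(self):
        parent = ("must have at least one child"
                  if self.parent_needs_child else "may have no children")
        child = ("must have a parent"
                 if self.child_needs_parent else "may have no parent")
        return f"parent {parent}; child {child}"


# Two line styles at each of two ends give four kinds of 1:N relationship.
FOUR_KINDS = [OneToMany(p, c) for p, c in product((False, True), repeat=2)]

if __name__ == "__main__":
    for kind in FOUR_KINDS:
        print(kind.describe())
```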
In a 1:N relationship, I call the entity at the ‘one’ end the ‘parent’
entity of the relationship; and the entity at the ‘many’ end the ‘child’ entity
of the relationship.
Ask of a 1:N relationship: Can a child object exist without a parent
object? If no, a child must be owned by a parent, then the
relationship line is continuous at the child end.
For example:
Fig. 6r
It is often helpful to name a relationship at both ends, as I have done
here.
Ask
of a 1:N relationship: Can a parent object exist without a child object?
The relationship that exists between Teacher and Pupil is optional at
both ends. Not all Teachers manage a class. Not all Pupils are assigned to a
class with a class teacher, only the younger ones. So the relationship line is
broken at both ends.
Fig. 6s
You may
wonder about introducing School Class into the model. Thankfully, the concept
is not recorded in our system, otherwise I would have to worry about confusing
‘Class’ with ‘class’ in our discussion here.
There are at least three more questions you should ask about any 1:N relationship.
· Can a parent object have more than one active child object at once?
· Does a parent object retain historic children as well as active children?
· Can a child swap from one parent to another?
It might be possible to extend the notation to show all the answers in a
graphical form. But this way lies madness. If you try to show all constraints
on an entity model, you end up with a picture that is so large, so rich in
semantics, and so complex in appearance that you cannot use it.
It is better to ask these questions during Event Modelling and object
behaviour analysis, and document the answers there, in the diagrams for each
persistent entity class and each transient event class.
Of course, you may revise or extend the entity model with new entities
or relationships after you have answered the questions.
You may at first include N:N relationships in an entity model of business
objects.
Fig. 6t
Again, nature abhors a symmetrical or non-hierarchical relationship. You
may draw explicit N:N relationships in the early stages of a model, but you
should always resolve them before completing the specification.
Ask
of a N:N relationship: Does it hide a concrete real-world entity? If
yes, you should create a link entity.
E.g. you may
conclude that the relationship is established via a Pupil.
Fig. 6u
A Pupil is a very concrete entity, but a weak way to associate a School
with a Teacher. Not every Teacher is a class Teacher. Not every Pupil is
assigned to a class Teacher.
Ask
of a N:N relationship: Does it hide important data describing the reason why
the objects are linked? If yes, you should create a link entity.
Asking about this case, you might discover the Employment Contract. You
can reveal the hidden data by simplifying the N:N relationship into two or more
1:N relationships.
Fig. 6v
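A minimal Python sketch of this transformation follows. It assumes the N:N relationship is between School and Teacher, as discussed above; the `start_date` attribute and the helper `schools_of` are illustrative assumptions, not taken from the figure. The link entity sits at the ‘many’ end of two 1:N relationships and carries the data that the N:N relationship was hiding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class School:
    name: str
    # Children of the 1:N relationship School -> Employment Contract.
    contracts: List["EmploymentContract"] = field(
        default_factory=list, repr=False, compare=False)


@dataclass
class Teacher:
    name: str
    # Children of the 1:N relationship Teacher -> Employment Contract.
    contracts: List["EmploymentContract"] = field(
        default_factory=list, repr=False, compare=False)


@dataclass
class EmploymentContract:
    """Link entity: a child of exactly one School and one Teacher."""
    school: School
    teacher: Teacher
    start_date: str  # hypothetical attribute: the reason-for-link data

    def __post_init__(self) -> None:
        # Register the new child with both of its parents.
        self.school.contracts.append(self)
        self.teacher.contracts.append(self)


def schools_of(teacher: Teacher) -> List[str]:
    """Recover the old N:N view by navigating through the link entity."""
    return [contract.school.name for contract in teacher.contracts]
```

Nothing is lost by the change: the original many-to-many view is still available by walking from one parent, through the link objects, to the other parent.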
I will return to discuss how to use the V shape to constrain a system’s
behaviour, and some design issues raised by it.
We’ve looked mainly at questions about single relationships. Part Two
discusses some of the ways in which relationships may form larger and perhaps
more interesting shapes.
This chapter reviews traditional data analysis techniques.
As Winston Churchill said in a very different context: ‘It may be
unfashionable, it may be unpopular, it may be unpalatable, but it’s the truth.’
Well, it is part of the truth. I add a few analysis questions to be asked
during data analysis.
Do not confuse a database view in the UI layer with the data structure of the
underlying business. You must decompose aggregate objects displayed in the UI
layer for processing inside the system.
Business
rules belong to the entities in the underlying application, not the aggregate
objects in the UI layer (though these might unfortunately be called ‘business
objects’).
Relational data analysis is a good way to reduce the aggregates of data
items found on forms, screens or data files, into a set of simple normalised
relations. Allow us to equate the concepts of entity and normalised relation
for the time being.
Normalisation, a technique used in data analysis, is a fine example of
generative patterns, of model transformation by question and answer. It reduces
complex data structures to the simple building blocks from which they are made.
It reduces unnormalised data in stages through successive normal forms.
The starting point for the example below is the data to be found on a
batch of Sale Returns emailed to head office by a salesman.
UNNORMALISED | 1ST | 2ND | 3RD
| Separate the entity from the repeating group | Remove attributes whose value depends on part of the key | Remove attributes whose value depends on a non-key attribute
| Salesman | Salesman | Salesman
| Salesman name | Salesman name | Salesman name
| | |
Salesman name | Salesman name * | Salesman name * | Salesman name *
Product name | Product name | Product name * | Product name *
Product price | Product price | --- | ---
Item quantity | Item quantity | Item quantity | Item quantity
Cust Num | Cust Num | Cust Num | Cust Num *
Cust name | Cust name | Cust name | ---
| | |
| | Product | Product
| | Product name | Product name
| | Product price | Product price
| | |
| | | Customer
| | | Cust num
| | | Cust name
Fig. 6a
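The reduction in the table can also be sketched in code. The Python below is only an illustration under stated assumptions: the sample values are invented, and the function name `normalise` and the tuple layout of a sale are mine. It splits one unnormalised Sale Return into the four relations of the third-normal-form column.

```python
# One unnormalised Sale Return: a salesman plus a repeating group of lines.
# All data values are invented for illustration.
sale_return = {
    "salesman_name": "Jones",
    "lines": [
        {"product_name": "Flour", "product_price": 2.0, "item_quantity": 10,
         "cust_num": 1, "cust_name": "Acme"},
        {"product_name": "Sugar", "product_price": 3.0, "item_quantity": 5,
         "cust_num": 1, "cust_name": "Acme"},
        {"product_name": "Salt", "product_price": 1.0, "item_quantity": 4,
         "cust_num": 2, "cust_name": "Bestco"},
    ],
}


def normalise(ret):
    """Return the Salesman, Product, Customer and sale relations."""
    # 1st normal form: the repeating group leaves Salesman as a key-only entity.
    salesman = {ret["salesman_name"]}
    product, customer, sale = {}, {}, []
    for line in ret["lines"]:
        # 2nd normal form: price depends only on the product part of the key.
        product[line["product_name"]] = line["product_price"]
        # 3rd normal form: customer name depends on the non-key Cust Num.
        customer[line["cust_num"]] = line["cust_name"]
        # What remains: the asterisked foreign keys plus the quantity.
        sale.append((ret["salesman_name"], line["product_name"],
                     line["cust_num"], line["item_quantity"]))
    return salesman, product, customer, sale


salesman, product, customer, sale = normalise(sale_return)
```

Note that the customer name is now held once per customer rather than once per line, which is the point of the third step.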
Some object-oriented designers dismiss normalisation, because it is entirely data-oriented, but there are many things to be said in favour of it.
For one, it encourages you to think in detail about the users’
requirements for information. Several of the case studies published to
illustrate object-oriented methods appear to be complex and challenging –
finding the right classes is relatively difficult or mysterious. My study of
the case studies suggests they would be easier if the authors had defined some
input and output messages at the start, then (dare I say it) applied a little relational
data analysis to those inputs and outputs!
For another, the normalisation process depends on the analyst choosing
an identifier or key for a data group. The key is underlined in our examples.
When a data group is in third normal form, each attribute is ‘determined by’ or
‘dependent on’ the key. Given the value of an object’s key there is only one
possible value for any given attribute of that object.
Why is thinking about keys helpful?
Choosing a key may seem merely an implementation decision. Indeed, you might not decide between various possible candidate keys for an entity state record until relatively late in the design process.
But the intention or desire to give an entity state record a key
is not just a database concept, it is a business concept. When you choose a key
during relational data analysis you are making a statement about the business
perspective you are taking of the real world.
Users need a key that will enable them not only to:
• distinguish one object from another of the same class, but also to
• map the entity state record onto a real-world entity in the business
environment.
One reason for taking required output reports, or a legacy database, as
the source documents for data analysis is that these sources will reveal the
things the users already care about enough to have awarded keys.
The key must uniquely identify an object, and not have more than one
value for it. In other words, the values of the key must be in 1:1
correspondence with objects of the class.
You may have to choose between several candidate keys. Since users need
keys that help them map entity state records onto real-world entities, you
should favour natural attributes over artificial identifiers.
If you have
to make up a key from a long list of attributes, then so be it. The important
thing as far as data analysis is concerned is that you have established the business need for a key.
In the old days, designers might have said to choose numbers over text,
short items rather than long ones, and few items rather than many, but the
ability of users of a graphical user interface to select objects from lists now
saves people from having to type long multi-item keys.
Students normally learn normalisation by completing a table such as the one drawn above. This is a bit like practising scales when you start to play the piano.
Few if any professionals do data analysis the way students are taught
to, just as a concert pianist almost never plays a scale during a stage
performance.
Professionals use data analysis to place facts into an existing entity
model, albeit an informal or provisional entity model. They:
· reconcile input and output data flows with an existing entity model
· refine an informally defined entity model
· reverse engineer entities out of an existing database schema
Data analysis is a technique for both forward and reverse engineering.
Nowadays data analysis is a common way to start reengineering a legacy system;
it helps you take advantage of the effort that has already gone into
defining the legacy database.
The analogy between analyst and concert pianist is a poor one, because
it is possible to teach novice analysts to do data analysis the way the
professionals do it. The trick is to focus on the way normalisation reshapes an
entity model, on graphical model transformations.
The analyst starts normalisation by choosing an identifier or key for
the entire unnormalised data group. This is not always easy. The key should
uniquely identify at least one other data item. So the choice of key in figure
6b is a poor one.
Choosing a poor key for unnormalised data has little effect on the
entities defined at the end of the analysis, but it dictates the path that
normalisation takes, and it leaves you with a key-only entity. This key-only
entity may turn out to be redundant, not interesting to the business.
Following the rule that a fork grabs an asterisk - a relationship grabs a foreign key - you can draw the data groups that result from data analysis as entities connected by relationships. So let us repeat the data analysis of the example by following generative patterns.
Figure 6b shows that the standard pattern for the first normalisation step is
to drop out a child entity.
Notice the instruction to choose a key for each new child entity. We’ll
come back to talk about this in a moment. Consider the example in figure 6b.
If you had chosen Product Name as the key for the unnormalised data in a
Sale Return, then Salesman would not end up as an entity on its own. But having
chosen Salesman Name as the key, every other data item is immediately removed
as repeating data, data that has several values for the key, so you end up with
Salesman Name as a key-only entity.
In other cases, you might choose to drop the key-only entity. But in
this case, the Salesman is probably an important entity and worthy of record.
You may well discover an attribute for Salesman during data analysis of another
form, screen, report or file.
A choice between candidate keys often arises when choosing a key for an entity revealed at first normal form. Typically the revealed entity is a link or bridge between two or more entities with simple keys of their own.
Ask
of a link entity: What uniquely identifies an object of the link entity?
Consider the
choice of key for a
Fig. 6c
But this is not user-friendly. If the users do not already use
Fig. 6f
The trouble with a compound key is that objects of the parent entities
can only be linked once, by one link object. If you want to allow duplicates,
you have to extend the compound with date and time attributes, or with some
other qualifying element or sequence number.
Fig. 6g
The Sale
Number in these examples is used to extend the range of values provided by
combining the keys of parent entities. All of these hierarchic keys for a
There is another difficulty with the way data analysis is taught. Listing data items in an unnormalised data group means you lose sight of the data structure you started with.
Given a complicated document or file, you may need to divide it at first
normal form into a complex structure of parallel and nested repeating groups.
It is impossible to visualise this structure by looking at a list of
unnormalised data items. It is much easier if you record the repeating data
groups as distinct entities from the outset, or draw boxes around data groups
on the original document.
The standard pattern for the second normalisation step is to raise a
parent entity with a key that is part of the key of the child.
In general, given any multi-item key it is well worth asking about the
classes that might exist identified with one part of the key as their own key.
See Part Two.
The standard pattern for the third normalisation step is to raise a parent entity and assign the determining attribute(s) as its key, as shown below.
Fourth and fifth normal forms are discussed in Part Two. Boyce-Codd normal form is a variation of third normal form that eliminates possible anomalies where there are several candidate keys which share a common attribute, a complication that need not concern us here.
You will normally merge the end results of various data analysis
exercises. You can merge any classes whose objects are in 1:1 correspondence;
this is usually indicated by their having the same key.
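As a rough sketch of the merging step, the Python below combines relations that carry the same key. The function name `merge_relations` and the ‘Cust phone’ item from a second document are hypothetical. Remember that an identical key only usually indicates 1:1 correspondence, so the analyst should still confirm each merge.

```python
def merge_relations(results):
    """Merge relations found by separate analyses when they share a key.

    results: list of (key, attributes) pairs, where key is a tuple of key
    item names and attributes is a set of non-key item names.
    """
    merged = {}
    for key, attributes in results:
        # Relations with an identical key are folded into one class.
        merged.setdefault(key, set()).update(attributes)
    return merged


found = [
    (("Cust Num",), {"Cust name"}),        # from the Sale Return
    (("Cust Num",), {"Cust phone"}),       # hypothetical second document
    (("Product name",), {"Product price"}),
]
merged = merge_relations(found)
```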
Two of the classes in our little case study pick up an extra attribute
from data analysis of other documents.
Fig. 6k
Notice that Cust Address has sneaked into the
After data
groups have been merged, items will have been brought together in new
combinations, so you should apply two tests to ensure that the resulting classes
are in third normal form (TNF).
Ask of a class: the first TNF test: Is
there only one value for each data item in the class, given a single value for
the key?
If no, there has been a mistake, and there is a first or second normal form
reduction to be investigated.
Ask of a class: the second TNF test:
Is the value of an item determined by a non-key item, rather than the key? If
yes, there is a third normal form reduction to be investigated.
For example, is Cust Address really determined by Cust Num, and best
moved into the Customer class? Or perhaps the address is recorded afresh on
each
Data analysis
is never as easy as presented on a training course, because you have to ask
business analysis questions about how data changes over time, and whether
historic data has to be remembered.
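The two TNF tests can be run mechanically against sample data, as in the Python sketch below. The function names and the sample rows are assumptions made for illustration. Sample data can only raise a suspicion; whether Cust Address is really determined by Cust Num remains a business question of the kind just described.

```python
def determines(rows, lhs, rhs):
    """True if each combination of lhs values maps to a single rhs value."""
    seen = {}
    for row in rows:
        left = tuple(row[item] for item in lhs)
        if seen.setdefault(left, row[rhs]) != row[rhs]:
            return False
    return True


def tnf_report(rows, key):
    """Return (items failing test 1, (determinant, item) pairs for test 2)."""
    non_key = [item for item in rows[0] if item not in key]
    # Test 1: every non-key item must have one value per key value.
    multi_valued = [i for i in non_key if not determines(rows, key, i)]
    # Test 2: flag items that a non-key item appears to determine.
    suspects = [(d, i) for d in non_key for i in non_key
                if d != i and determines(rows, [d], i)]
    return multi_valued, suspects


# Invented rows for a merged sale class keyed on a sale number.
sales = [
    {"sale_num": 1, "cust_num": 7, "cust_address": "1 High St", "quantity": 10},
    {"sale_num": 2, "cust_num": 7, "cust_address": "1 High St", "quantity": 5},
    {"sale_num": 3, "cust_num": 8, "cust_address": "2 Low Rd", "quantity": 10},
]
multi_valued, suspects = tnf_report(sales, ["sale_num"])
```

Here the report flags the customer number and the customer address as determining each other, which is the prompt to ask the users about them.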
A relation is an aggregate of several attributes. Chapter 4 showed you
might define each attribute as a key-only parent entity in its own right. What
I call the ‘relation shape’ is a class with three or more parents.
Fig. 6l
For example, let us say a Product is a type of Ingredient with a unique
combination of four other characteristics, each of them user-defined. You can define
each attribute as a class in its own right, as shown below in a fraction of the
full model:
Fig. 6m
Users will define the valid range of each attribute by creating objects
of the parent entities. Suppose it turns out that users start to record
products in the database that cannot actually exist in practice, products with
invalid combinations of size and ingredient. Four solutions might be designed.
Management
solution
A weak solution is to place some kind of security constraint, using a
password perhaps, on who is allowed to set up products in the system.
UI
layer solution
A weak solution is to code the rules as constraints on data items where
they are entered into the system, in the user interface code. This may prove difficult
to maintain as the code is added to several data entry screens. A stronger
solution is to record the constraints in reusable modules underlying the user
interface.
Data
services layer solution
A strong solution: specify validation rules applying to data items in a
data dictionary attached to the database management system. Unfortunately, few
database management systems come with a sufficiently clever data dictionary,
one that can apply the rules dynamically to a live system. If you do have a clever
enough data dictionary, then think of it as belonging in the Business services
layer rather than the Data services layer.
Business
services layer solution
A strong and practical solution: record the validation constraints as a
cross-reference table, or link class, in the entity model of the Business
services layer.
Fig. 6n
The
introduction of a V shape domain above the State Class currently looks like the
best design option for most applications.
Ask of a relation: Will users control the valid combinations of different attributes? If yes, then create a V shape domain class.
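The Business services layer solution might be sketched as follows in Python. The class and method names are hypothetical. The set `valid_combinations` plays the part of the link class whose objects the users create, and the creation of a product is refused unless a matching link object exists.

```python
class ProductCatalogue:
    """Holds the user-maintained table of valid attribute combinations."""

    def __init__(self):
        self.valid_combinations = set()  # link class: (ingredient, size)
        self.products = []

    def allow(self, ingredient, size):
        """Users create a link object to declare a combination valid."""
        self.valid_combinations.add((ingredient, size))

    def create_product(self, name, ingredient, size):
        """Create a product only if its combination has been allowed."""
        if (ingredient, size) not in self.valid_combinations:
            raise ValueError(f"invalid combination: {ingredient}, {size}")
        self.products.append((name, ingredient, size))
```

Because the rule is held as data rather than code, users can change the valid combinations without any change to the programs.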
There are weaknesses in the relational view of the world.
Data-centred,
database semantics are not handled
People have attempted to extend the relational model to accommodate
business rules. These approaches are sometimes called ‘semantic entity
modelling’. The difficulty is that these approaches tend to be so heavily
data-centred that you have to think in rather abstract and difficult ways to
discover and define the processing logic and processing rules.
In our view,
object interaction and object behaviour analysis is a ‘semantic entity modelling’
approach, though you specify the rules on event models rather than the entity
model itself. The data and process models are all part of one coherent
conceptual model.
Objects
are key-oriented rather than type-oriented
Relational
theory does not account for mutually exclusive or optional data. I will show
how object interaction and behaviour analysis deals with class hierarchies of
super and subclasses.
Parents
don’t know where their children are
Relational theory suggests that a parent entity should not know about
its children. There are no tables or lists, only foreign keys. This minimises
data redundancy, but access from parent to child involves a great deal of
processing redundancy. This is a big factor in slowing down system performance.
Where a database is distributed it is almost inconceivable that a parent entity
should not somehow know where its children are.
This is an issue for the Data services layer and you should not even
have to think about it when defining the Business services layer!
How access from parent to child is achieved is a matter for the Data
services layer. The database designer may implement a relationship in either
relational or network style. In network databases, tables and lists are
allowed, especially for storing relationships. Thus, while data redundancy is
permitted, access from parent to child involves no redundant processing.
Aggregate
objects cannot be stored
You cannot store objects without repeating data in a relational
database, because relations must be in first normal form. I will show this is
fine and correct for the Business services layer. The Business services layer
must separate out the low-level normalised classes, partly for update
efficiency and partly so that they can be viewed from many different
perspectives.
You may
however choose to design and process aggregate objects in the presentation and
Data services layers. Some people use the term business object to describe an aggregate object in the
UI layer.
Once more, be
careful not to confuse a database view with the database itself. Business
objects in the UI layer must be decomposed for processing in the Business
services layer.
To supplement data analysis and overcome the above weaknesses, object
interaction and behaviour analysis techniques are needed.
There are two main techniques that complement entity modelling. Both help to validate and improve the entity model.
Chapter 7 introduces event modelling techniques that can be used in the Event
Modelling face of the cube. The volume “The Event Modeler” goes into much more
detail.
You can specify static and invariant constraints (applied in every case and unchanging) as properties of data. You can specify validation rules governing the ‘domain’ of a data item, and you can fix referential integrity rules by specifying the optionality and cardinality of relationships.
But some constraints are dynamic or changeable, so it is not appropriate
to build them routinely into implementation database structures, or even a
logical entity model.
E.g. English law lays down a number of constraints governing a wedding
event: a marriage must relate two partners, no more, no less; one partner (the
husband) is male; one partner (the wife) is female; both partners must be over
18 years of age; a person can have only one marriage at a time; a person can
only have marriages in their sex of birth. And there are further preconditions
to do with the notice period, the number of witnesses, the residential
addresses of the partners, the location of the marriage, and so on.
You need ways to make all constraints explicit, not just referential
integrity rules. In general, constraints are assertions about the actions that
are possible. You prevent a data item from being entered, or a relationship
from being established, by preventing an event from taking place. So you can
specify all remaining constraints as preconditions on events.
Chapter 7 includes an illustration.
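As a small sketch of the idea (not the illustration in Chapter 7), the Python below specifies three of the wedding constraints listed above as preconditions on the event. The class names, the exception and the age threshold test are my assumptions. The event tests every precondition before it updates any object, so a failed event leaves the data untouched.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self.spouse = None  # at most one current marriage


class PreconditionFailed(Exception):
    """Raised when an event is rejected; no object has been updated."""


def wedding(a, b):
    """Event: test every precondition first, then update both objects."""
    if a is b:
        raise PreconditionFailed("a marriage must relate two partners")
    if a.age < 18 or b.age < 18:  # threshold assumed from the example rules
        raise PreconditionFailed("both partners must be old enough")
    if a.spouse is not None or b.spouse is not None:
        raise PreconditionFailed("a person can have only one marriage at a time")
    # All preconditions hold, so establish the relationship at both ends.
    a.spouse, b.spouse = b, a
```

If the law changes, only the event specification changes; the entity model and the stored data structure are unaffected.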
An overview of Event Modelling techniques that specify how events hit
objects in the entity model.
Level of granularity
Enquiry identification
Event identification
Validation
of the entity model during analysis
You should validate the classes and relationships by testing that they
support all the enquiries that users say they want to make of the required
system. In simple cases, programmers can do this by defining an SQL query. At
an early stage of analysis, especially for complex cases, it helps to define
the enquiry model for the enquiry requirement.
To verify that atomic enquiries within the system functions can get the
information they need by accessing entities - to test that every known output
data flow (message, report or file) can be derived using the relationships in
the Entity model - you can draw every enquiry access path as an enquiry model.
This means defining the entry point object (identified by the input data
parameters) and the navigation path along relationships to collect the required
output information from other objects.
You can redraw the enquiry model from the perspective of the specific
enquiry.
Notice that an enquiry process that follows this particular access path
will find the same Customer many times.
An enquiry may perform redundant processing, retrieving more entities than are necessary for the required output data flow. If the enquiry is infrequently made, you may assume the output data flow will be sorted and duplicates removed.
However, if the enquiry is a primary system function, triggered many
times a day, you may perhaps prefer to refine the Entity model so that no
redundant accesses are made, by adding a derivable entity into the Entity
model.
The new entity is not entirely a matter of performance optimisation. The
fact that users enquire about ‘Customer Interest in Product’ so often shows
that this associative entity (derivable though it may be) is a matter of
concern to users in running their business.
If you do include such an entity, make sure the text description of the
entity starts with the word DERIVABLE, and name the requirement that it is used
for. Designers may choose either to store the entity as a database table, or
to write relatively complex enquiry processes.
If there is only one route through the Entity model from the entry point, then the access path is obvious from the enquiry model. Otherwise, you have to specify which of several relationships are followed. You can draw arrows to show this.
Or you can draw the enquiry model from the perspective of the specific
enquiry.
Note that one entity type may appear twice in an enquiry model playing different roles. One convention is to name the entity role in brackets after the entity name.
Don’t forget management and audit reporting requirements. Wherever you find historical facts are needed on output, you should include historical attributes and relationships in the Entity model. Occasionally, you might be led to include an extra entity, and restructure the entity model accordingly.
Triage
in enquiry access path analysis
Only document
those enquiry models that are not obvious from the specification of the output
data flow and Entity model. Under pressure of time, analyse only the outputs of
primary system functions.
An event is like an enquiry except that it updates one or more of the
objects it hits.
Events are more complex than enquiries. Events require more careful
analysis. You can use object behaviour analysis techniques to analyse events
and define the behaviour of each class as a state machine composed of event effects.
The volume ‘Event modelling for enterprise applications’ shows it is not far from an event model diagram to either a procedural program, or to object-oriented programming.
For example, suppose a recruitment consultant wants to discover which
Jobs are available for an Applicant. You must find out:
· what Skills the Applicant has
· what Skill Type each Skill is classified under
· what Jobs are available under that Skill Type.
The graphical representation below shows how the relationships provide
the path to select the objects that satisfy the enquiry.
Fig. 7w
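The same access path can be sketched in Python. The three dictionaries stand in for the relationships, and their contents and the function name `jobs_for` are invented for illustration. The sketch also shows why an access path may find the same object more than once.

```python
# Hypothetical sample data: each dict plays the role of one relationship.
skills_of_applicant = {"Ann": ["Java", "SQL"]}
skill_type_of_skill = {"Java": "Programming", "SQL": "Programming"}
jobs_of_skill_type = {"Programming": ["Developer", "DBA"]}


def jobs_for(applicant):
    """Follow the access path Applicant -> Skill -> Skill Type -> Job."""
    visited = []
    for skill in skills_of_applicant.get(applicant, []):
        skill_type = skill_type_of_skill[skill]
        visited.extend(jobs_of_skill_type.get(skill_type, []))
    # The path can reach the same Job more than once, so remove duplicates
    # while keeping first-seen order.
    return visited, list(dict.fromkeys(visited))


visited, available = jobs_for("Ann")
```

Two Skills under one Skill Type lead to the same Jobs twice, so four accesses yield only two distinct Jobs.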
A CASE tool can mechanically convert Figure 7w into Figure 7x.
Fig. 7x
A CASE tool can mechanically convert Figure 7x into a Jackson-style
program structure, with read statements allocated at the correct points.
The ability of an entity model to support an enquiry access path
(however it is documented) is very important. The access path tells you which
relationships are needed to select objects. It also enables you, in physical
design, to design a database which has records, representing classes, stored so
as to provide efficient paths for retrieving information.
Our focus is on enterprise applications, but event modelling techniques
are useful for other kinds of system - especially process control or embedded
systems.
Methods for
designing embedded systems normally focus on behaviour and process modelling
techniques, and pay little or no attention to the entity model. But embedded
systems do have an entity model.
The objects
in a process control system may not be numerous or persistent enough to be
stored in a database and connected therein by pointers. However, there is an
entity model behind the scenes, and if you draw it, the model does tell you
something. The relationships specify the paths along which objects may
communicate.
How knowledge of patterns such as double-V shapes and diamonds can give
productivity and quality benefits, helping people not just to draw entity
models, but to get them right.
A class can participate in several relationships, either as parent or
child. Fig. 1a shows a School is both the parent of many Pupils, and a child of
one Local Authority.
The figure also shows three classes that appear in the shape of a
triangle. This shape prompts an analysis question.
Fig. 1a
Ask of a triangle: For a given parent, are
the same child objects discovered down the long direct relationship as down the
two shorter indirect relationships?
If no, then keep the long relationship. You might need it because another relationship is optional at the child end, or because the bottom-level child has two different parents.
If yes, then remove the long relationship, even though it is a true statement.
Why? First, the long relationship is redundant; it says nothing that is
not said without it. Second, it may wrongly permit the end-user to attach the
bottom-level child to two different parents via the direct and indirect routes.
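Where sample data exists, the triangle question can be given a first answer mechanically. The Python below is a sketch with hypothetical names: it compares, for every top-level parent, the children found down the long relationship with those found down the two shorter ones. Agreement in the data is only evidence; the users must still confirm that the rule always holds.

```python
def long_relationship_is_redundant(direct, upper, lower):
    """Compare the direct route with the two-step indirect route.

    direct: top-level parent -> set of bottom-level children (long side)
    upper:  top-level parent -> set of middle objects
    lower:  middle object    -> set of bottom-level children
    """
    for parent in set(direct) | set(upper):
        indirect = set()
        for middle in upper.get(parent, set()):
            indirect |= lower.get(middle, set())
        if direct.get(parent, set()) != indirect:
            return False  # the long relationship says something extra
    return True
```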
Later sections discuss the question of double parents in triangles and
diamonds.
Fig. 1b shows that adding Teacher into the picture creates a diamond shape.
Fig. 1b
The two sides of the diamond represent what might be called a ‘boundary
clash’ between two conflicting ways for low-level objects to be grouped into a
batch or collection.
Ask of a V shape: Can there be more than one link entity for one
combination of the two parents? If yes, then
consider transforming the V shape into a Y shape with a derivable sorting class
at its heart.
E.g. Fig. 1c shows the Customer Interest in Stock class clusters all the
Orders for the same combination of Customer and Stock.
Fig. 1c
You may discover a key-only relation as a result of applying relational
data analysis to an output report. For example, a report of Orders listed by
Customer within Stock, or by Stock within Customer, may lead you to specify
Customer Interest in Stock as a key-only relation or sorting class.
You may also discover a derivable sorting class when defining an access
path to create such a report.
Some database designers are obsessed with removing all derivable data
from the entity model, careless of the expense of redundant programming effort
and processing time. The three-tier architecture gives us an opportunity to
reexamine this assumption.
Entity
classes in the Business services layer
If the users’ requirement is for frequent reports that sort Orders by a
combination of Customer and Stock, then the derivable sorting class surely belongs
in the entity model. You can now specify simple enquiry processes that return
the results the users want. You can code these enquiry processes in the
Business services layer on the assumption that the derivable sorting class
exists.
Soft
classes in the Data services layer
What if the derivable sorting class is missing from the data storage
structure? Perhaps the database designer rejects it, or you have inherited a
legacy system without it?
This can complicate the specification or the coding of enquiries that
generate the required reports. Since this complication is caused by the
database designers’ requirements, not by users’ requirements, you should hide
the complication from Business services layer processes, in the Data
abstraction layer to the Data services layer.
The idea is that any enquiry process that wants to read a
Customer-Interest-in-Stock object will call the Data abstraction layer to sort
through the stored data, manufacture the object and return it. Such objects,
manufactured by the Data abstraction layer rather than stored in the data
storage structure, might be called ‘soft objects’.
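A minimal sketch of such a Data abstraction layer routine is given below in Python. The function name and the layout of the stored Order rows are assumptions. The routine sorts through the stored Orders and manufactures one soft object for each combination of Customer and Stock.

```python
from collections import defaultdict

# Stored data: Orders only; no Customer Interest in Stock table exists.
stored_orders = [
    {"order": 1, "customer": "C1", "stock": "S1"},
    {"order": 2, "customer": "C1", "stock": "S1"},
    {"order": 3, "customer": "C2", "stock": "S1"},
]


def read_customer_interest_in_stock(orders):
    """Data abstraction layer: manufacture the soft objects on request."""
    interest = defaultdict(list)
    for row in orders:
        # One soft object per (Customer, Stock) pair, clustering its Orders.
        interest[(row["customer"], row["stock"])].append(row["order"])
    return dict(interest)


soft_objects = read_customer_interest_in_stock(stored_orders)
```

An enquiry process in the Business services layer can then be written as if the class were stored, and need not change if the database designer later decides to store it.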
The notion that some derived data rightly belongs in the Business
services layer runs against the received wisdom, so I return to soft objects
and derived data in later chapters.
Is there any similarity between the two entity models in Fig. 1d?
Fig. 1d
It is hard for us to spot that these unstructured models are two
instances of the same general pattern. Tools don’t care how messy the diagram
looks, but people do.
A tool can help us by redrawing the models in a hierarchically
structured fashion, placing the parent of each relationship above the child. If
the analyst always draws the relationship from the parent to the child, and the
tool constrains them to draw one-to-many relationships in this direction, then
the tool can easily remember which end of a relationship is parent and which is
child.
The analyst may request: ‘Please reshape my diagram for me in a
hierarchical fashion.’ A tool can respond by redrawing the two models as in the
Fig. 1e below.
(By the way, algorithms that try to avoid crossing lines become
increasingly useless as the complexity of a network diagram grows.)
Fig. 1e
After a person or a tool has rearranged the diagrams hierarchically, it
is much easier to see these are both examples of the double-V shape shown in
Fig. 1f.
Fig. 1f
This shape is a generative pattern that prompts you to ask an analysis
question.
The analyst
may request: ‘Please highlight or report on any double-V shapes for me.’ A tool
might respond by thickening or colouring the questionable relationships, then
ask the analyst the following question.
Ask
of a double V shape: Can you tie an object of one child entity to only one
object of the other child entity? If yes, connect the two child entities by a
relationship, to capture the constraint.
E.g. Fig. 1g shows a book can only be loaned to someone who is a member
of a library; and a time sheet must be submitted by an employee within an
employment.
Fig. 1g
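A tool’s report on double-V shapes might work along the lines of the Python sketch below. The function name and the entity names in the example are assumptions about how the time sheet case could be modelled. The sketch finds only undisguised shapes, where both child entities are directly attached to the same two parents.

```python
from itertools import combinations


def find_double_vs(relationships):
    """Report pairs of child entities that share the same two parents.

    relationships: iterable of (parent, child) pairs, one per 1:N relationship.
    """
    parents_of = {}
    for parent, child in relationships:
        parents_of.setdefault(child, set()).add(parent)
    found = []
    for a, b in combinations(sorted(parents_of), 2):
        shared = parents_of[a] & parents_of[b]
        if len(shared) >= 2:
            found.append((a, b, sorted(shared)))
    return found


# Assumed model: both children hang off the same two parents.
model = [
    ("Employee", "Employment"), ("Company", "Employment"),
    ("Employee", "Time Sheet"), ("Company", "Time Sheet"),
]
report = find_double_vs(model)
```

For each pair reported, the analyst is asked the double-V question and may then connect the two child entities by a relationship.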
Hierarchical arrangement makes it easier to see triangles. After asking
the earlier question about triangles, you are left with two Y-shape structures.
Fig. 1h
The classes at the heart of the Y shapes in Fig. 1h represent real-world
entities. Users create objects of these classes in order to constrain the
creation of objects of the class at the bottom. But there is another kind of Y
shape.
The examples so far have revealed two kinds of Y shape. Fig. 1i shows the class at the heart of the Y shape can be either a domain class or a derivable sorting class.
Fig. 1i
Objects of a domain class are created by users. The domain class at the
heart of a Y shape might represent a business entity with attributes of its own
(like Membership and Employment in Fig. 1h), or it might be no more than a
key-only link class that relates its two parents.
Objects of a derivable sorting class can be derived from the existence
of child objects. An example of a derivable sorting class appears as part of
the solution to the problem described in the next section. But first, a warning
that some of our patterns can appear in disguise.
The basic patterns or shapes can be obscured by intermediate classes. Fig. 1j includes a double-V shape, even though the Job class sits in the middle of one side of one of the two V shapes.
Fig. 1j
Readers may like to consider ways to resolve this double-V shape for themselves. A possible refinement of the structure appears later in this chapter.
The idea of teaching patterns is that analysts should save money by
getting the system right first time. But the patterns are just as useful if you
are trying to correct or improve a system that isn’t working correctly. What
follows is based closely on a real example.
The business has an enterprise application for recording what it does to
meet customers’ needs. The business supplies ingredients to food manufacturers.
Ingredients are packaged in various ways, by size, quality and so on, to make
distinct products, each with a distinct price. People (‘Contacts’ below)
enquire about products. They may be sent a brochure and/or samples. They ask
for quotes; they are given prices for specific products. They place purchase
orders for a quantity of product at either the current price (an attribute of
product) or the price given to them in an earlier quote.
The manager asked for our help. He had already set up a database, using
an application generator, to record customers’ orders, and requests for
information about products. Fig. 1k shows the structure of the database.
Fig. 1k
The manager had quickly generated a system to maintain this database,
but problems were now being experienced with the quality of the information in
it. The problems centred on the multiple-V shape, that is, the four child
entities owned by the same two parents, Product and Contact.
The historical record of a contact’s interest in a product was patchy, incomplete and out-of-date. Users forget, or cannot be bothered, to set up an Interest in Product record every time they record an Enquiry, Quote or Purchase Order.
Spotting the multiple-V shape prompts us to ask the question: ‘Is an
Interest in Product related to the various possible reasons for that interest?’
Of course it is. Fig. 1l partially resolves the multiple-V shape by setting up
explicit relationships in the data structure.
Fig. 1l
An Interest in Product record is now created automatically, whenever the
detail of an Enquiry, Quote line or Purchase Order Line is recorded for a new
combination of Contact and Product.
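The automatic creation rule can be sketched as follows. This is an illustration under stated assumptions rather than the design of the real system: the class name `InterestRegister` and the method `record_detail` are hypothetical, and an Interest in Product is reduced to a (Contact, Product) pair.

```python
class InterestRegister:
    """Hypothetical sketch: the parent Interest in Product record is
    derived whenever a detail record is stored."""

    def __init__(self):
        self.interests = set()  # (contact, product) pairs
        self.details = []       # (kind, contact, product) records

    def record_detail(self, kind, contact, product):
        key = (contact, product)
        # Adding to a set is a no-op when the combination already exists,
        # so users never have to remember to create the parent record.
        self.interests.add(key)
        self.details.append((kind, contact, product))
        return key


register = InterestRegister()
register.record_detail("Enquiry", "Contact A", "Product X")
register.record_detail("Quote Line", "Contact A", "Product X")
register.record_detail("Purchase Order Line", "Contact B", "Product X")
print(sorted(register.interests))
```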
Note that the model does not match the pattern in Fig. 1i in one way;
objects of the new derivable sorting class need not have any children.
Fig. 1m illustrates the transformation described in the volume
‘Introduction to rules and patterns’ whereby you might elaborate the model to
show the rule that there must be at least one ‘Reason for Interest’.
Fig. 1m
The trouble with introducing this rule is that it constrains us never to
maintain an object of the class Interest in Product without a reason. The ‘at
least one child’ rule is more rigid than is required by this business, so I
will relax it again.
There is still a multiple-V shape in the model. The three bottom-level
classes are all owned by both Product and Contact parents.
End-users cannot record for historical analysis whether the price they give for an order line is the current price, or the price given on an earlier quote (they have some discretion to price order lines in either way). Also, they lose track of which quotes have been successful, that is, which quotes have resulted in orders. Fig. 1n resolves the multiple-V shape.
Fig. 1n
Further analysis of the child entities jointly owned by both Product and
Contact may lead you to ask: Are users interested in whether a quote line
results from an enquiry? or an enquiry led to a quote? or a quote resulted in
an order? If so, you might add further relationships to the model. The
exclusion arcs show that not all order lines come from quote lines, and not all
quote lines stem from enquiries.
A third reported problem in the case study centres on another kind of
pattern. The reported problem is that users are recording products in the
database that cannot actually exist, products with impossible combinations of
size and ingredient. Fig. 1o shows a pattern I call the relation shape.
Fig. 1o
Ask
of a relation: Will users control the valid combinations of different
attributes?
If yes, then create a V shape from the domain classes.
Fig. 1p introduces a V shape domain class.
Fig. 1p
The introduction of a V shape
domain class above the relation currently looks like the best design option for
most applications.
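A minimal sketch of the idea, assuming the valid combinations are simply pairs of ingredient and size that users maintain. The class name `Catalogue`, its methods and the sample values are hypothetical.

```python
class Catalogue:
    """Hypothetical sketch: users control which combinations may exist."""

    def __init__(self):
        self.valid_combinations = set()  # plays the part of the V shape domain class
        self.products = []               # the relation being constrained

    def allow(self, ingredient, size):
        self.valid_combinations.add((ingredient, size))

    def add_product(self, name, ingredient, size):
        # Reject a product whose combination the users have not declared valid.
        if (ingredient, size) not in self.valid_combinations:
            raise ValueError(f"{ingredient} is not available in size {size}")
        self.products.append((name, ingredient, size))


catalogue = Catalogue()
catalogue.allow("Ingredient A", "1 kg")
catalogue.add_product("Product 1", "Ingredient A", "1 kg")
```

An attempt to add a product with an undeclared combination, such as Ingredient A in a size that has not been allowed, raises an error instead of storing an impossible product.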
It is important to realise that not all triangular or double-V shapes
are bad. It would be a mistake for a tool to automatically remove all such
structures from a specification. Below are three cases where a triangle is a
valid structure.
Children with optional parents
Fig. 1q shows a triangle that is valid because one of the short indirect
relationships is optional at the bottom end.
Fig. 1q
This case is well-known and has been illustrated by many others. Cases
where all relationships are mandatory at the bottom end are more interesting.
Fig. 1r shows a triangle that is valid because there is a current 1:N relationship in parallel with a historic N:N relationship. The current relationship to the link class is monochronous (one at a time); the historic relationship to the link class is polychronous (several at a time).
Fig. 1r
Some argue the current relationship is redundant because it is a subset
of the historic relationship; but removing the current relationship creates
redundant processing.
Without it, to find the current Department of an Employee you have to
hunt through the historic memberships for the latest one, and then perhaps
check that it is still active. This redundant processing is avoided by making the
current relationship explicit.
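The difference in processing can be sketched as follows. The class and attribute names here (`Membership`, `history`, `current_department`) are assumptions made for illustration; the figure itself may name things differently.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Membership:
    department: str
    start: date
    end: Optional[date] = None  # None means the membership is still active


@dataclass
class Employee:
    name: str
    history: list = field(default_factory=list)  # historic relationship
    current_department: Optional[str] = None     # explicit current relationship

    def join(self, department, start):
        # Close any open membership, then record the new one.
        for membership in self.history:
            if membership.end is None:
                membership.end = start
        self.history.append(Membership(department, start))
        self.current_department = department


def current_department_from_history(employee):
    """The hunt that is needed when only the historic relationship exists."""
    if not employee.history:
        return None
    latest = max(employee.history, key=lambda m: m.start)
    return latest.department if latest.end is None else None


emp = Employee("E1")
emp.join("Sales", date(2020, 1, 1))
emp.join("Finance", date(2022, 6, 1))
print(emp.current_department, current_department_from_history(emp))
```

Both routes give the same answer; the explicit relationship simply avoids the search and the activity check on every enquiry.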
Fig. 1s shows a triangle that is valid because the bottom-level child may have two different top-level parents.
Fig. 1s
Ask
of a triangle: Is the bottom-level object related to the same top-level object
via both sides of the triangle?
Specifying the constraint that parents are the same
To say that a Task can only be done in the same Department that the
Employee is contracted to, you should remove the long direct relationship.
Specifying no constraint
To put no constraint on what Department a Task is done in, you can
define the Task as having two Department attributes, one direct and one via
Employee.
Specifying the constraint that parents are different
To say (bizarrely) that a Task can never be done in the same Department the
Employee is contracted to - you specify the constraint by defining the Task
with two Department attributes (foreign keys inherited by different routes)
with the rule that these cannot match each other.
Fig. 1t
There is another way to specify the last rule - that a Task can never be done in the same Department the Employee is contracted to. Fig. 1u shows how you might introduce a V shape domain class.
Fig. 1u
Introducing a V shape domain class in this case is an exceedingly clumsy
solution, because all but one Department is valid for each Employee.
Constraints that exclude a single value from a range are normally specified as
a rule restricting the domain of an attribute of a class, as shown on the
previous diagram.
However, multi-value constraints are normally better specified in the
form of relationships. If there were a range of Departments for which an
Employee is allowed to do a Task, then the structure above would be a good
specification of this constraint.
You cannot remove
any of the relationships in a diamond shape (unlike a triangle shape) without
loss of information from the specification. But you should still
Ask of a diamond shape: Is the bottom-level object related to the same top-level object via both sides of the diamond? If yes, you can specify this constraint by defining for the bottom-level object just one foreign key attribute identifying the top-level object. In section 2, a Pupil has just one Local Authority name. If no, you can specify this degree of freedom by defining for the bottom-level object two foreign key attributes, one for each of the top-level objects.
Fig. 1v shows a Fire Appliance can be related to two Counties: the County of
the Incident that the appliance is attending, and the County of the Fire
Station at which the appliance is based.
Fig. 1v
Fig. 1w shows the answer to the earlier question. It includes a diamond
shape. So how do you specify whether the Interview has only one Skill Type, or
may have two?
Fig. 1w
Specifying the constraint that parents are the same
To say that an Interview can only be arranged for a qualified Applicant
who has the same Skill Type as that of the Job, you can define the Interview as
having only one Skill Type attribute (the same foreign key inherited by
different routes).
Specifying no constraint
To say that there is no rule on whether an Applicant must be qualified
for a Job or not, you can define the Interview as having two Skill Type
attributes, without any constraint on their values.
Specifying the constraint that parents are different
To say
(bizarrely) that an Interview can only be arranged for an unqualified Applicant, you can define the
Interview with two Skill Type attributes (foreign keys inherited by different
routes) with the rule that these cannot match each other.
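The three options can be compared side by side in a small sketch. It treats the Interview as carrying two Skill Type values, one inherited via the Applicant and one via the Job, and checks them against a chosen rule. The function name `interview_allowed` and the rule labels are hypothetical. In a real model, the ‘same’ option would normally be expressed by storing just one Skill Type attribute rather than by a check.

```python
def interview_allowed(applicant_skill, job_skill, rule):
    """Check the two Skill Type values of an Interview against a rule.

    rule is one of:
      "same"      - parents must be the same (qualified Applicant only)
      "none"      - no constraint on the two values
      "different" - parents must be different (unqualified Applicant only)
    """
    if rule == "same":
        return applicant_skill == job_skill
    if rule == "none":
        return True
    if rule == "different":
        return applicant_skill != job_skill
    raise ValueError(f"unknown rule: {rule}")


print(interview_allowed("Welder", "Welder", "same"))
print(interview_allowed("Welder", "Plumber", "same"))
```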
By the way, the classic example of a diamond shape or boundary clash is
the ‘Telegrams problem’ described by
Fig. 1x
The need for this kind of two-pass serial file processing has been
reduced by the introduction of network databases that can impose many clashing
hierarchical structures on the underlying data. In terms of an entity model,
A later chapter discusses another design issue raised by the diamond shape - the possibility of a process that travels from top to bottom, or vice-versa, via two different routes.
Some relatively advanced techniques for analysing data structures,
including reasons to contravene 4th and 5th normal forms by maintaining
derivable sorting classes.
You may find, perhaps as a result of relational data analysis, that some classes have compound keys, but there are no parent entities with elements of the key.
Ask of a class with a compound key, what classes exist with keys
made out of its parts? Given a two-way compound
key, then try transforming the class into a V shape with two parent entities.
Fig. 2a shows V shapes you can generate from the classes Holiday Feature
and Client Requirement in a Travel Agency.
Fig. 2a
Fig. 2b shows V shapes you can generate from the classes Patient
Admission and Employment Contract in a hospital system.
Fig. 2b
Fig. 2c shows V shapes you can generate from the classes Task and Course
Booking in a personnel system.
Fig. 2c
Given a
three-way compound key, then try transforming it into either a double Y shape
or a triple Y shape, as shown below.
E.g. Suppose
Surgical Operation has a compound key of Patient, Hospital and Surgeon (perhaps date and time ought to be
included as well, but I shall gloss over this). You may draw a shape with three
simple key classes, and two or three two-way key classes.
Assuming all Surgeons in a Hospital are allowed to operate on all
Patients in the Hospital, analysis may reveal the classes shown in Fig. 2d.
Fig. 2d
Fig. 2e introduces an extra class to model the constraint that Surgeons
in a Hospital can only operate on a Patient in the Hospital after the Patient
and Surgeon have both signed a consent form.
Fig. 2e
The three-way key class is necessary. A Surgical Operation records an
event in the real world and it has attributes of its own. But some three-way
key classes are redundant. They result from data analysis of a poorly designed
input or output document, where there ought instead to be two or three two-way
keys.
Reducing to fourth normal form means replacing a derivable three-way key
class by two classes with two-way keys. Fourth normal form is most easily
explained in terms of a pattern.
Ask of a double Y shape, does the class at the bottom:
• have only the keys of its parents (no additional attributes)?
• derive mechanically from joining its two parents?
If yes, then the bottom class can be discarded, provided that the two parent entities are retained.
E.g. Fig. 2f shows that the Suitable Holiday class is merely a product
of matching Holidays against Client Requirements. It can be derived at any
time, and need not be placed in the model.
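Deriving the class mechanically amounts to a join, as the following sketch suggests. It assumes, for illustration only, that Holiday Feature is keyed on (Holiday, Feature), that Client Requirement is keyed on (Client, Feature), and that a match on Feature is what makes a Holiday suitable. The sample data is invented.

```python
# Hypothetical two-way key classes.
holiday_features = {
    ("Alpine Lodge", "skiing"),
    ("Alpine Lodge", "sauna"),
    ("Beach Villa", "swimming"),
}
client_requirements = {
    ("Ann", "skiing"),
    ("Bob", "swimming"),
    ("Bob", "sauna"),
}


def suitable_holidays(holiday_features, client_requirements):
    """Join the two parents on Feature to derive the three-way key class."""
    return {
        (client, feature, holiday)
        for holiday, feature in holiday_features
        for client, wanted in client_requirements
        if feature == wanted
    }


derived = suitable_holidays(holiday_features, client_requirements)
print(sorted(derived))
```

Because every row of the result is computed from the two parents and carries no attribute of its own, storing it adds no information to the specification.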
Fig. 2f
By way of contrast, Fig. 2g shows that the Holiday Booking class below
is not merely a product of matching Holidays against Client Requirements. It is
a record of an event in the real world that users want to remember.
Fig. 2g
The Suitable Holiday and Holiday Booking classes give rise to a double V
shape, resolvable in the normal way, as shown later.
A double Y shape may be incomplete at the top. Its essence is the V at
the bottom - a class with a unique compound of three attributes that appear in
parent entities as unique two-way compounds.
E.g. Fig. 2h shows that if each
Fig. 2h
Reducing to fifth normal form means replacing a derivable three-way key class by three classes with two-way keys, where there is a ‘join dependency’ preventing all possible combinations of the key values from existing. Again, fifth normal form is most easily explained in terms of a pattern. Ask of a triple Y shape, does the class at the bottom:
• have only the keys of its parents (no additional attributes)?
• not derive mechanically from joining any two parents?
• derive mechanically only from joining all three parents?
If yes, then the bottom class can be discarded, provided that its three parent entities are retained.
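The join dependency can be demonstrated with a small generic example. The sketch below deliberately uses abstract key values (a1, b1, c1 and so on) rather than the holiday classes, and the helper names `project`, `join_two` and `join_three` are hypothetical.

```python
def project(rows, i, j):
    """Project three-way key rows onto two of their key parts."""
    return {(row[i], row[j]) for row in rows}


def join_two(ab, bc):
    """Join two two-way key classes on their shared middle key part."""
    return {(a, b, c) for a, b in ab for b2, c in bc if b == b2}


def join_three(ab, bc, ac):
    """Join all three parents: keep only rows the third parent allows."""
    return {(a, b, c) for a, b, c in join_two(ab, bc) if (a, c) in ac}


# A three-way key class that holds only keys (no attributes of its own).
rows = {
    ("a1", "b1", "c2"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
    ("a1", "b1", "c1"),
}
ab, bc, ac = project(rows, 0, 1), project(rows, 1, 2), project(rows, 0, 2)

print(join_two(ab, bc) == rows)        # joining two parents is not enough
print(join_three(ab, bc, ac) == rows)  # joining all three recovers the class
```

Joining only two parents manufactures a spurious row, (a2, b1, c2); the third parent removes it. The same check can be repeated for the other two pairs of parents.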
E.g. Fig. 2i shows that the Suitable Holiday class below is not merely a
product of matching Holidays against Client Requirements. It is constrained
also by the need for the Client to express an interest in the
It can be derived from joining all three parents, and may be discarded
from the model.
Fig. 2i
Designers are familiar with tradeoffs between:
• minimising redundant processing versus minimising redundant data
• simplifying enquiry processes versus simplifying update processes.
Theoreticians tend to advocate the latter option in each case. They say
to eliminate all redundant data, including derivable key-only classes, and to
minimise update processing. They don’t say these options may conflict with each
other. Consider the derivable sorting class called Suitable Holiday in the
entity model below.
Fig. 2j
Since the system is designed to produce reports of Holidays suited to
Clients, and reports of Clients suited to Holidays, the derivable sorting class
will be useful.
Obviously, it will simplify and speed up enquiry processes. Without it,
you will repeatedly have to manufacture Suitable Holiday objects in views of
the data structure that users request for presentation. And you might have to
account in some way for earlier Holiday Bookings on a Suitable Holiday.
Less clearly, the Suitable Holiday class can also simplify and speed up
update processes. When a Holiday Booking is made, you can more easily check any
history of previous Holiday Bookings. When a Client makes a Holiday Booking,
you can more easily locate and check any Holiday Booking already made for the
same compound of Client and
Overall, it may prove cheaper to maintain Suitable Holiday as a sorting
class than to leave it out. This contravenes the established view of physical
database design. See the chapter ‘Clashing entity models’ for discussion of how
and why you might maintain a derivable sorting class in the entity model rather
than the data storage structure.
Where a class
has a list of similar attributes, you can generalise these attributes into a
relationship. Fig. 2k shows an example drawn from Assenova and Johannesson
[1996].
Fig. 2k
The transformation in Fig. 2k is not very common in practical system
design. Fig. 2l shows two more common transformations.
Fig. 2l
The patterns are discussed separately
on the next page.
Ask
of a class with a list of similar attributes: can the attributes be generalised
into a single type?
E.g. consider the three totals recorded in the Paper class in Fig. 2m.
You might show the common properties of the three attributes by relating all
three attributes to a single domain class. The resulting shape is called
Tramlines.
Fig. 2m
Ask
of a tramlines shape: Can the relationships be generalised by creating a V
shape with a child link class?
E.g. Fig. 2n shows the transformation of the tramlines in Fig. 2m.
Fig. 2n
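In data terms, the transformation turns a fixed list of columns into rows of a child link class. The sketch below assumes hypothetical names for the three totals and for the link class attributes, since the figures are not reproduced here.

```python
def generalise_totals(paper_id, totals):
    """Turn a fixed list of similar attributes into link-class rows,
    one row per (Paper, total type) combination."""
    return [
        {"paper": paper_id, "total_type": total_type, "value": value}
        for total_type, value in totals.items()
    ]


# Before: one Paper record with three similar attributes (names are hypothetical).
paper_totals = {"total 1": 12, "total 2": 3, "total 3": 40}

# After: three rows of a child link class between Paper and a total-type domain class.
link_rows = generalise_totals("Paper 1", paper_totals)
for row in link_rows:
    print(row)
```

Adding a fourth kind of total then means adding a value to the domain class rather than a new attribute to Paper.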
Finally I come to a shape you often see in large business databases - a
core entity, surrounded by many parents and many children. Fig. 2o shows this
as the X shape.
Fig. 2o
I call this the X shape (in line with the V, W and Y shapes) but you
might better call it a star shape, since it can have many points, perhaps a
dozen parents and a dozen children.
Ask of an X shape: Are there constraints between parents and children that are missing?
This is a rather vaguely-defined shape and a rather vague question. I don’t say how many points the X shape must have before it is likely to reveal significant missing constraints. Nor do I prescribe what to do in response to the question. Further research may reveal further rules of thumb in this area.
Designing an entity model for maintenance, anticipation of amendments.
Surveys tell us that maintenance costs far outstrip initial development
costs; 70-30 is a proportion often quoted. Some hold out ‘design for
maintenance’ as a primary goal of system development.
Analysis patterns can help you to design for maintenance and facilitate
amendments.
But maintainability cannot be the primary goal. Correctness must be the
primary goal. You should strive to get the system right this time, not next
time. If you don’t strive, you won’t succeed. And if you don’t succeed, you’ll
have to spend more on ‘maintenance’ later.
Other surveys tell us it is cheaper to correct errors sooner rather than
later. It is obviously much cheaper to revise analysis documentation than
program code in a working system. So people have proposed ways of exposing
errors in analysis and design as early as possible.
One way is to follow an analysis and design methodology that produces
graphical design documentation. Current methodologies have many weaknesses.
Above all, they lack effective quality assurance mechanisms. It is no use
having paper mountains of analysis and design documentation if nobody can tell
whether the documentation is any good or not, and programmers throw most of it
away.
Analysis
patterns provide a solution to this problem; they provide quality assurance
questions.
Another way is to follow the path of ‘iterative development’, rapidly
producing prototypes of parts of the required system. Prototyping makes design
results more concrete, more visible, so you can more easily see if designers
are going in the wrong direction and head them off.
An enterprise
application will not be entirely right first time. Some amount of trial and
error is necessary. Some amount of iterative development is inevitable. But
setting out with the objective of
delivering a wrong system, then developing it by trial and error, is likely to
add time and costs to the overall project.
Iterative development stretches the costs of development over smaller
cycles. In effect, it moves maintenance (which we know to be expensive) into
the development phase. Change control and configuration management become
bigger issues. So if you iterate more than a couple of times, the overall
project will cost more and take longer.
Iterative development runs counter to design for maintenance. Designers
who are focussed on the next small increment won’t take a long-term view. The
code will grow haphazardly with each iteration into a pile of spaghetti that is
hard to maintain. Agilists consequently promote “refactoring”. And of course,
good design up front will reduce refactoring costs.
Iterative development encourages low expectations. Designers who think
it normal and acceptable to deliver unfinished code will not strive hard enough
to get the system right before giving it to users. Designers have an excuse to
escape from their responsibility to do their best work.
Is there a credible way to improve on iterative development? Current
methodologies are failing us. We lack a methodology that embodies professional
expertise about designing for correctness and designing for maintenance.
Specification and design patterns address this problem; they encourage
right-first-time design and can reduce maintenance effort.
In one sense there is no such thing as maintenance, there is only further development. The things you have to do in maintenance are the same as you have to do in development.
If it means anything, design for maintenance means designing the current
system in a different way from how you would design it if no changes were ever
expected.
It is meaningless to design for maintenance per se. Flexibility in every
direction is impossible. Changes come from many different angles. You have to
decide what changes are likely, and design with those changes in mind.
There are three basic design for maintenance strategies.
Some changes are due to new technology, perhaps a new database management system or new user interface management system. The way to anticipate changes in technology is to isolate, as far as possible, those parts of a system that are technology-specific.
Other changes are due to new user requirements: people changing their
mind about the way they want the system to operate. New user requirements may be
subdivided into ‘correctness’ requirements and ‘usability’ requirements. The
way to anticipate changes in these requirements is to isolate, as far as
possible, those parts of a system that are specific to specific kinds of
requirement.
You can separate these concerns using the high-level analysis pattern of
the 3-schema architecture. This architecture divides an enterprise
application into subsystems that isolate different areas you may want to
change.
Software layer | User concern | Technology concern
UI layer | user-friendliness | GUI environment
Business services layer | business rules and constraints | App server
Data services layer | Performance | database management system
You can anticipate exceptional cases by not constraining the system to accept only normal cases. This can prove counterproductive. Some of the tradeoffs are discussed in the next section.
You can generalise the design so that it is easier to accommodate new cases and reconfigure the system with new rules. See the section after next.
After you have specified constraints within the Business services layer
of code, you may find you have to relax them to deal with exceptions. Users
often submit maintenance requests asking for the freedom to break the normal
rules, record unusual cases not previously envisaged.
A natural reaction is to relax the constraints on data entry. This
increases the danger of incorrect system usage and gives users the opportunity
to screw up the system. Users need a system that constrains data entry and
prevents garbage from being stored in the database.
The trouble is that to design for exceptions, rather than reduce the
constraints on the current system, can make the system considerably more
complex.
Fig. 3a shows the entity model of the Marriage Registration system, introduced in the volume ‘Introduction to rules and patterns’.
Fig. 3a
One problem might be that changing a Person’s recorded sex would
automatically invalidate all previous Marriages of that Person. All historic
Marriages for that Person would now be in a state inconsistent with the rules
of the system - recorded as being between two people of the same sex.
The solution is to record a Person’s sex of birth separately from their
current sex, and apply the validation constraint only to their sex of birth.
This tiny Marriage Registration system has caused much debate in our
tutorials, on grounds ranging from design and coding style, to culture and
political correctness. Please don’t be offended if I go on to illustrate laws
and societies you disapprove of.
An exclusion arc over the relationships implies that the class at the
focus of the arc may be divided into subtypes. In this case, the two subtypes
are man and woman.
Suppose the Marriage Registration system is bought by a country where sex changes are illegal and unrecognised. You might specify a fixed class hierarchy as in Fig. 3b.
Fig. 3b
Types
It is normal for an entity state record to belong to many types. You
might regard sex and job title as types of a Person.
Types are not normally represented as class hierarchies in the entity
model of an enterprise application. One reason is that types often turn out to
be additive rather than mutually exclusive - a Person can have more than
one job title at once - a bisexual Person might be recorded as having two
sexes.
Another reason (the one that applies here) is that with the passage of
time, an entity may change its ‘type’ many times. You may reasonably expect
that most if not all of an entity’s types can be altered during the life
history of an object.
States
The longer an object persists, the more that a type (even one as fixed in
real life as male or female) tends to become a temporary state.
I don’t think of the object as changing class each time one of its types
is updated. I think of it as remaining of the same class, but changing its
state. Where a type change or state update constrains the future behaviour of
an object, this is most naturally specified as a state-transition in the life
history. So the type becomes a state variable.
You can specify the cyclical alternation between sex roles as state
changes within the state machine of a Person. This state machine will record
the current state, the current sex role, but not remember past ones.
Suppose the system is bought by a country where transsexuals are allowed to contract a marriage in their new sex. After a few months, the users submit an amendment request:
“Can we please be allowed to record the exceptional case where, over
time, a Person plays both husband and wife roles in different Marriages?”
You might simply erase the exclusion arc constraint, as shown in Fig.
3c.
Fig. 3c
Is it worth removing the constraint on transsexuals remarrying under the
new sex, for the sake of just one or two individuals?
In general, relaxing a constraint may cause more trouble than it saves,
by allowing some normal cases to be erroneously recorded as exceptions. You
have to trade off giving the end-users freedom to process rare cases, against
specifying constraints that maintain the quality of stored data for the normal
cases.
In this case, the danger is slight. A Marriage is still defined as
connecting one Person of each sex. So users must change the recorded sex of a
Person before they can record a Marriage under their new sex role. It would be
difficult to do this by chance, in error.
Again however, changing a Person’s recorded sex would automatically
invalidate all previous Marriages of the same Person. These would now be in a
state inconsistent with the rules - recorded as being between two Persons of
the same sex.
The previous solution, of recording a Person’s sex at birth separately
from their current sex, only works if a Person can only change sex once. The
proper solution is to record the life history of a Person’s sexual roles, and
attach each Marriage to the period of time that they play a given sex role.
To keep a history of a Person’s sex changes, and record the Marriages contracted within each sex role, you should extend the entity model as in Fig. 3d.
Fig. 3d
Is it worth enriching the specification to record history? Yes if the
user wants to be able to inspect past occurrences of an object’s state. Yes if
it helps you maintain the constraints on system behaviour. It is now possible
to change a Person’s sex and record new Marriages, without invalidating all
their previous Marriages.
Can you have it both ways? Can you place constraints on the normal cases, yet also give end-users the freedom to process rare cases? Yes, but at considerable expense.
You might design the user interface so that the user is presented by
default with the normal case - the possibility of entering a Marriage between
two people in their sex at birth. To enter an exceptional case, the user must
make a conscious effort to pop up a menu and select an entry for entering an
exceptional case - Marriages involving one or more Transsexuals.
Fig. 3e enriches the entity model to show all possible valid types of
Marriage as distinct classes.
Fig. 3e
In Fig. 3e, more than 50% of the design effort is devoted to handling
what are likely to be much less than 1% of the cases.
Is it worth enriching the specification to distinguish normal cases from
exceptions? You should present the development costs for users to decide.
So far, users must change the recorded sex of a person before they can record a marriage under their new sex role, since a marriage is still defined as having one partner of each sex.
You can anticipate more exceptional cases by giving control over the
system’s rules to the users. Fig. 3f generalises the specification so that
users can define new kinds of marriage, with new combinations of sexes, or even
more than two people.
Fig. 3f
Fig. 3f is only an illustration, not a serious design. I look further at
generalising classes in the next section.
Focusing on the specification of constraints within the Business
services layer of code, how do you anticipate changes, design for ease of
amendment, ahead of time?
You can anticipate changes in requirements by generalising aspects of
the design. There is a trade off however. Generalisation can make a system
harder to understand, harder to program, and possibly harder to use.
A rigid hierarchy
Imagine a personnel system that records a company’s organisation
hierarchy. You might specify an entity model of the kind in Fig. 3g.
Fig. 3g
A rigid
hierarchy is a generative pattern. You should ask: Is it possible for levels of the
hierarchy to be omitted?
Suppose you find out the system has to record Companies that don’t have
Departments, Divisions that don’t have Departments, and Company Employees who
are not allocated to any Division or Department. Fig. 3h shows a more generic
model.
Fig. 3h
The entity model is smaller and more flexible. On the other hand, the
system is harder for designers to work with, and it is more difficult to give
users the same kind of usability.
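One common way to build such a generic model is a single recursive class. The sketch below is an assumption about what the more generic model might look like, not a transcription of Fig. 3h; the names `OrgUnit`, `level` and `company_of` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrgUnit:
    name: str
    level: str                          # e.g. "Company", "Division", "Department"
    parent: Optional["OrgUnit"] = None  # any level may be omitted


@dataclass
class Employee:
    name: str
    unit: OrgUnit  # may be a Company, a Division or a Department


def company_of(unit):
    """Walk up the hierarchy, however many levels it happens to have."""
    while unit.parent is not None:
        unit = unit.parent
    return unit


acme = OrgUnit("Acme", "Company")
accounts = OrgUnit("Accounts", "Department", parent=acme)  # no Division level
clerk = Employee("Clerk", accounts)
director = Employee("Director", acme)                      # no Division or Department

print(company_of(clerk.unit).name, company_of(director.unit).name)
```

The price of the flexibility shows in the code: nothing in the structure now stops a Division being placed under a Department, so any such rule has to be programmed separately.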
Imagine a vehicle licensing system that records the various reasons why a
Person is related to a Vehicle. You might specify an entity model of the
kind in Fig. 3i.
Fig. 3i
But how many other reasons are there to relate a Person to a Vehicle? What
about Thief? Damager?
Ask
of a multiple-V shape: Is it better to anticipate extra relationships by
generalisation?
Fig. 3j
Again, the entity model is smaller and more flexible. On the other hand,
the system is harder for designers to work with, and it is more difficult to
give users the same kind of usability.
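A minimal sketch of the generalised form, assuming that each kind of relationship becomes a role value that users can extend. The class name `VehicleRegister` and the roles "Owner" and "Keeper" are hypothetical; only Thief and Damager come from the text above.

```python
class VehicleRegister:
    """Hypothetical sketch: one generic Person-Vehicle link with a role,
    instead of one link class per reason."""

    def __init__(self, allowed_roles):
        self.allowed_roles = set(allowed_roles)  # maintained by users
        self.links = []                          # (person, vehicle, role)

    def relate(self, person, vehicle, role):
        # Without this check, any text could be recorded as a role.
        if role not in self.allowed_roles:
            raise ValueError(f"unknown role: {role}")
        self.links.append((person, vehicle, role))

    def people_in_role(self, vehicle, role):
        return [p for p, v, r in self.links if v == vehicle and r == role]


vehicles = VehicleRegister({"Owner", "Keeper"})
vehicles.relate("Person 1", "Vehicle 1", "Owner")

# A new reason is added as data, with no change to the structure.
vehicles.allowed_roles.add("Thief")
vehicles.relate("Person 2", "Vehicle 1", "Thief")

print(vehicles.people_in_role("Vehicle 1", "Thief"))
```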
Imagine a simple accounts system that records sales. You might specify an entity model of the kind in Fig. 3k.
Fig. 3k
The model is clear and specific about what entities are to be recorded
in the system. If you translate this model into a data storage structure, then
designers can easily write enquiry programs that report on all the sales for
one customer, or all the sales of one stock type. Both designers and end-users
can readily see what the classes are for, and use them correctly.
But suppose your prime design objective is flexibility. Your brief is to
make sure the system can be extended to record new entity types (a Return of
Goods, a Salesman), and heaven knows what else.
To accommodate future requirements, you might specify only a few generic
classes: ‘Contact’ instead of Customer and Supplier and ‘Stock Transaction’
instead of
Fig. 3l
This model is more flexible. To record a Salesman, all you need to do is
extend the range of ‘types’ allowed for a Contact. You don’t have to change the
structure. On the other hand:
• there is more danger of giving the end-users a system that fills up
with garbage. It is easy to imagine people mistakenly entering Customers as
Suppliers, or Sales as Purchases. Designers will have to work that much harder
to constrain how the system is used and give users the same degree of usability.
• the system is no longer so easy for designers to work with. The
programming is more complex. You will have to write extra code to test the
contents of a Stock Transaction object, to find out what subclass it really is
(
• the system’s performance may be degraded, because events and enquiries
that require access to all the objects of a logical class (all Sales), will
have to trawl through all the objects of the physical class (all Stock
Transactions).
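The three costs listed above can be made concrete with a small sketch. It assumes the generic classes carry a type attribute (contact_type, transaction_type) and that a Sale must involve a Customer and a Purchase a Supplier. The attribute names and the pairing rule are illustrative assumptions, not details taken from Fig. 3l.

```python
from dataclasses import dataclass
from typing import List

# Assumed pairing rule; the specific model enforced this by structure.
REQUIRED_CONTACT_TYPE = {"Sale": "Customer", "Purchase": "Supplier"}


@dataclass
class Contact:
    name: str
    contact_type: str  # e.g. "Customer", "Supplier", later "Salesman"


@dataclass
class StockTransaction:
    contact: Contact
    stock_type: str
    quantity: int
    transaction_type: str  # e.g. "Sale", "Purchase"


def record(store: List[StockTransaction], contact: Contact, stock_type: str,
           quantity: int, transaction_type: str) -> StockTransaction:
    """Extra code needed to keep garbage out of the generic structure."""
    expected = REQUIRED_CONTACT_TYPE.get(transaction_type)
    if expected is None:
        raise ValueError(f"unknown transaction type: {transaction_type}")
    if contact.contact_type != expected:
        raise ValueError(f"a {transaction_type} needs a {expected}")
    txn = StockTransaction(contact, stock_type, quantity, transaction_type)
    store.append(txn)
    return txn


def sales_for_customer(store: List[StockTransaction],
                       customer_name: str) -> List[StockTransaction]:
    """Trawls every Stock Transaction and tests its type to find the Sales."""
    return [t for t in store
            if t.transaction_type == "Sale" and t.contact.name == customer_name]
```

The validation in record is the harder work needed to constrain usage, the type tests are the extra programming, and the scan over the whole store in sales_for_customer is the performance cost.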
It is easy to get carried away with the idea of generalisation and take it too far. Nobody in their right mind would go to the extreme shown in Fig. 3m.
Fig. 3m
Or would they? There is a real motivation to do this: the cost of ‘data
migration’. Data migration is a serious issue in system maintenance and it’s one
of the issues I come back to in chapter 6.
Briefly, other chapters suggest you can have your cake and eat it too. You can separate the entity model from the data storage structure. You can code the entity model in the Business services layer on a business rules server - where it can be changed without the need for data migration. You can code the more generalised entity model as the data storage structure on a data server - where it will be sufficiently flexible to reduce the need for data migration. You may find it is not easy to do this using current technology. However, SQL is a natural tool for implementing the Data abstraction layer that is necessary to achieve this separation, and ODBC technology is also helpful.
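One way to picture this separation is a generalised table in the data store, with SQL views presenting the specific entities to the layer above. The sketch below is an assumption-laden illustration, not the architecture described in the other chapters: the table, column and view names are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create a generalised storage structure plus a specific view over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Generalised data storage structure (hypothetical names).
        CREATE TABLE contact (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            contact_type TEXT NOT NULL
        );
        CREATE TABLE stock_transaction (
            id INTEGER PRIMARY KEY,
            contact_id INTEGER NOT NULL REFERENCES contact(id),
            stock_type TEXT NOT NULL,
            quantity INTEGER NOT NULL,
            transaction_type TEXT NOT NULL
        );
        -- Data abstraction: the specific entity 'sale' as a view.
        CREATE VIEW sale AS
            SELECT t.id, c.name AS customer, t.stock_type, t.quantity
            FROM stock_transaction AS t
            JOIN contact AS c ON c.id = t.contact_id
            WHERE t.transaction_type = 'Sale'
              AND c.contact_type = 'Customer';
    """)
    return conn


def sales_for(conn: sqlite3.Connection, customer: str):
    """Enquiry written against the specific view, not the generic table."""
    rows = conn.execute(
        "SELECT stock_type, quantity FROM sale WHERE customer = ? ORDER BY id",
        (customer,),
    )
    return rows.fetchall()
```

Under these assumptions, introducing a new transaction type means adding rows and perhaps a new view; the stored data does not have to be migrated.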
References
Ref. 1: “Software is not Hardware” in the Library at http://avancier.co.uk
Footnote 1: Creative Commons Attribution-No Derivative Works Licence 2.0
Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.co.uk” before the start and include this footnote at the end.
No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it. For more information about the licence, see http://creativecommons.org