Deeper thoughts about entity models

This booklet is published under the terms of the licence summarized in footnote 1.

Five kinds of entity model

What is a class? What does a box in an entity model represent? Should either subclasses or parallel aspects of a class be specified as distinct classes? You can’t have it all ways in one entity model.

This chapter shows five kinds of entity model. A methodology or technology can accommodate clashing views of what a class is by allowing several versions of a data structure to coexist, each entity model being directed at a different purpose.

The inevitability of structure clashes

Booch (1994) probably speaks for most computer scientists when he says ‘mapping an object-oriented view of the world onto a relational one is conceptually straightforward, although in practice it involves a lot of tedious details’. We cannot deny the ‘tedious details’, which means that even trivial examples of enterprise applications take up a lot of space, but this chapter offers a challenge to the ‘conceptually straightforward’.

Although the individual data items of system specification are very important, we are more troubled in enterprise applications by how to specify the rules and constraints governing larger objects, higher-level relations or aggregates of data items. Aggregation becomes an issue.

Whereas inheritance implies the generalisation of mutually exclusive subclasses into a higher-level class that may be smaller than its subclasses, aggregation implies the summation of parts into a higher-level class that must be larger than its parts.

There are clashing views of how to aggregate properties to form an object class. Different ways to group data items into classes lead to different entity models, that is different structures specified over the top of the data items. We need to understand the different possible views and their implications.

A 3-tier software architecture can reconcile various views by handling each view in a separate layer. Fig. 5a gives a rough idea of what we mean. The chapter will explain.

Fig. 5a

Structure clashes

Aggregation is the means by which simple elementary components are added together to form larger and more complex structures. You can build one set of elementary components into an infinite number of different, clashing, higher-level structures.

For example, a school geography book presents several maps of the same area, showing different divisions of the earth’s surface into:

• land masses bounded by oceans

• countries bounded by political administration

• territories bounded by climate or vegetation.

It would be foolish to say that any one of these is the ‘right’ view. And there are further clashing aggregations of people: by race, religion and native language. Fig. 5b shows part of the world where aggregations have been disputed for thousands of years.

Fig. 5b

The ‘correct’ aggregation of components in this example has long been a matter of dispute. Land borders have been redrawn several times. Peoples have moved en masse between Scotland and Ireland (and from both to the USA). Eire, the ‘Irish Free State’, separated from the United Kingdom as recently as 1921.

Components in this example are members of higher-level aggregations, Europe, NATO, the United Nations, and so on. There are further structure clashes between these higher-level aggregations. Again, it would be foolish to say that any one view is ‘right’.

Classes as elementary data types

Starting at the bottom level of system specification, the atomic particles are the data items or variables. Each variable has a data type that defines its valid range of values. The volume ‘Introduction to Rules and Patterns’ showed how you may declare the data type as what may be called a domain class.

Generic domain classes such as Text and Integer are widely reusable. You may define more application-specific domain classes such as Telephone Area Code and Country Name for reuse in a local context.

Where several variables (say, Price and Telephone Area Code) share the same domain (say Integer), the variables are subclasses that inherit the properties of the one domain. You can arrange domain classes themselves into a complex inheritance tree.

Early object-oriented authors such as Meyer (1988) were much concerned with designing production software operating at the level of domains. They sought reuse of code via inheritance between domain classes. Some wanted object orientation to provide a once-and-for-all inheritance tree of domain classes, and so save them from having to code these classes from scratch for each new programming language, operating system and CASE tool.

But for enterprise application specification a simple two-level structure of domain classes is usually enough: generic domain class (say, Text) and application-specific domain (say, Country Name). In practice, analysts spend little time on specifying domains. They spend most of their time thinking about the rules operating on larger relations, each an aggregate of variables about a business entity.
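The two-level structure can be sketched in a few lines of Python. This is only an illustration: Text, Integer, Country Name and Price are named in the text, but the validation rules (the 60-character limit, the ban on negative prices) are assumptions made for the example.

```python
class Text(str):
    """Generic domain class: any character string."""

    def __new__(cls, value):
        if not isinstance(value, str):
            raise TypeError("Text requires a string")
        return super().__new__(cls, value)


class CountryName(Text):
    """Application-specific domain, reusable wherever a country is named."""

    MAX_LENGTH = 60  # assumed limit, for illustration only

    def __new__(cls, value):
        obj = super().__new__(cls, value)
        if not 0 < len(obj) <= cls.MAX_LENGTH:
            raise ValueError("Country Name must be 1 to 60 characters")
        return obj


class Integer(int):
    """Generic domain class: any whole number."""


class Price(Integer):
    """A variable whose domain inherits from Integer (assumed rule: no negatives)."""

    def __new__(cls, value):
        obj = super().__new__(cls, value)
        if obj < 0:
            raise ValueError("Price must not be negative")
        return obj
```

A Country Name value still behaves as Text, and a Price still behaves as an Integer, which is the sense in which the variable inherits the properties of its domain.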

Specification of classes at the level of relations is different from specification of classes at the level of domains. Among database-oriented authors, Chris Date (1994) makes the remarkable statement: ‘Object classes are domains or data types. Questions about inheritance therefore apply to domains, not to relations.’ Is this true? What is a class anyway?

Classes as aggregations

To make an aggregate, you simply add components together. There is an infinite variety of ways to group data items into aggregate classes. It is hard to make any general statement about what an aggregate class is, until we separate the three layers of the three-tier architecture.

Layer	Objects like	In domain of
UI	menus, windows, buttons etc.	user interface technology
Business services	business entities and rules	business users
Data services	tables, records, indexes, etc.	database technology

This three-way separation of concerns recurs throughout our work in information analysis. It helps us to separate different kinds of problem, and retain this separation from analysis and specification through to coding. It enables us to change the data storage or UI layer with minimal disruption to the essential business components.

Classes in the data storage structure

Aggregate classes appear in the form of persistent database tables or record types. Database programs treat each table as a distinct object. Some database management software expects you to specify larger aggregate classes such as database blocks or pages.

The size and scope of each aggregate class in the Data services layer is physical. It is guided by considerations of efficiency and limited by the database technology. Each internal class may roll up data from several entity classes, or store only part of one entity class.

Classes in the UI layer

Aggregate classes appear in the form of transient input messages, output messages, windows and dialogue boxes. The GUI management software will treat the data structure in a window as a single object when, for example, it moves it around the screen.

The size and scope of each aggregate class in the UI layer is physical. It is guided by considerations of usability and limited by the user interface technology. Each UI layer class may roll up data from several entity classes, or display only part of one entity class.

Classes in the business services layer

The Business services layer includes both persistent aggregate classes (objects) and transient aggregate classes (events and enquiries).

Events and enquiries are naturally limited in size and scope by the rule that a transient event is a minimum unit of consistent change. But what rules limit the size and scope of the persistent object classes? In other words: How should you group the persistent variables into aggregate classes in the entity model?

The size and scope of each aggregate class in the business services layer has nothing to do with technology, or with physical objects such as database tables or GUI windows; it is a matter of logic.

Three logical views

You might define a class as an aggregate of variables around three different centres:

aggregate centred on	means the class is
a key	a third normal form relation
a type	a type within a class hierarchy
a state variable	a state machine (or ‘parallel aspect’ of a relation)

So far, authors have tended to gloss over the possibility of structure clashes between these different views of a class. In trivial examples, where there is no structure clash, the different definitions lead to the same set of classes. You will draw the same data structure whichever definition you pick.

But for non-trivial enterprise applications, we can no longer pretend that the different definitions give the same answer. The more complex the system, the more the logical views diverge from each other. The data pictures you draw will depend on the definition you pick.

There are clashes between logical views of a class, and then between logical and physical classes. You may design the physical database tables and GUI windows to match either the relations or the state machines, but you can’t match both. We need a richer theory of system specification, a software architecture that results in separate components handling each logical and physical view.

A more formal view of entity models

Entity model notation is not the issue here. The notation we use is only one of several possible notations. The same questions and choices arise whatever notation you use.

Different ways to group variables into classes lead to different entity models, that is, different structures specified over the top of the variables. Analysts often draw entity models that are a mixture of different styles without realising it. They should understand the different possible approaches and their implications.

The informal entity model

At the earliest stage of system specification, you should be free to draw any informal picture that helps you. For the sake of giving this kind of data structure a name, we’ll call it an ‘informal entity model’. This is an informal picture of the data in which an ‘entity’ is whatever level or size of data group you want it to be.

The relational entity model

Given that an enterprise application consumes data and produces information, you may uncover the classes of interest by relational data analysis of the data in forms, reports, screens and files. The result of such analysis is a relational entity model.

Fig. 5c

An early step in relational data analysis is to spot the business identifiers or keys. Given an object, the value of each of its attributes is uniquely determined by the value of its key. In what is called a ‘third normal form’ relation, the value of each attribute is determined by first the key, second the whole key, and third nothing but the key.
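The idea that a key determines each attribute can be tested against sample data. The sketch below is a minimal illustration, assuming rows are held as dictionaries; the function name and the sample vehicles are hypothetical. Note that sample data can only refute a dependency, never prove it.

```python
def determined_by(rows, key, attribute):
    """Return True if no two rows share a key value yet differ on the attribute."""
    seen = {}
    for row in rows:
        key_value = tuple(row[name] for name in key)
        if seen.setdefault(key_value, row[attribute]) != row[attribute]:
            return False
    return True


# Hypothetical sample rows for a Vehicle relation keyed on reg_num.
rows = [
    {"reg_num": "A1", "model": "Mini", "maker": "BMC"},
    {"reg_num": "B2", "model": "Mini", "maker": "BMC"},
    {"reg_num": "C3", "model": "Escort", "maker": "Ford"},
    {"reg_num": "D4", "model": "Cortina", "maker": "Ford"},
]

key_determines_model = determined_by(rows, ["reg_num"], "model")
model_determines_maker = determined_by(rows, ["model"], "maker")
maker_determines_model = determined_by(rows, ["maker"], "model")
```

In this sample, maker appears to depend on model rather than on ‘nothing but the key’, which suggests moving it to a separate Model relation.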

In the example above, each box is a relation, its key is underlined, its attributes are listed, and its associations to other relations are shown as lines connecting the boxes. The meaning of the different styles of line doesn’t matter here.

There is little freedom of choice about what the relations are, given that you know the users’ information requirements and you follow the idea that object instances are uniquely identifiable from each other by a key. But this is not the only logical view.

The entity state machine model

Another common logical view is that an object is something that progresses through a defined series of states, from a beginning to an end. In this state-transition view, a class is a state machine or behaviour, governed by a state variable (SV in the illustrations below).

People who design systems with little or no persistent data, typically embedded or process control systems, often view classes as state machines. They discover the classes by analysing states that classes pass through and the events that trigger state-changes.

You can use similar techniques for enterprise applications. You can transform a relational entity model via behaviour analysis techniques into a specification of event models that maintain the data. These event models include the required rules in the form of preconditions. It turns out that behaviour analysis may lead you to decompose a relation into parallel aspects. In the event models, each parallel aspect of a relation appears as a distinct class.

People often assume you can draw one state machine for each relation. However, you need to draw a separate state machine for each parallel aspect of a relation.

Fig. 5d shows the entity model with one box per state machine. This differs from the relational view in that parallel aspects of a relation have been divided.

Fig. 5d

There is little freedom of choice about what the classes are, given that you know the rules and constraints and follow the idea that each class is a state machine controlled by one state variable (or none if it can be optimised away). Note that the behaviour analysis shows that most of the classes in this example have one-state lives, and so require no state-variable.
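As a rough sketch of why parallel aspects need separate state machines, the Python below gives each aspect its own state variable. The two aspects of Vehicle, and their states and events, are assumptions made for illustration; they are not taken from the figures.

```python
class StateMachine:
    """A class viewed as a state machine governed by one state variable."""

    def __init__(self, initial, transitions):
        self.state = initial             # the state variable (SV)
        self._transitions = transitions  # {(state, event): next state}

    def apply(self, event):
        key = (self.state, event)
        if key not in self._transitions:
            raise ValueError(f"{event!r} not allowed in state {self.state!r}")
        self.state = self._transitions[key]


def new_vehicle_aspects():
    """Two hypothetical parallel aspects of one Vehicle relation."""
    ownership = StateMachine("unowned", {
        ("unowned", "Sell Vehicle"): "owned",
        ("owned", "Sell Vehicle"): "owned",
    })
    testing = StateMachine("untested", {
        ("untested", "Pass Test"): "tested",
        ("tested", "Fail Test"): "untested",
    })
    return {"ownership": ownership, "testing": testing}


aspects = new_vehicle_aspects()
aspects["testing"].apply("Pass Test")      # advances one aspect only
aspects["ownership"].apply("Sell Vehicle")
```

Each aspect advances independently of the other, so a single state variable for Vehicle would have to enumerate every combination of the two.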

See the footnotes for some remarks on classes as state machines.

The data storage structure

Given a system that maintains persistent data, you must design the record types or tables into which the database will be divided. The physical database designer’s view is that a class is a record type or database table, that is the unit of input/output accessed by programs.

Database designers usually transform a relational entity model first into a data storage structure (technology-independent) and then into a physical database structure (technology-dependent). The data storage structure records decisions about the physical database structure, with one box for each table or record type.

There is much freedom of choice here. You might map logical to physical by designing a separate table for each relation, or each subclass in a class hierarchy, or each state machine. In complex systems you cannot do all of these at once.

Fig. 5e avoids the structure clash between different logical views by rolling up the logical subclasses and parallel aspects of Vehicle into one table.

Fig. 5e

Object-oriented purist’s entity model

There is one more logical view, considered in the next section. The inheritance-oriented view is that a class is something uniquely identifiable by a type or subclass. In our example, a Vehicle may be either a Car or a Truck. Should we show the subclasses as distinct classes in an entity model?

So by way of summary, you can distinguish four or five different ways to define what a class is, and so draw different data structures:

Kind of entity model	The entity or class is
The informal entity model	whatever data group you like
The relational entity model	a normalised relation
The entity state machine model	a state machine
The data storage structure	a database table or record
Object-oriented purist’s entity model	a type in a type hierarchy

The main point of this chapter is that all five of these views are reasonable and useful. What we need is an architecture that enables us to consider each view during analysis, and maintain clashing views separately in the modular construction of our software.

Subclasses and parallel aspects

This and the following sections detail variations in ways to draw entity models where a class might be divided into subclasses or into parallel aspects.

· For subclasses the options are called: aggregation, pseudo-inheritance and delegation

· For parallel aspects, the options are called: aggregation, partition and delegation.

Subclasses

Given a class hierarchy containing a superclass (say Vehicle) with two subclasses (say Car and Truck), people have proposed three different ways to specify the subclasses: aggregation, pseudo-inheritance and delegation.

Aggregation of subclasses into one class

This means specifying the entire class hierarchy as one class. You may draw an exclusion arc across relationships specific to different subclasses.

Fig. 5f

Pseudo-inheritance of super class into subclasses

This means specifying only the subclasses as classes, copying the properties of the superclass into each of them.

Fig. 5g

The pseudo-inheritance approach means repeating data specification (above, the relationships to Owner and Model) in an anti-reuse, anti-maintenance, kind of way. Nevertheless, at the expense of some duplication, a handy rule-of-thumb is to do this if the users employ a different range of identifiers for each subclass.

Since we have assumed that all kinds of Vehicle are identified by the same primary key, Reg Num, we would not divide in this example. By the way, if we did divide, then any process given only Reg Num as input data must perform a preliminary enquiry to find out which subclass to access.

Delegation of subclasses as detail classes of a master class

This means specifying the superclass and each subclass as distinct classes, connected by ‘is a’ relationships. In this entity model the boxes are things that are uniquely identifiable one from another by a combination of key and ‘type’.

Fig. 5h
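The three options can be compared side by side as plain Python data. This is only a sketch: Reg Num, Owner and Model come from the example, while the subclass attributes (seats, payload_kg) and all the values are hypothetical.

```python
# Aggregation: one class for the whole hierarchy. A type attribute tells the
# subclasses apart and the attributes of the other subclass stay empty.
aggregated = {
    "A1": {"type": "Car", "owner": "Ann", "model": "Mini",
           "seats": 4, "payload_kg": None},
    "T9": {"type": "Truck", "owner": "Bob", "model": "Hauler",
           "seats": None, "payload_kg": 7500},
}

# Pseudo-inheritance: only the subclasses exist, so owner and model are
# specified twice, once in each subclass.
cars = {"A1": {"owner": "Ann", "model": "Mini", "seats": 4}}
trucks = {"T9": {"owner": "Bob", "model": "Hauler", "payload_kg": 7500}}

# Delegation: a master class plus one detail class per subclass, connected
# by an 'is a' relationship on Reg Num.
vehicles = {
    "A1": {"type": "Car", "owner": "Ann", "model": "Mini"},
    "T9": {"type": "Truck", "owner": "Bob", "model": "Hauler"},
}
car_details = {"A1": {"seats": 4}}
truck_details = {"T9": {"payload_kg": 7500}}


def owner_aggregated(reg_num):
    return aggregated[reg_num]["owner"]


def owner_pseudo_inherited(reg_num):
    """Given only Reg Num, first enquire which subclass holds the object."""
    for subclass in (cars, trucks):
        if reg_num in subclass:
            return subclass[reg_num]["owner"]
    raise KeyError(reg_num)


def owner_delegated(reg_num):
    return vehicles[reg_num]["owner"]
```

Only the pseudo-inheritance version has to search the subclasses before it can answer, which is the preliminary enquiry mentioned above.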

Representing the subclasses of a class

Given four kinds of entity model and three ways to specify a class hierarchy as classes (aggregation, pseudo-inheritance and delegation), which way suits which kind?

Drawing class hierarchies in an informal entity model

Since the informal entity model is only an informal picture, it doesn’t matter which way you choose to draw it. However, delegation is common. People like to draw the subclasses as distinct boxes in an informal entity model because it helps them analyse the problem domain, even if they later decide to aggregate the subclasses, or use pseudo-inheritance.

Drawing class hierarchies in a relational entity model

Rule-of-thumb: if the users recognise objects of two subclasses by the same primary key use aggregation; if not use pseudo-inheritance. We aggregate in our example, because all kinds of Vehicle are identified by the same primary key, Reg Num.

Drawing class hierarchies in an entity state machine model

The structure clash between subclasses and parallel aspects needs further exploration. The difficulty is outlined here.

Before you can draw a box in an entity model for each state machine, you have to understand the right way to build state machines. You must somehow document the constraint that the state machines of the subclasses are mutually exclusive. The way to do this is by drawing a high-level selection between options, where each option represents a subclass.

Aggregation is wrong. It means drawing only one state machine for the class hierarchy, including the behaviour of all subclasses in it, duplicating the superclass behaviour under each subclass option.

Pseudo-inheritance is wrong. It means drawing state machines only for the subclasses, duplicating the behaviour of the superclass in each. This means there is no specification in the state machines of the mutual exclusion between subclasses. You must prefix some events by an enquiry to work out which class they affect. Given an event (Scrap Vehicle) that carries only the primary key of the object (not its subclass) there is no way of knowing to which class the event must be directed. The corollary is you cannot rely on generating all the event preconditions from the state machine documentation.

Drawing class hierarchies in a data storage structure

Aggregation of subclasses into one relation is normal. The data attributes of the subclasses are contenders for the same data storage space. Aggregation also sits happily with the view taken in behaviour analysis.

Parallel aspects

A parallel aspect of a class (or as some object-oriented authors say ‘non-disjoint subclass’) is an independent group of attributes and relationships, whose behaviour is governed by a single state variable.

Behaviour analysis (and structure clash resolution) tends to lead you to divide an aggregate relation into its component parallel aspects. You may go as far as to decompose the behaviour of a relation into one parallel state machine for each attribute and relationship. However, a group of attributes and relationships that share the same state-transitions is normally lumped together in one state machine, making it a very low-level aggregate class.

Aggregation of parallel aspects into one class

Since aggregation rolls up all parallel aspects into one class, the picture of the case study here would be the same as that for aggregation of a class hierarchy shown earlier.

Partition of parallel aspects into several classes

The picture here is different from partition of a class hierarchy shown earlier. There is a structure clash between subclasses and parallel aspects. Dividing parallel aspects leads to this entity model.

Fig. 5i

Delegation of parallel aspects as detail classes of a master class

There is only a subtle difference between delegation and partition of parallel aspects.

In delegation, one of the parallel aspects is appointed as the ‘basic aspect’. For example, the basic aspects in the example are Owner-basic and Vehicle-general. The ‘basic’ aspect can be thought of as being at a higher level and owning all the others. This means we can connect all the other aspects to the basic aspect by association relationships.

Representing the parallel aspects of a class

Given four kinds of entity model (informal, relational, entity state machine and data storage) and three ways to specify parallel aspects as classes (aggregation, partition and delegation), which way suits which kind?

Drawing parallel aspects in an informal entity model

Aggregation is normal. People rarely draw one box for each parallel aspect in drawing an informal entity model. As far as code specification is concerned, it doesn’t matter which way you choose to draw the informal entity model.

Drawing parallel aspects in a relational entity model

Aggregation is normal. Relational theory doesn’t really account for parallel aspects, but one might say it assumes aggregation of parallel aspects into one relation.

Drawing parallel aspects in an entity state machine model

Partition seems the natural thing, since each box is supported by a distinct state machine. It enables the entity state machine model to be used as a map or graphical menu for navigating to the state machines.

Strictly, delegation is the right approach. One of the parallel aspects must be appointed as the ‘basic’ aspect, which can be thought of as being at a higher level and owning all the others. The state machine of this basic aspect will normally be responsible for all different varieties of birth and death events, but may pass these on in the form of a ‘superevent’ to the other parallel aspects.

After aggregating the class hierarchy and delegating the parallel aspects in our example, the result is an entity state machine model that matches the event models. The boxes in it are state machines: they are the classes for which state machines are constructed and that appear as distinct components in event models.
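A minimal sketch of this delegation in Python follows. Vehicle-general and Scrap Vehicle come from the example; the other event names, the Vehicle-ownership aspect and the class names are assumptions made for illustration.

```python
class Aspect:
    """A parallel aspect that is born and dies with its basic aspect."""

    def __init__(self, name):
        self.name = name
        self.alive = False

    def on_superevent(self, kind):
        self.alive = (kind == "birth")


class BasicAspect(Aspect):
    """The aspect appointed to own the others."""

    BIRTH_EVENTS = {"Register Vehicle", "Import Vehicle"}
    DEATH_EVENTS = {"Scrap Vehicle", "Export Vehicle"}

    def __init__(self, name, details):
        super().__init__(name)
        self.details = list(details)

    def handle(self, event):
        if event in self.BIRTH_EVENTS:
            kind = "birth"
        elif event in self.DEATH_EVENTS:
            kind = "death"
        else:
            raise ValueError(f"{event!r} is not a birth or death event")
        # Precondition: no birth while alive, no death while not alive.
        if self.alive == (kind == "birth"):
            raise ValueError(f"{event!r} not allowed in the current state")
        # Every variety of birth or death reaches the details as one superevent.
        self.on_superevent(kind)
        for detail in self.details:
            detail.on_superevent(kind)


ownership = Aspect("Vehicle-ownership")
general = BasicAspect("Vehicle-general", [ownership])
general.handle("Import Vehicle")
```

The detail aspect never needs to know which variety of birth or death occurred; it sees only the superevent.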

Drawing parallel aspects in a data storage structure

Partition is simplest. This sits happily with the view taken in behaviour analysis and facilitates the distribution of parallel aspects to different physical locations.

Aggregation of parallel aspects into one relation should reduce access times, but requires a little extra work in the data abstraction layer (components that retrieve logical application objects from the Data services layer, and restore them).
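The extra work in the data abstraction layer amounts to splitting one stored row into its logical aspects and merging them again. A minimal sketch, assuming rows and aspects are plain dictionaries; the column names and the Vehicle-ownership aspect are hypothetical.

```python
# Which columns of the single stored row belong to which parallel aspect.
ASPECT_COLUMNS = {
    "Vehicle-general": ["reg_num", "model"],
    "Vehicle-ownership": ["owner", "ownership_state"],
}


def load_aspects(row):
    """Retrieve: split one physical row into one object per parallel aspect."""
    return {
        aspect: {column: row[column] for column in columns}
        for aspect, columns in ASPECT_COLUMNS.items()
    }


def store_aspects(aspects):
    """Restore: merge the parallel aspects back into one physical row."""
    row = {}
    for values in aspects.values():
        row.update(values)
    return row


stored_row = {"reg_num": "A1", "model": "Mini",
              "owner": "Ann", "ownership_state": "owned"}
logical = load_aspects(stored_row)
logical["Vehicle-ownership"]["owner"] = "Bob"
updated_row = store_aspects(logical)
```

One read of the stored row serves both aspects, at the cost of this mapping step on every retrieval and update.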

The need for a richer analysis methodology

What is an entity or a class? What does a box in an entity model represent? Different answers lead to different entity models (ignoring differences between notations). The picture is further complicated by different ways (aggregation, partition and delegation) to specify the ‘subclasses’ and ‘parallel aspects’ of classes.

Given that there are many ways to draw an entity model, a methodology should help us to decide which way suits which purpose. Which way of drawing an entity model best suits database design? Which way suits object-oriented modelling? We need a methodology that disentangles the various questions involved and provides some answers.

The methodology implied by clashing entity models can be organised into a handful of major activities, where a fair amount of parallel activity is possible. Fig. 5j gives an idea of what we mean.

Fig. 5j

There is a progression from informal to formal. Only two models appear in the implemented code: the entity state machine model and the data storage structure. In simple systems these will be the same. In very simple systems they will both be the same as the relational entity model. See ‘The OPEN book’ for further discussion of the methodology in Fig. 5j.

Coordinating software architecture

In non-trivial enterprise applications, rather than select one or other logical view as the basis of encapsulation, we want to have it all ways. Current object-oriented ideas are inadequate; we need a richer theory of system specification. We need a software architecture that results in separate components handling each logical view, and separate components addressing the physical concern of designing efficient database tables. Our 3-tier software architecture is designed for this purpose.

You can specify the data attributes of any entity model box using an underlying data dictionary. You can specify one-to-one correspondence between models by names. A CASE tool can help as discussed below. The earlier sections of this chapter show how you can make things easier by taking design decisions that align different versions of the entity model.

Technology implications

Using object-oriented ideas to specify the Business services layer does not mean you have to use object-oriented software tools to implement the application. We do need some kind of:

· GUI management software to implement the UI layer

· Common programming language to implement the business services layer

· Database management system to implement the data services layer

We need a database management system that supports the notion of a commit unit, handles the back-out of updates to objects within a commit unit, and helps with locking and logging of transactions. It must have an efficient way of reading and writing the instance variables not only of a single object but also of aggregates of related objects (say, all the details of a master).
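As one illustration of a commit unit with back-out, the sketch below uses Python's standard sqlite3 module, whose connection object commits or rolls back when used as a context manager. The tables, the vehicle_count rule and the transfer event are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owner (name TEXT PRIMARY KEY, "
             "vehicle_count INTEGER NOT NULL CHECK (vehicle_count >= 0))")
conn.execute("CREATE TABLE vehicle (reg_num TEXT PRIMARY KEY, "
             "owner TEXT NOT NULL)")
conn.execute("INSERT INTO owner VALUES ('Ann', 1), ('Bob', 0)")
conn.execute("INSERT INTO vehicle VALUES ('A1', 'Ann')")
conn.commit()


def transfer_vehicle(conn, reg_num, old_owner, new_owner):
    """One event is one commit unit: all three updates succeed or none does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE vehicle SET owner = ? WHERE reg_num = ?",
                     (new_owner, reg_num))
        conn.execute("UPDATE owner SET vehicle_count = vehicle_count + 1 "
                     "WHERE name = ?", (new_owner,))
        conn.execute("UPDATE owner SET vehicle_count = vehicle_count - 1 "
                     "WHERE name = ?", (old_owner,))


def vehicle_count(conn, name):
    query = "SELECT vehicle_count FROM owner WHERE name = ?"
    return conn.execute(query, (name,)).fetchone()[0]


transfer_vehicle(conn, "A1", "Ann", "Bob")      # succeeds and commits
try:
    transfer_vehicle(conn, "A1", "Ann", "Bob")  # Ann has no vehicle to give
except sqlite3.IntegrityError:
    pass  # the two earlier updates in this commit unit were backed out
```

The second transfer fails on its last update, yet the first two updates of that commit unit leave no trace in the database.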

CASE tool implications

An upper CASE tool could allow us to draw four models:

Kind of data structure	Each box connected to a
informal entity model	-
relational entity model	data group
entity state machine model	state machine and data group
data storage structure	physical database table.

Copy and paste functions will enable us to copy all or part of one model into another. Ideally we’d like to draw only one ‘master’ relational model, then draw parallel versions only of the parts that differ in the ‘subordinate’ entity state machine model (where parallel aspects are separated) or data storage structure (say, where subclasses might be divided). Working out a pleasing way to cross-refer between the master and subordinate diagrams would be a challenge.

Copy and paste

The tool should allow you to draw multiple versions of a data structure. It should help you to duplicate whole entity models and copy and paste partial entity models between them.

Cross-reference by name

The tool should connect any entity model box to related documentation items by recognising the names (or possibly synonyms) of corresponding classes. The name of a box in an entity state machine model must match the name of a state machine. The name of a box in the data storage structure must match the name of a database table in the lower CASE tool. The name of an attribute must match the name of a domain in the dictionary.
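The name-matching check is simple to sketch. The function below is a hypothetical illustration of what such a tool might do, not a description of any existing CASE tool; the box and state machine names are examples only.

```python
def unmatched_names(model_boxes, documentation_items, synonyms=None):
    """Return the boxes that have no documentation item of the same name.

    synonyms maps a box name to the name used in the documentation.
    """
    synonyms = synonyms or {}
    known = set(documentation_items)
    return sorted(
        box for box in model_boxes if synonyms.get(box, box) not in known
    )


boxes = ["Vehicle-general", "Owner-basic", "Model"]
state_machines = ["Vehicle-general", "Owner-basic", "Vehicle Model"]

missing = unmatched_names(boxes, state_machines)
missing_with_synonyms = unmatched_names(
    boxes, state_machines, synonyms={"Model": "Vehicle Model"}
)
```

The same check could run between data storage structure boxes and table names, or between attribute names and dictionary domains.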

Domain dictionary

The tool should help you to specify the data group behind any box in any entity model as a list of domains drawn from an underlying domain dictionary. This central repository should be independent of the internal data storage structure or the external user interface. Most current implementation tools tie variable definition too closely to one or the other.

Cross-layer class specification

A single class may have properties in each layer, a data storage format, a presentation or display format, and some application rules. Suppose the user asks for a class from one system to be added to another. We’d like to reuse the class without defining new data storage and presentation formats.

We need a way of defining an entity class and attaching to it the baggage of its data storage and presentation views, so that these can travel with the entity class. We can envisage this working at the atomic or data item level of system specification. How to handle larger classes is an open issue.

Footnotes

On domains in enterprise applications

James Odell (1994) says: ‘Existing entity modelling techniques have deficiencies for object-oriented analysis. To assist the object-oriented designer, the object-oriented analyst must specify all object types and associations clearly.’ By ‘all’, he means to nag the analysts into adding the basic data types and domains as classes into their entity models. But they don’t want to do this!

Analysts don’t usually have to worry much about domains. True, specifying the individual variables in a system specification is important, often difficult and always time-consuming. But specifying the domain of each variable is not that difficult. It can be postponed until near the end of design.

Every database builder knows they must define the domain of each variable. Most are happy to leave it until the analyst has completed something like a relational entity model. You need to get an overview of the system structure before defining its details.

When it comes to specifying the domain classes in an enterprise application, a simple two-level structure, generic and application-specific, is usually enough.

Generic domain classes

At the bottom level of specification you may use a few generic domain classes, such as text, number and date formats. These are only a minor concern; you probably need only the half a dozen or so data types provided by your chosen implementation technology.

Application-specific domains

At the next higher level above generic domain classes, there may be tens or hundreds of application-specific domains, such as ‘Phone-Num’ and ‘Country’. There are two ways to specify these system specific domains, at the bottom and at the top of the specification.

You may specify the domain in a dictionary, to be reused in defining the attributes of classes. You might define a dictionary entry Phone-Num as being of the generic domain class Number and always beginning with 0, then specify the classes Supplier and Customer as having attributes called Supplier-Telephone and Customer-Telephone, both with the properties of the application-specific domain called Phone-Num.
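The dictionary approach might look like this in Python. It is a sketch only: values are assumed to arrive as strings, so that the leading 0 is not lost, and the function and table names are hypothetical.

```python
def is_number(value):
    """Generic domain class Number: digits only."""
    return value.isdigit()


def is_phone_num(value):
    """Application-specific domain Phone-Num: a Number beginning with 0."""
    return is_number(value) and value.startswith("0")


DOMAIN_DICTIONARY = {"Number": is_number, "Phone-Num": is_phone_num}

# Each attribute names its domain instead of restating the rules.
ATTRIBUTE_DOMAINS = {
    "Supplier-Telephone": "Phone-Num",
    "Customer-Telephone": "Phone-Num",
}


def is_valid(attribute, value):
    domain = ATTRIBUTE_DOMAINS[attribute]
    return DOMAIN_DICTIONARY[domain](value)
```

A change to the Phone-Num rule is then made once in the dictionary and applies to both attributes.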

Note that naming conventions must be agreed. If you name two or more attributes directly after their domain, then wherever the context is ambiguous, say in an input message, you will need to declare the context somehow, say Supplier/Phone-Num or perhaps Phone-Num (of Customer). For this reason, people tend to compose an attribute name by combining the class name and domain name.

Or else you can turn the specification on its head by declaring the domain as a high-level master class (say Country) in the entity model, and then specify different classes owning this attribute (say Customer and Supplier) as detail instances of the master class. Thus, you can invert any attribute variable, or rather its domain, to become a class whose key is one valid value of the domain.

An age-old question of database design is: should we do this? Should we show the application-specific domains as classes in an entity model? We addressed this question in ‘Introduction to rules and patterns’.

On classes as state machines

Jackson (1975) showed how to decompose a system into components by resolving ‘structure clashes’. The resulting system is a set of co-operating state machines, where each process has an input data structure, a state vector and a state variable. Let us bring Jackson up to date with object-orientation by adding object-oriented terms in square brackets to Jackson’s original remarks.

The input data structure of a class

Jackson considered each object as consuming the stream of events that update it. In object-oriented terms, this stream of events is a stream of method invocations. Each event invokes a method of the object. We name each method after the event that invokes it, or after a superevent where more than one event can trigger the method.

The state vector of a class

Jackson grouped the private variables of a class in its state-vector. He said: ‘the contents of the state-vector are private to the [object]: they are truly “own variables” in the sense that no other [object] should be able to inspect or change them, or to take cognizance of their formats and values’. In object-oriented terms, this is encapsulation.

To resolve an ‘interleaving clash’, you have to separate the ‘multi-threaded’ process from its state vector. You keep one copy of the process (one for the class), but many copies of the state-vector (one for each object). Thus, Jackson promoted the idea that a database is nothing more or less than a place to hold the state-vectors of concurrent objects. In object-oriented terms, this is resolving a concurrency problem.

The state variable of a class

Jackson said: ‘the [object] has only one linear text and one location counter; the current place in the text must correspond to the current place in the data structure [stream of method invocations] being processed’.

In object-oriented terms, the importance of the object’s location counter or state variable is that you can use it in evaluating rules. If an event finds an object in the wrong state, then that event must fail. Following our application modelling techniques, rules are coded in the form of event preconditions.
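The three ideas above (one process text per class, one state-vector per object, and a state variable tested by event preconditions) can be sketched in Python as follows. The Order class, its ‘placed’ and ‘shipped’ states and the amend and ship events are hypothetical examples, not taken from Jackson.

```python
from dataclasses import dataclass


class EventRejected(Exception):
    """Raised when an event finds an object in the wrong state."""


@dataclass
class OrderStateVector:
    """One copy per object; this is what the database holds."""
    order_id: int
    quantity: int = 0
    state: str = "placed"  # the state variable


class Order:
    """One copy of the process text, shared by every Order object."""

    @staticmethod
    def amend(sv: OrderStateVector, quantity: int) -> None:
        if sv.state != "placed":  # event precondition
            raise EventRejected(f"cannot amend order in state {sv.state!r}")
        sv.quantity = quantity

    @staticmethod
    def ship(sv: OrderStateVector) -> None:
        if sv.state != "placed":  # event precondition
            raise EventRejected(f"cannot ship order in state {sv.state!r}")
        sv.state = "shipped"


# The 'database': nothing more than a place to hold state-vectors.
database = {1: OrderStateVector(1, quantity=5), 2: OrderStateVector(2, quantity=3)}
Order.ship(database[1])
```

After this, an amend event aimed at order 1 raises EventRejected, while the same event aimed at order 2 succeeds.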

Object-based versus object-oriented

Jackson’s view of object classes is sometimes called ‘object-based’ rather than object-oriented, meaning that it does not incorporate the idea of class hierarchies or inheritance. Berrisford and Burrows (1994) showed this to be untrue.

Class hierarchies and aggregates

Fig. 5k shows School as an aggregate in which the basic aspect of the class has parallel aspects that are mutually exclusive. The relationships from the basic aspect to the parallel aspects are crossed by an exclusion arc. The entity model then looks like a class hierarchy, but the lines between boxes are ‘association’ relationships rather than ‘is a’ relationships.

Fig. 5k

Head Teacher and Principal Governor are disjoint ‘aspects’ rather than disjoint ‘subclasses’. Is this merely sophistry? Does it make a difference to the state machines?

Berrisford and Burrows (1994) showed that to express the disjointness of subclasses in state machines, you have to draw the state machine of each subclass as an option within a state machine of the superclass. So the life of a subclass may be completely rolled up into the life of its superclass.

At first sight (we haven’t explored enough examples yet), it seems like the same principle applies to an aggregate of disjoint aspects as to a class hierarchy of disjoint subclasses. If so, then we would say the distinction is sophistry.

Glossary

Aggregation: grouping the properties of subclasses or aspects into one high-level superclass or aggregate class.

Aspect: an independent role of a class, a group of attributes whose updating is constrained by one state variable, a class whose behaviour is representable as a single finite-state machine.

Class: a set of object instances that share the same properties.

Class hierarchy: a structure dividing a superclass into subclasses.

Delegation *: dividing the properties of a class between a high-level class and low-level classes connected to it.

Event: an atomic transaction, a minimum unit of consistent change, transient but leaving a mark on persistent objects.

Event model: a specification that shows how one or more objects are affected by a single event, and the constraints that must be tested.

Object: something that persists and must be remembered.

Partition *: dividing the properties of one class between smaller roles or aspects.

Pseudo-inheritance: copying the properties of a superclass into its subclasses.

* The difference between Partition of parallel aspects and Delegation of parallel aspects is not obvious. Both mean separating a class into parallel aspects. But Delegation implies one of the parallel aspects is appointed as the ‘basic aspect’ that is at a higher level and owns all the others. You can think of the ‘basic aspect’ as the creator and destroyer of the object identity, simultaneously creating and destroying all related aspects.

Design issues

This chapter discusses design issues and tradeoffs. It shows how separating the application and Data services layers of a 3-tier architecture can help you to hide data replication and aggregation from the Business services layer of code, and minimise data migration difficulties.

Data replication and derivation

Redundant data makes one object dependent on another, so if you update one object, you are obliged to update another at the same time. But redundant data is not necessarily a bad thing. You have to consider a design tradeoff, and the kind of redundancy that is involved.

Tradeoff: enquiry process v. update process

The speed of a process is largely determined by the number of discrete objects it accesses. Reducing the objects accessed on update may increase the objects accessed on enquiry, and vice-versa, so you cannot optimise both updates and enquiries.

Similarly, the simplicity of a process is largely determined by the number of classes it accesses. Reducing the classes accessed on update may increase the classes accessed on enquiry, and vice-versa, so you cannot simplify both updates and enquiries.

An aim of relational data analysis is to simplify programming, to prevent programmers from writing unnecessary code. It achieves this by reducing data replication. E.g. you would normalise the Sale class on the left below, specify Stock as a separate class and assign Stock Description as an attribute of Stock rather than Sale.

Fig. 6a

Thus, you prevent programmers from having to locate and update all the Sales of a Stock, in order to update a Stock Description. This has the added benefit of reducing the danger of inconsistent Stock Descriptions being stored, through the update process not being completed properly (whether this is a failure of the programmer or the technology).

People sometimes teach relational data analysis as though its aim is to remove all redundant data. First, this is a means not an end. Second, there are two kinds of redundant data - replicated data and derived data - and they have different implications.

Replicated data

Replicated data occurs where one piece of information is repeated. This is not necessarily a bad thing. You may choose to replicate data to speed up or simplify enquiry processes. Typically, you might repeat an attribute of a master object in every one of its detail objects.

E.g. you might store Stock Description as an attribute of Sale, replicated in all Sales of a Stock. This will speed up any enquiry on a Sale that would otherwise have to access the Stock object for the Stock description. And if the Stock object is stored in a different database from the Sale, it will increase the cohesiveness and robustness of local processing.

If you do replicate data, it is wise to maintain the original data as well as its copies. So you should maintain Stock as well as Sale. We’ll come back to this under ‘Distribution’.

Derived data

Derived data occurs where several pieces of information are summarised in one place as the result of a calculation or procedure. The usual example is a total stored in a master object of detail objects. E.g. you might store a summary total of Sales in a Stock object, to save adding up this total on each enquiry.
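A minimal sketch of the stored-total example, assuming hypothetical Stock and Sale classes and an integer amount per Sale. It shows where the extra work moves to: the update process maintains the total so that the enquiry process need not visit any Sale.

```python
from dataclasses import dataclass, field


@dataclass
class Sale:
    stock_id: str
    amount: int


@dataclass
class Stock:
    stock_id: str
    sales: list = field(default_factory=list)
    total_sales: int = 0  # derived data, maintained on every update


def record_sale(stock: Stock, amount: int) -> Sale:
    """Update process: does extra work to keep the stored total in step."""
    sale = Sale(stock.stock_id, amount)
    stock.sales.append(sale)
    stock.total_sales += amount
    return sale


def enquire_total(stock: Stock) -> int:
    """Enquiry process: reads one object, without visiting any Sale."""
    return stock.total_sales


widget = Stock("W1")
record_sale(widget, 30)
record_sale(widget, 12)
```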

Another kind of derived data is a derivable sorting class. E.g. ‘Customer Interest in Stock’ is a derivable sorting class that clusters all the Sales for a given combination of Customer and Stock.

Fig. 6b

Removing derived data can frustrate the aim to simplify programming. Suppose you omit the sorting class from the data structure. Programmers will have to sort Sales by Customer within Stock, or Stock within Customer, every time they want to display them in a structured list. In effect, they manufacture a ‘soft’ instance of the sorting class every time they need one.
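To make the ‘soft’ sorting class concrete, here is a sketch of the code a programmer ends up writing when ‘Customer Interest in Stock’ is missing from the data structure. The Sale fields and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Sale(NamedTuple):
    customer: str
    stock: str
    amount: int


def customer_interest_in_stock(sales):
    """Build 'soft' instances of the sorting class from a flat list of Sales."""
    interest = defaultdict(list)
    for sale in sales:
        interest[(sale.customer, sale.stock)].append(sale)
    return dict(interest)


sales = [
    Sale("Ann", "Bolt", 5),
    Sale("Bob", "Bolt", 2),
    Sale("Ann", "Bolt", 1),
    Sale("Ann", "Nut", 9),
]
interest = customer_interest_in_stock(sales)
```

Every program that needs the structured list has to repeat this grouping, or call a shared copy of it.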

In general, analysts can make extra work for programmers. Missing derivable classes and relationships from the data structure can make programming unnecessarily complex. Programmers end up defining the missing classes and relationships in program code. And they may have to do this lots of times, in many different programs.

Given the tradeoff between defining a ‘hard class’ in the structure of the persistent data, and defining a ‘soft class’ in one or more transient processes that operate on the data structure, the balance lies in favour of the former. As a rule:

Specify classes and relationships in the data structure, rather than leave them to be constructed by programs.

Benefits: simpler enquiry programming and easier program maintenance. True, whenever a class is amended you will have to amend all the programs which refer to that class, but this is the case whether the class is hard or soft. And there will simply be less program code to maintain if the class is a hard one.

Costs: some extra update processing, extra data structure maintenance and data migration costs. If you map every entity class onto a database table, then you will have to ‘migrate’ persistent data from one structure to another whenever a class is amended.

Benefits without costs?

How to get the benefit of an application-specific entity model that makes application programming easy, while at the same time using a data storage structure that speeds up enquiry processes, increases the robustness of distributed operations, and facilitates maintenance without data migration? The 3-tier architecture opens up the interesting possibility that you might define different data structures for:

• Business services layer - entity state machine model designed to simplify processing

• Data services layer - data storage structure designed for performance and flexibility.

Most database designers reproduce the ‘logical’ entity state machine model as closely as possible in the data storage structure. This is how most systems are built. But you might take a very different approach in designing a large enterprise application. You can write application programs to operate on the entity state machine model, while storing instance data in a differently-structured data storage structure.

Replicated data belongs in the Data services layer

We propose that replicated data belongs in the Data services layer, not in the Business services layer.

The idea is that you can specify and code the Business services layer as though no data is replicated, hiding all replication in the Data services layer.

E.g. the application program that updates a Stock Description will assume it is stored only in a Stock object; it will call the data abstraction layer; this will find all the places where the Stock description has been replicated and update all of them. So the application program is entirely unaware of how far data has been replicated. The data abstraction layer handles the extra complexity. You may even be able to buy a distributed database management system that does the job of the data abstraction layer for you.
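The Stock Description example might be sketched as below. DataAbstractionLayer, its method names and the in-memory dictionaries standing in for the database are all assumptions; the point is only that rename_stock, the Business services code, contains no trace of the replication.

```python
class DataAbstractionLayer:
    """Hides the fact that Stock Description is replicated into every Sale."""

    def __init__(self):
        self._stock = {}  # stock_id -> description (the original)
        self._sales = []  # each Sale row carries its own copy

    def add_stock(self, stock_id, description):
        self._stock[stock_id] = description

    def add_sale(self, stock_id, quantity):
        self._sales.append({
            "stock_id": stock_id,
            "quantity": quantity,
            "stock_description": self._stock[stock_id],  # replicated
        })

    def set_stock_description(self, stock_id, description):
        """Update the original, then every replicated copy."""
        self._stock[stock_id] = description
        for sale in self._sales:
            if sale["stock_id"] == stock_id:
                sale["stock_description"] = description

    def sale_descriptions(self, stock_id):
        return [s["stock_description"] for s in self._sales
                if s["stock_id"] == stock_id]


def rename_stock(dal, stock_id, description):
    """Business services code: written as though nothing is replicated."""
    dal.set_stock_description(stock_id, description)


dal = DataAbstractionLayer()
dal.add_stock("W1", "Widget")
dal.add_sale("W1", 3)
dal.add_sale("W1", 7)
rename_stock(dal, "W1", "Large widget")
```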

Derived data belongs in the Business services layer

Perverse though it may seem, we propose that derived data belongs in the Business services layer, not in the Data services layer.

It is nonsensical to hide derived data in the Data services layer only. It would be foolish to code an enquiry in the Business services layer to report the total Sales of a Stock by adding up the total, if the total has already been calculated and stored in the database. Likewise, it would be foolish to code complex enquiry processes in the Business services layer as though a sorting class does not exist, if it does exist in the Data services layer.

Doing it the other way around is far more reasonable. You can specify and code simple enquiry processes in the Business services layer as though a derivable total or sorting object has been stored. You may then choose to store the derived object in the Data services layer, or else the data abstraction layer can derive it and present it to the Business services layer whenever it is required.
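One way to picture this is a Business services enquiry that asks for a total and neither knows nor cares whether the Data services layer stored it or derived it on request. The two classes below are hypothetical alternatives for the Data services side; both offer the same total_sales call.

```python
class StoredTotals:
    """Data services option 1: the total is stored and kept up to date."""

    def __init__(self):
        self._totals = {}

    def add_sale(self, stock_id, amount):
        self._totals[stock_id] = self._totals.get(stock_id, 0) + amount

    def total_sales(self, stock_id):
        return self._totals.get(stock_id, 0)


class DerivedTotals:
    """Data services option 2: the total is derived on each request."""

    def __init__(self):
        self._sales = []

    def add_sale(self, stock_id, amount):
        self._sales.append((stock_id, amount))

    def total_sales(self, stock_id):
        return sum(amount for sid, amount in self._sales if sid == stock_id)


def sales_report(data_layer, stock_id):
    """Business services enquiry: assumes the total simply exists."""
    return f"{stock_id}: {data_layer.total_sales(stock_id)}"


reports = []
for layer in (StoredTotals(), DerivedTotals()):
    layer.add_sale("W1", 30)
    layer.add_sale("W1", 12)
    reports.append(sales_report(layer, "W1"))
```

Both entries in reports are identical, so the storage decision can be changed later without touching sales_report.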

Conclusions: Don’t store redundant data until you have established a clear business case in terms of speeding up enquiries or increasing the robustness of local operation. Specify ‘replicated data’ in the Data services layer of code. Specify ‘derived data’ in the Business services layer of code. Yes, this does mean the conceptual model of the Business services layer is influenced by physical design considerations, but the alternative is ludicrous.

Data distribution

A single central database is the simplest option from the design point of view. The motivation for distributing subsets of a database around the nodes of a network is to enhance the performance or robustness of local processing at a node. This may involve replicating data at different locations.

If you define different data structures for the Business services layer and Data services layer, then you can hide all data distribution decisions and complications in the Data services layer. You may then annotate the data storage structure with distribution details, leaving the entity state machine model untouched.

When it comes to distributing objects, the classes in the data storage structure might be divided into three kinds.

Objects that sit naturally at a location

Locations where a business wants to store data often appear in the model as classes (department, warehouse, local office, or whatever). The natural scheme is to store an object of such a class at its real-world business location. Some details fall naturally under these locations.

Fig. 6c

Not every object is naturally related to only one location. A Customer may be the recipient of Sales from several Warehouses. You might begin by assuming that all multi-location objects are stored at a central server location.

Detail objects that link objects in different locations

You might choose to store a Sale at the location of either Customer or Stock. The notation in Fig. 6d suggests a Sale is stored with its Stock.

Fig. 6d

Or you might choose to store Sales in a distinct storage location, separate from both Customer and Stock. Either way, distributed locations are connected along a one-to-many relationship. Managing a one-to-many association between distributed objects can be difficult. So you might instead choose to replicate a Sale in both locations, and connect the two Sales together.

Fig. 6e

This has the advantage of connecting locations along a one-to-one relationship, which is simpler to manage. Of course, if there are further detail classes connected to a Sale, you now have to decide where they are stored, and perhaps duplicate them as well.

Master objects that are used in several business locations

Some master objects (like ‘Currency Conversion Rate’ or ‘Customer’ in our example) can appear at several business locations. You might choose to store these objects only once, in a central server or head office storage location. The problem is that local processing may be too slow, or if the network goes down, people cannot carry on working on their local office database. A way around this is to unnormalise data and copy the master object into several locations, so users have all the information they need close at hand.

You might repeat the Customer name in every one of their Sales records. Or more openly, you might copy each Customer object in all business locations.

Fig. 6f

You should not eliminate the original master object. One object (in a master location or a distinct server location) has to keep track of all the places where the object has been copied, for the purpose of broadcasting updates.
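A minimal sketch of a master object that keeps track of its copies, assuming a hypothetical CustomerMaster class and location names. Here the ‘broadcast’ is a loop over in-memory dictionaries; in a real distributed system it would be a message to each location.

```python
class CustomerMaster:
    """Original Customer object; remembers every location holding a copy."""

    def __init__(self, name):
        self.name = name
        self._copies = {}  # location name -> local copy (a plain dict here)

    def copy_to(self, location):
        self._copies[location] = {"name": self.name}
        return self._copies[location]

    def rename(self, new_name):
        """Update the original, then broadcast to every registered copy."""
        self.name = new_name
        for local_copy in self._copies.values():
            local_copy["name"] = new_name


master = CustomerMaster("Smith & Co")
leeds = master.copy_to("Leeds warehouse")
york = master.copy_to("York warehouse")
master.rename("Smith Ltd")
```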

So in short, distribution means you may have to:

• select one business location for objects that naturally relate to more than one business location

• define distinct storage locations other than natural business locations

• divide one class into two parallel aspects connected by a one-to-one relationship

• divide one object into one master object owning many copies.

Tradeoff: robustness v. inconsistency

Where a single database is partitioned and stored at several locations, the issue of robustness arises. If the network fails, you want to carry on working at one database location while not connected to the others.

To increase robustness, you will tend to replicate data at different locations. But this means that there is the danger of data in different locations getting out of step, whether due to sloppy design or failure of the network technology. What if somebody updates, or worse deletes, a Customer object on one of the databases while the network is down? The various databases will get out of step.

Getting the databases back in step can take a great deal of effort. It is not just a question of running automatic update programs. While the network is down, you might accept Orders at a Warehouse for a Customer that has been deleted or black-listed at head office.

When you find out later that the Customer has been black-listed: Should you now reject these Orders? Or should you find some other Customer to take them? These are questions that the business analyst must address rather than the database designer.

Data migration

Data replication and data distribution are two good reasons to design a data structure for the Data services layer that is different from the data structure of the Business services layer. Data migration may be another reason.

Programs are transient. Data is persistent. So changing a data storage structure involves an extra step, called ‘data migration’, that changing a program does not. You have to reorganise already-stored data, shifting it from one version of the data storage structure to the next.

The more you specify application-specific classes and relationships in the data storage structure, the greater the data migration cost whenever these classes or relationships change.

This is not necessarily a bad thing. Remember the rule: specify classes and relationships in the data structure rather than leave them to be constructed by programs. What the database designer misses out, the programmers will have to put in, tenfold. And if data migration is needed because you are correcting a poor data storage structure, inserting classes or relationships you overlooked, then you have only yourself to blame.

Conclusion: expect data migration and include it in your plans.

Nevertheless, there are some very large databases where data migration is just too expensive. Is there an alternative design-for-maintenance strategy that will reduce or eliminate data migration?

Avoiding the cost of data migration

Can you have it both ways? Can you have both the specificity of the entity state machine model, and the flexibility of a data storage structure that does not require amendment when the entity state machine model is altered?

Again, yes you can. You can write application programs for the classes in the entity state machine model, and store instance data in a different and more generic structure in the data storage structure.

Fig. 6g shows an extreme example. The structure on the right is generalised so far that no conceivable application amendment would require it to change.

Fig. 6g

How does this work? You code the entity classes and relationships in the Business services layer. You code the data storage structure in the Data services layer. You design a data abstraction layer to translate between the entity classes and relationships and the data storage classes and relationships.

When your application program wants the instance data of a specific Customer, it does not read the data storage structure but calls the data abstraction layer. How the data abstraction layer assembles the Customer object from the data in the data storage structure is a matter only for the data abstraction layer.
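As a sketch of the idea, suppose the generic storage structure is a single table of (class, key, attribute, value) rows. That layout is an assumption made for illustration, as are the GenericStore and CustomerAccess names and the Customer attributes.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Entity class as seen by the Business services layer."""
    customer_id: str
    name: str
    credit_limit: int


class GenericStore:
    """Data services layer: rows of (class, key, attribute, value)."""

    def __init__(self):
        self.rows = []

    def put(self, class_name, key, attribute, value):
        self.rows.append((class_name, key, attribute, value))

    def get(self, class_name, key):
        # Later rows overwrite earlier ones for the same attribute.
        return {a: v for c, k, a, v in self.rows if c == class_name and k == key}


class CustomerAccess:
    """Data abstraction layer: translates between the two structures."""

    def __init__(self, store):
        self._store = store

    def save(self, customer):
        self._store.put("Customer", customer.customer_id, "name", customer.name)
        self._store.put("Customer", customer.customer_id,
                        "credit_limit", customer.credit_limit)

    def load(self, customer_id):
        values = self._store.get("Customer", customer_id)
        return Customer(customer_id, values["name"], values["credit_limit"])


access = CustomerAccess(GenericStore())
access.save(Customer("C1", "Smith Ltd", 5000))
loaded = access.load("C1")
```

Adding an attribute to Customer means changing the Customer class and CustomerAccess, but the shape of the stored rows stays the same.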

When the entity state machine model is altered, you have to amend the application programs, you have to amend the data abstraction layer, but you do not have to restructure the data storage structure or carry out a data migration exercise.

Conclusion: where it is justified (by data migration or performance costs) introduce a data abstraction layer to separate the entity state machine model from the data storage structure (designed for flexibility and performance).

A few more tradeoffs

The art of system design is to find the best balance between conflicting objectives. Many authors have listed general objectives for system design. Some have suggested ways of measuring how far these objectives are achieved. Relatively few have focussed on the tradeoffs between objectives.

The optimum balance between conflicting objectives will differ from system to system. We have been making generalisations about tradeoffs in the kind of system we are most interested in - enterprise applications. Here are some more tradeoffs to finish with.

Efficiency: size v. speed

You might reduce the amount of code in a monolithic program by removing a repeated block of code into a reusable subroutine. But this will tend to slow the program down.

You might provide a faster alternative algorithm for a given process. For example, you might design a faster text printing algorithm that produces only rough or draft quality print. But this will increase the amount of code in the system.

Object-oriented programmers often do provide alternative algorithms for a single process. The substitution of one algorithm by another is recognised by Gamma et al. in the form of a design pattern called ‘Strategy’. The substitution of one step in an algorithm by another is recognised by Gamma et al. in the form of a design pattern called ‘Template Method’.

Yet in the Business services layer of an enterprise application, you virtually never provide alternative algorithms for one process. In fact, it is not worth worrying about processing speed at all. The speed of an enterprise application is completely dominated by the time taken to store and retrieve data. Efficiency lies in the hands of the database designer.

In speeding up data access, a database designer will tend to increase the backing store needed to hold the database. The designer will allow more space for a data group to fit on the page of the database it is placed on, so it doesn’t overflow that page. The designer will allow more space for storing relationships, space for extra pointers and extra indexes.

Conclusion: buy much more database space than you think you will need.

Database accessibility: crude locking v. concurrent usage

While it is running, a database update process has to lock the entities it is working on so that no other process can alter them. A crude locking mechanism will lock the whole database, or a large area of it. The ideal locking mechanism will lock only the objects actually updated by the process.

If there are many concurrent users of the system, a crude locking mechanism can dramatically degrade the system’s performance. To speed up the system, you will need a more sophisticated locking mechanism that works at a lower level of granularity.
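The difference in granularity can be sketched with Python threads standing in for concurrent users. CrudeLocking and ObjectLocking are hypothetical names; a real database management system provides its own locking mechanism rather than application-level locks like these.

```python
import threading


class CrudeLocking:
    """One lock for the whole database: every update queues behind it."""

    def __init__(self, keys):
        self._lock = threading.Lock()  # keys are ignored: one lock for all

    def lock_for(self, key):
        return self._lock


class ObjectLocking:
    """One lock per object: updates to different objects do not collide."""

    def __init__(self, keys):
        self._locks = {key: threading.Lock() for key in keys}

    def lock_for(self, key):
        return self._locks[key]


def update(balances, locking, key, amount):
    with locking.lock_for(key):
        balances[key] += amount


balances = {"C1": 0, "C2": 0}
locking = ObjectLocking(balances)
threads = [threading.Thread(target=update, args=(balances, locking, key, 1))
           for key in ("C1", "C2") * 50]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Swapping ObjectLocking for CrudeLocking gives the same final balances, but every update then waits on the same lock.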

Conclusion: refine the locking mechanism in proportion to the number of concurrent users.

Database enquiry speed: aggregation v. flexibility

To speed up a specific enquiry or display you may store all the data you want for that enquiry in one large object. The price you pay is inflexibility and disoptimisation from another enquiry perspective.

For example, if you store all of a Customer’s Orders within the Customer object, then you can easily and swiftly assemble the list of Customer’s Orders for display.

You might call this an aggregate entity state record, or an unnormalised object. Calling it a ‘real-world’ object is nonsense. An aggregate entity state record is no more a real-world object than a third normal form relation is a real-world object; it’s just data storage that’s optimised from one perspective, usually for output display.

Such optimisation makes the system less flexible, less suited to processing from another perspective. For example, you cannot so easily list all the Orders placed for a specific Stock Type.
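The two perspectives can be sketched with nested Python dictionaries standing in for aggregate Customer records. The customer names, order numbers and stock types are made-up sample data.

```python
customers = {
    "C1": {"name": "Smith Ltd", "orders": [
        {"order_id": 1, "stock_type": "Bolt"},
        {"order_id": 2, "stock_type": "Nut"},
    ]},
    "C2": {"name": "Jones plc", "orders": [
        {"order_id": 3, "stock_type": "Bolt"},
    ]},
}


def orders_of_customer(customer_id):
    """The enquiry the aggregate was built for: one object accessed."""
    return [o["order_id"] for o in customers[customer_id]["orders"]]


def orders_for_stock_type(stock_type):
    """The other perspective: every Customer object must be visited."""
    return [o["order_id"]
            for customer in customers.values()
            for o in customer["orders"]
            if o["stock_type"] == stock_type]
```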

(By the way, some of the things people say about how much better an object-oriented database is than a relational database are the same things network database designers have been doing for twenty years to optimise performance. To speed up access - store pointers to the detail objects along with the master object. To save space - roll up detail objects into one or other master object, making an aggregate entity state record. These are matters for the Data services layer, nothing to do with defining the Business services layer.)

Conclusion: don’t unnormalise stored data into an aggregate entity state record until you have established a clear business case in terms of enquiry speed, and define aggregate tables in the data storage structure rather than in the entity state machine model.

Cost of usage v. cost of design

Making the users’ work at the user interface easier takes more design effort. Conclusion: spend money on usability in proportion to the number of end-users who will benefit from your design efforts.

Breadth v. focus

Users want a system that does the job, no more, and operates efficiently. If you give users more than they ask for, you may end up obscuring the main functions behind features people never use, making the system harder to use, and slowing it down.

(Perhaps you discovered this from a user’s perspective when you last upgraded your word processor to the latest version.)

Worse, features that are never used tend to fall into a state of disrepair and decay. Since nobody cares about them, you can be pretty sure that they won’t work very well if somebody wants to use them in the future.

Conclusion: don’t implement more features than you are asked to, but don’t let this stop you thinking ahead and designing for maintenance.

Complexity: component size v. component interaction

Designing a large component or module takes a long time. A large component is harder to understand, test and maintain. Most people recommend you decompose a system into small self-contained components. Indeed, this is a mantra of object-orientation.

The trouble with replacing a large component by smaller ones is that they must talk to each other. There is more interaction between components than before. You have to concentrate more on the interfaces between components. Message-passing becomes more of a design issue. You replace one kind of complexity (proportional to component size), by another kind of complexity (proportional to component interactions).

(There is a more obscure difficulty with defining many small object-oriented components or classes. Where not all the effects of one event type appear in one class, you may have to add an extra ‘gatekeeper’ class to sit on the path of an event type, whose only job is to decide whether to let an event instance through to a related object or not.)

Conclusion: when you partition a system into smaller classes, expect to increase the effort you apply to Event Modelling.

Summary

We’ve discussed design issues and tradeoffs. We’ve shown how the 3-tier architecture can be used to minimise data migration costs, and hide data replication and aggregation from the Business services layer of code.

In summarising the conclusions of this chapter, we can list a dozen or so design principles for large systems.

• specify classes and relationships in the data structure rather than leave them to be constructed by programmers

• where it is justified (by data migration or performance costs) introduce a data abstraction layer to separate the entity model from the data storage structure (designed for flexibility and performance)

• don’t store redundant data until you have established a clear business case in terms of speeding up enquiries or increasing the robustness of local operation

• specify ‘replicated data’ in the data services layer of code

• specify ‘derived data’ in the business services layer of code

• expect data migration and include it in your plans

• buy much more database space than you think you will need

• refine the locking mechanism in proportion to the number of concurrent users

• don’t unnormalise stored data into an aggregate table until you have established a clear business case in terms of enquiry speed

• define aggregate tables in the data storage structure rather than the entity model

• spend money on usability in proportion to the number of end-users who will benefit from your design efforts

• don’t implement more features than you are asked to, but don’t let this stop you thinking ahead and designing for maintenance

• when you partition a system into smaller classes, expect to increase the effort you apply to Event Modelling.

From design patterns to analysis patterns

Analysts need what might be called ‘analysis patterns’. These will be similar to design patterns for object-oriented programming in some ways, but different in other ways. This chapter focuses on a pattern that Gamma et al. call State. The footnotes mention also Composition, Decorator, Facade, Adapter, Bridge and Proxy.

What analysts need from patterns

Design patterns have been developed by and for object-oriented programmers. The usual reference is ‘Design Patterns: Elements of Reusable Object-Oriented Software’ by Gamma et al.

Gamma et al. are widely and affectionately known as the Gang of Four. Their work is rightly acclaimed; it is an example to those teaching analysis of how to teach expertise (not just notations) via patterns.

Are design patterns relevant to Analysts and designers?

Analysts need patterns for processing persistent data

Most object-oriented designers work on systems that process transient objects; for example, compilers, graphical interfaces and financial modelling systems. So naturally, design patterns are mainly concerned with transient objects.

The data in a business database is composed of entity state records that represent real-world entities, long-lived entities that the business seeks to monitor and perhaps control. So analysis patterns must apply to persistent entities.

Fig. a repeats from chapter 1 a scale from transient objects to persistent objects. This is very closely related to the scale from type to state. The longer objects persist, the more that apparently fixed types become variable attributes or transient states.

Fig. a

It turns out that the persistence of data has a big influence on patterns for software design, as you shall see. You need a theory for how to manage states as well as types. Traditionally, different modelling theories have been applied to modelling types and states. One of our aims is to combine these theories.

Analysts need patterns that prompt questions

The Gang of Four say ‘Design patterns solve many of the day-to-day problems object-oriented designers face.’ Each design pattern fits to a given problem. You use a design pattern to solve a problem you already know you have.

Analysts need help with analysis, to discover what the problem is. An analysis pattern should help analysts to ask questions and find things out. It should help you to test and uncover problems in an existing specification. The most cost-effective training involves teaching bad patterns as well as good ones.

Analysts need patterns to do with real-world objects

Design patterns help designers to sort out computer-world objects. Analysts need to sort out which things in the real world have to be represented in the system. Analysis patterns must help analysts to investigate the rules and practices of an enterprise in the real world, the one that is to be supported by the enterprise application.

Analysis patterns must be concerned with eternal verities in the way that real people and real businesses behave. At least, those eternal verities that can be captured in a ‘conceptual model’ of business objects and coded in the ‘business services layer’ of a system. Analysis patterns will be used mostly in defining the business services layer rather than the UI layer.

Analysts need patterns that are logical

Design patterns are expressed in physical terms, in terms of implementation mechanisms, and more specifically in terms of object-oriented programming mechanisms.

Analysis patterns should be expressed in logical terms. They must define characteristics of the problem domain rather than the implementation domain. Analysts should be able to use them without knowing what technology will be used to implement their design, be it C++, Java, COBOL or ORACLE.

For example, OO-style class diagrams specify where objects hold references to other objects. Fig. b shows two class diagrams on the left that are implementations of the same logical entity model on the right.

Fig. b

The logical notation above for modelling the cardinality of a relationship between classes is well known. See the chapter ‘Rules and relationships’ in Analysis patterns.

Inheritance and polymorphism in design patterns

Since the Gang of Four say ‘Almost all the [design patterns] use inheritance to some extent’ let us begin by reviewing the idea of inheritance. A class hierarchy or inheritance tree is a structure composed of superclasses and subclasses, wherein a subclass can inherit or override the properties of a superclass above it in the hierarchy.

Object-oriented technologies help you achieve reuse by applying inheritance and polymorphism to a class hierarchy. See the chapter ‘Class hierarchies and aggregates’ for more about inheritance and polymorphism.

The general shape of a design pattern

Many of the Gang of Four’s design patterns are rather similar, based on a common template involving an abstract class, shown in Fig. c.

Fig. c

The idea of patterns like this is to separate the interface of an object or a process from various possible implementations of it. Thus, design patterns of this shape capture expert knowledge about good uses for polymorphism and abstract classes.

The Gang of Four again: ‘When inheritance is used carefully (some will say properly), all classes derived from an abstract class will share its interface. All subclasses will be subtypes of the abstract class.’ See for example their design patterns: Iterator, Observer and Abstract Factory.
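The common template can be sketched in a few lines of Python. This is only an illustration of the general shape, not an example taken from the Gang of Four; the names Formatter, PlainFormatter, CurrencyFormatter and render are hypothetical.

```python
from abc import ABC, abstractmethod


class Formatter(ABC):
    """Abstract class: declares the interface that clients depend on."""

    @abstractmethod
    def format(self, amount: float) -> str:
        """Return a display string for an amount."""


class PlainFormatter(Formatter):
    """One implementation of the shared interface."""

    def format(self, amount: float) -> str:
        return f"{amount:.2f}"


class CurrencyFormatter(Formatter):
    """Another implementation; same interface, different behaviour."""

    def __init__(self, symbol: str) -> None:
        self.symbol = symbol

    def format(self, amount: float) -> str:
        return f"{self.symbol}{amount:.2f}"


def render(formatter: Formatter, amount: float) -> str:
    # The client is written against the abstract interface only.
    return formatter.format(amount)
```

The client function never names a concrete subclass, so a new implementation can be added without changing it. Every subclass is a subtype of the abstract class in the sense of the quotation above.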

Analysts need few patterns that feature class hierarchies

There may be a few over-enthusiastic object-oriented designers who believe that good design means explicitly spelling out all the class hierarchies you can find in the entity model of a system.

Fig. d

Class hierarchies are common in some kinds of software design. But chapters 5 and 6 have explained why you are unlikely to find so many in the persistent data structures of enterprise applications. Even the Gang of Four say ‘Designers overuse inheritance. Designs are often made more reusable and simpler by depending more on object composition.’

Good analysts do not specify many class hierarchies in the entity model that specifies the persistent data structure of an enterprise application. Where the list of subclasses is very long or variable, or where there are complex overlapping hierarchies, defining class hierarchies creates schema evolution problems.

Since analysts normally specify class hierarchies in other ways and places, few analysis patterns will involve inheritance, and very few will feature polymorphism, at least, not in the way that object-oriented designers think of these things.

You might suppose then that analysts will find little use for design patterns. But it turns out you can identify where some design patterns apply to enterprise application design. And you can reshape some design patterns into analysis patterns. We go on to reshape one design pattern for use by analysts, replacing the class hierarchy with classes connected by one-to-many relationships.

The State design pattern for object-oriented designers

Design patterns might be divided into three groups:

• not very useful in enterprise applications

• useful to analysts in the business services layer

• useful to designers in the other layers or the interface between layers.

The second group is the most interesting. The Gang of Four define a pattern called State that is designed to ‘Allow an object to alter its behaviour when its internal state changes. The object will appear to change class.’ Let us look at how this design pattern can be reshaped for analysts.

Our tiny case study features one object class, Person, and two event classes, Employment and Death. The Employment event can only happen if the Person is unemployed. The Death event has two effects depending on whether the Person is employed or unemployed. Let us say Death (employed) goes on to affect Employer.

Fig. e shows the State design pattern in the entity model. Person and Person-Employment-Status are parallel associated objects. There is a class hierarchy under Person-Employment-Status of subclasses Employed and Unemployed.

Fig. e

Messages to Person are delegated (by the implementations of the methods defined in its interface) to Person-Employment-Status where appropriate, i.e. where the response depends on the state.

You can use the State design pattern to implement one event that has different effects on an object in different states. You code each event effect as a distinct (polymorphic) method in a subclass of the status object. The status object divides the event between event effects.

Fig. f illustrates that you code the Death event in the Employed class and Unemployed class as two distinct methods. Person-Employment-Status passes the Death event down to the appropriate subclass.

Fig. f

You have to code the selection between subclasses somewhere - in a data structure or a process structure. If you code it in the data structure, then an object-oriented programming environment can make the selection between subclasses ‘under the covers’ in any process that hits the status object. So you don’t have to make the selection between types explicit in any process.
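The case study might be coded along the following lines in Python. This is a minimal sketch, not a definitive implementation. The staff_count attribute is an assumption, introduced only to give Death (employed) some visible effect on Employer; the text does not say what that effect is.

```python
from abc import ABC, abstractmethod


class Employer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.staff_count = 0  # assumed attribute, for illustration only


class PersonEmploymentStatus(ABC):
    """Abstract status object; each subclass codes the event effects for one state."""

    @abstractmethod
    def employment(self, person: "Person", employer: Employer) -> None: ...

    @abstractmethod
    def death(self, person: "Person") -> None: ...


class Unemployed(PersonEmploymentStatus):
    def employment(self, person: "Person", employer: Employer) -> None:
        employer.staff_count += 1
        person.status = Employed(employer)  # the object appears to change class

    def death(self, person: "Person") -> None:
        person.alive = False  # Death (unemployed): no further effect


class Employed(PersonEmploymentStatus):
    def __init__(self, employer: Employer) -> None:
        self.employer = employer

    def employment(self, person: "Person", employer: Employer) -> None:
        # The Employment event can only happen if the Person is unemployed.
        raise ValueError("Employment event rejected: Person is already employed")

    def death(self, person: "Person") -> None:
        person.alive = False
        self.employer.staff_count -= 1  # Death (employed) goes on to affect Employer


class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.status: PersonEmploymentStatus = Unemployed()

    # State-dependent messages are delegated to the status object.
    def employment(self, employer: Employer) -> None:
        self.status.employment(self, employer)

    def death(self) -> None:
        self.status.death(self)
```

Note that neither Person.employment nor Person.death contains an if statement. The selection between Death (employed) and Death (unemployed) is made by method dispatch on the class of the status object.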

The State design pattern as a way to avoid selections in processes?

You might use the State design pattern as a device to avoid coding a selection or case statement within a method. You place the case statement in the data structure and code each option as a distinct method in a distinct class.

If the aim is to make code more maintainable, beware. First, what you gain in one way you lose in another; it becomes harder to see which methods are in fact related by mutual exclusion when an event is processed. Second, where data persists, it is easier to change the structure of a transient process than the structure of persistent data.

In enterprise applications, it is not reasonable or practical to remove all case statements from methods. It is like trying to define all constraints as state-transitions in state machines. This way of thinking, of trying to design everything using only one tool, is a trap to be avoided.

Parallel classes

There is one element of the State design pattern that is not so helpful to analysts - the class hierarchy showing each state as a subclass under each parallel aspect. Given that fixed class hierarchies do not abound in the data structures of enterprise applications, inheritance and polymorphism cannot be so useful as you might hope, and design patterns have to be reshaped for this kind of system.

However, there is another element of the State design pattern that analysts can use. We have argued since about 1980, and most recently in the Computer Journal (1994), that a class is best divided into parallel aspects along the lines of its need to maintain state variables.

A state variable is an attribute with a short range of values that is tested as part of the precondition for one or more events. E.g. if a Person’s Employment Status = employed, then an Employment event cannot happen. And if a Person’s Employment Status = unemployed, then a Redundancy event cannot happen.
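A state variable used as an event precondition might look like this in Python. The sketch assumes only the two states and two events named above; the EventRejected exception and the method names are hypothetical.

```python
from enum import Enum


class EmploymentStatus(Enum):
    """The short range of values of the state variable."""

    UNEMPLOYED = "unemployed"
    EMPLOYED = "employed"


class EventRejected(Exception):
    """Raised when an event's precondition fails (hypothetical name)."""


class PersonEmploymentStatus:
    """Parallel aspect of Person that maintains one state variable."""

    def __init__(self) -> None:
        self.state = EmploymentStatus.UNEMPLOYED

    def employment(self) -> None:
        # Precondition: an Employment event cannot happen if already employed.
        if self.state is not EmploymentStatus.UNEMPLOYED:
            raise EventRejected("Employment requires state = unemployed")
        self.state = EmploymentStatus.EMPLOYED

    def redundancy(self) -> None:
        # Precondition: a Redundancy event cannot happen if unemployed.
        if self.state is not EmploymentStatus.EMPLOYED:
            raise EventRejected("Redundancy requires state = employed")
        self.state = EmploymentStatus.UNEMPLOYED
```

Each event method tests the state variable before updating it, so the class as a whole enforces the flip-flop between the two states.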

Ask of a class: Does it maintain a state variable? If yes, create a parallel class to maintain it. Motivations include: keeping each class smaller and easier to comprehend on its own; suiting the paradigm of object-oriented programming; and tightly encapsulating the maintenance of a state variable.

This last means that the state machine for each class can be described elegantly using a regular expression notation, and this has further advantages in pattern recognition.
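To illustrate the regular expression idea, the life history of a parallel class can be written as a pattern over a sequence of event codes and checked with Python's re module. The one-letter encoding and the particular life history (cycles of Employment and Redundancy ending in Death) are assumptions made for this sketch.

```python
import re

# One letter per event type (an encoding chosen for this sketch):
#   E = Employment, R = Redundancy, D = Death.
# Life history: any number of Employment/Redundancy cycles,
# optionally one last Employment, then Death.
LIFE_HISTORY = re.compile(r"(ER)*E?D")


def is_valid_life(events: str) -> bool:
    """Return True if the whole event sequence fits the life history."""
    return LIFE_HISTORY.fullmatch(events) is not None
```

For example, is_valid_life("ERED") accepts a person who is employed, made redundant, employed again and then dies, while is_valid_life("EED") rejects two Employment events in a row.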

Where a class maintains several state variables, you should appoint a ‘basic aspect’ that is the master of all parallel aspects. Fig. g shows the basic class as the master of all the parallel status classes.

Fig. g

Fig. h shows a possible example. It has three cyclical states, each varying independently. There is a ‘boundary clash’ between the cycles.

Fig. h

The basic class is responsible for maintaining object identity, and any attributes that can change in an unconstrained way as long as the object exists. The basic class is so trivial that it requires no state variable, and there is little value in modelling its behaviour in the form of a state machine; it would be simply a sequence of creation, random updates, then deletion.

Some simple enterprise applications are composed of classes with only basic aspects.

Rolling up parallel aspects

In general, you should create a parallel class for each state variable that has to be maintained. But Fig. i shows that in simple cases, you might roll up one of the parallel aspects into the basic class. You don’t have to do this, but it is a harmless way to condense the specification and code in simple cases.

Fig. i

We have been talking about specifying the business services layer of a system. You need not separate parallel classes in the data services layer. You can easily roll up all parallel aspects into one database table. One benefit: this speeds up performance, since each process will have fewer data objects to retrieve and restore. One cost: it makes the interface between the business services and data services layers more complex.
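The trade-off can be seen in a small sketch using Python's standard sqlite3 module. The table and column names are hypothetical. Two business-layer classes are kept apart, but both are stored in a single row of one table.

```python
import sqlite3
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Person:  # basic aspect (business services layer)
    person_id: int
    name: str


@dataclass
class PersonEmploymentStatus:  # parallel aspect (business services layer)
    person_id: int
    state: str


def create_schema(conn: sqlite3.Connection) -> None:
    # Data services layer: one physical table holds both aspects.
    conn.execute(
        "CREATE TABLE person ("
        " person_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " employment_state TEXT NOT NULL)"
    )


def store(conn: sqlite3.Connection, person: Person,
          status: PersonEmploymentStatus) -> None:
    # Two objects are merged into one row.
    conn.execute(
        "INSERT OR REPLACE INTO person (person_id, name, employment_state)"
        " VALUES (?, ?, ?)",
        (person.person_id, person.name, status.state),
    )


def load(conn: sqlite3.Connection,
         person_id: int) -> Tuple[Person, PersonEmploymentStatus]:
    # One read retrieves the data for both objects.
    row = conn.execute(
        "SELECT person_id, name, employment_state FROM person"
        " WHERE person_id = ?",
        (person_id,),
    ).fetchone()
    if row is None:
        raise KeyError(person_id)
    return Person(row[0], row[1]), PersonEmploymentStatus(row[0], row[2])
```

The benefit is the single read in load. The cost is that store and load must know how the parallel classes map onto the columns, which is the extra complexity in the interface between the two layers.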

The State design pattern reshaped for analysts

Applying the pattern in section 7.4 to the case study, you would specify a Person class that is careless of the state, and a Person-Employment-Status class that flip-flops between employed and unemployed. All the processing that depends on the state belongs in the Person-Employment-Status class.

Fig. j

You can specify the subclasses not in the data structure but in the process structure of a Death event. Fig. k shows you specify the effect of Death on Person Employment Status as a selection between options Death (unemployed) and Death (employed).

Fig. k

The event model is an abstract specification. When you come to code it, you might well code the selection between event effects as a case statement within the transient method for the Death event, rather than in the persistent data structure. This has advantages. In large enterprise applications, this will help to reduce schema evolution problems, since you can change the structure of a transient process more easily than the structure of persistent data.
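Coded this way, the Death event might look like the following Python sketch. As before, staff_count is an assumed attribute standing in for whatever effect Death (employed) has on Employer, and the function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employer:
    name: str
    staff_count: int = 0  # assumed attribute, for illustration only


@dataclass
class Person:  # careless of the employment state
    name: str
    alive: bool = True


@dataclass
class PersonEmploymentStatus:  # flip-flops between the two states
    person: Person
    state: str = "unemployed"  # the state variable
    employer: Optional[Employer] = None


def death(status: PersonEmploymentStatus) -> str:
    """Process a Death event; return the name of the event effect applied."""
    # The selection between event effects is a case statement in the
    # transient process, not a class hierarchy in the persistent data.
    if status.state == "employed":
        assert status.employer is not None
        status.employer.staff_count -= 1  # goes on to affect Employer
        effect = "Death (employed)"
    elif status.state == "unemployed":
        effect = "Death (unemployed)"
    else:
        raise ValueError(f"unknown state: {status.state}")
    status.person.alive = False
    return effect
```

Adding a third state here means adding a branch to one function, with no change to the shape of the stored data.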

Some variations on this theme are shown below.

Status cycle as a historical record

Given a cyclical state, do users want to remember the history of past cycles? If yes, you can introduce a one-to-many detail class.

Fig. l introduces a detail class called Job.

Fig. l
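A detail class of this kind might be sketched in Python as follows. The attributes of Job (employer, start, end) are assumptions, as is the choice to derive the current state from the latest Job rather than store it separately.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Job:
    """Detail class: one instance per employed-to-unemployed cycle."""

    employer: str
    start: date
    end: Optional[date] = None  # None while the job is current


@dataclass
class PersonEmploymentStatus:
    jobs: List[Job] = field(default_factory=list)  # one-to-many detail

    @property
    def state(self) -> str:
        # Derived from the history: employed if the latest Job is open.
        if self.jobs and self.jobs[-1].end is None:
            return "employed"
        return "unemployed"

    def employment(self, employer: str, on: date) -> None:
        if self.state != "unemployed":
            raise ValueError("Employment requires state = unemployed")
        self.jobs.append(Job(employer, on))

    def redundancy(self, on: date) -> None:
        if self.state != "employed":
            raise ValueError("Redundancy requires state = employed")
        self.jobs[-1].end = on
```

Each pass round the cycle leaves one completed Job behind, so the list of jobs is the historical record that users asked for.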

Status as an optional detail

Fig. m shows that if you don’t want to remember the history of past cycles, only the current one, you might remove the fork from the relationship in Fig. l.

Fig. m

You don’t normally see this shape however, because designers normally roll an optional aspect like this into its master class.

State variable as a domain class

Fig. n shows you might add a domain class for the state variable attribute, called Employment Status.

Fig. n

It is helpful to distinguish domain classes defined in the business services layer (under end-user control) from classes defined in the UI or data services layer (under designer control).

If designers want to define the values of a state variable in some kind of table, perhaps along with an expanded description of the state that is useful in error messages, you should define the domain class for the state variable in either the UI or data services layer.

If users want to be able to change the description of a state (‘unemployed’ to ‘redundant’), you may define the state variable as a state class in the business services layer. But be careful not to expose the class’s specification too far to manipulation by users; you surely don’t want users creating or deleting states, and thus changing the rules of the application.
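The caution about not exposing too much can be sketched as a class that lets users edit descriptions but offers no way to create or delete states. The class name, the one-letter codes and the method names are hypothetical.

```python
class EmploymentStatusDomain:
    """Domain class for a state variable: fixed codes, editable descriptions."""

    def __init__(self) -> None:
        # The set of states is fixed by the designer.
        self._descriptions = {"E": "employed", "U": "unemployed"}

    def codes(self) -> frozenset:
        """Return the fixed set of state codes."""
        return frozenset(self._descriptions)

    def describe(self, code: str) -> str:
        return self._descriptions[code]

    def rename(self, code: str, description: str) -> None:
        """User-level operation: change the wording, never the set of states."""
        if code not in self._descriptions:
            raise KeyError(f"no such state: {code}")
        self._descriptions[code] = description
```

The rules of the application depend only on the codes, so a user who changes ‘unemployed’ to ‘redundant’ alters what appears on screens and in error messages without altering any event precondition.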

Domain classes are discussed further in other volumes in this series.

Recursive composition design pattern

Composition defines an abstract class that provides a common interface for every level of a hierarchical structure. It specifies the bottom ends of the hierarchy as a special case. Curiously, it does not specify the top end as a special case, though this is sometimes necessary.
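A minimal Python sketch of this design pattern follows. The organisational example (OrgUnit, Team, Division and the headcount operation) is hypothetical and chosen only to make the shape visible.

```python
from abc import ABC, abstractmethod
from typing import List


class OrgUnit(ABC):
    """Abstract class: the common interface for every level of the hierarchy."""

    @abstractmethod
    def headcount(self) -> int: ...


class Team(OrgUnit):
    """Bottom end of the hierarchy, specified as a special case."""

    def __init__(self, name: str, members: int) -> None:
        self.name = name
        self.members = members

    def headcount(self) -> int:
        return self.members


class Division(OrgUnit):
    """Any higher level: composed of other units, to any depth."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List[OrgUnit] = []

    def add(self, unit: OrgUnit) -> None:
        self.children.append(unit)

    def headcount(self) -> int:
        # Recursive communication between instances of the same abstract class.
        return sum(child.headcount() for child in self.children)
```

Notice that nothing marks out the top of the structure. A Division at the top is indistinguishable from one nested several levels down.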

Recursive composition is familiar to most database designers. When database designers specify fixed-depth recursion, a different pattern emerges, in which the top and bottom ends of the structure appear under parallel classes.

However, the recursive structures found in enterprise applications are normally of variable depth; three varieties are possible.

The volume ‘Patterns in entity modelling’ says more about such recursive patterns.

Recursive decoration

You can specify attributes as classes, then specify a new thing as a subclass of each relevant attribute class, using inheritance to obtain the attributes. But multiple inheritance can lead to complex structures, difficult to manage. You cannot make schema changes, alter the attributes of a class, without changing the data structure and losing the instance data. To avoid these problems you can use the Wrapper pattern to add properties to a basic thing, one on top of another.

You store new attributes as object instances without changing the data structure and thereby losing all the instance data. The price is that you hide the basic object beneath layers of attributes. When a wrapper is added, the object identifier appears to change.

Each wrapper completely encapsulates the original object and any previously created wrapper. Each wrapper has a different object identifier. The identity of the original object remains the same, but since an external client can only call the outermost wrapper, the identifier appears to be that of the last wrapper.

Client --> Wrapper3 --> Wrapper2 --> Wrapper1 --> Object

In fact, some calls are dealt with in the wrapper without forwarding, or are supplemented before forwarding. This is the way the wrapper is able to add extra functionality.
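The chain of wrappers can be imitated in Python as below. This is a sketch of the general idea only; the class names, the use of __getattr__ for forwarding and the describe method are assumptions of the example.

```python
class BasicObject:
    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self) -> str:
        return self.name


class Wrapper:
    """Adds one attribute on top of whatever it wraps."""

    def __init__(self, inner, attr_name: str, value) -> None:
        self._inner = inner
        self._attr_name = attr_name
        self._value = value

    def __getattr__(self, name: str):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name == self._attr_name:
            return self._value  # dealt with here, not forwarded
        return getattr(self._inner, name)  # forwarded down the chain

    def describe(self) -> str:
        # Supplemented before returning: extra functionality added by the wrapper.
        return f"{self._inner.describe()} [{self._attr_name}={self._value}]"
```

A client holding the outermost wrapper can read every attribute added so far as well as those of the original object, yet the object it holds is a different one each time a wrapper is added.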

This kind of data structure is too inefficient for database designers. Both multiple inheritance and recursive decoration are devices for designer-maintained data rather than user-maintained data. Enterprise application designers don’t normally specify attributes in either of these ways. They use relational theory. They view objects as rows of a table, each row identified by a unique key. They view attributes as columns of the table. They may invert attributes to become key-only master classes at the top of one-to-many relationships as shown below.

You can make schema changes, add new attributes to a relation, without losing all the instance data; you can preserve the identity of objects stored so far. But you do have to recompile the data structure, and probably some of the programs, and retest the system.

Using design patterns to separate subsystems

The Gang of Four say ‘Each design pattern lets some aspect of the system vary independently of other aspects, thereby making the system more robust to a particular kind of change.’

Many design patterns are about decoupling servers from clients. They help you to separate concerns for ease of maintenance, to keep distinct subsystems apart yet also connect them.

Experts advise keeping the bridges between subsystems as narrow as possible, keeping interfaces simple and economical. This is very much the idea behind one of the Gang of Four’s design patterns called ‘Facade’. This and other design patterns can be useful in bridges between subsystems of the 3-tier architecture.

Below, we’ve slightly edited and rearranged a contribution by Patrick Logan to the patterns group on the internet, in which he suggests the application of other design patterns to the 3-tier architecture:

‘Constraints

‘Despite the variation of user interfaces and databases, the system as a whole must maintain its integrity (adherence to system requirements). The logic and the system integrity checks represent most of the new development required.

‘The user interface tier should interact with the user, but refer to the middle tier (business logic and integrity) for the computation. The middle tier should be implemented in terms of abstract objects, hiding the business logic from the user interface, and from the details of the databases.

‘You can separate the three tiers using the structural patterns described in Design Patterns, such as Adapter, Bridge and Proxy.’
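One way to picture this separation is the Python sketch below, in which the middle tier sees the database only through an abstract interface and an Adapter supplies that interface. All the names are hypothetical, and a plain dictionary stands in for a real database.

```python
from typing import Dict, Protocol


class PersonStore(Protocol):
    """What the middle tier needs from the data tier (hypothetical interface)."""

    def get_state(self, person_id: int) -> str: ...

    def set_state(self, person_id: int, state: str) -> None: ...


class DictPersonStore:
    """Adapter: makes a plain dict look like a PersonStore."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self._rows = rows

    def get_state(self, person_id: int) -> str:
        return self._rows[person_id]

    def set_state(self, person_id: int, state: str) -> None:
        self._rows[person_id] = state


class EmploymentService:
    """Middle tier: holds the business rule, knows nothing of storage or screens."""

    def __init__(self, store: PersonStore) -> None:
        self._store = store

    def employ(self, person_id: int) -> None:
        if self._store.get_state(person_id) != "unemployed":
            raise ValueError("Employment event rejected: Person is not unemployed")
        self._store.set_state(person_id, "employed")


def employ_button_clicked(service: EmploymentService, person_id: int) -> str:
    """User interface tier: interacts with the user, defers to the middle tier."""
    try:
        service.employ(person_id)
        return "Employment recorded"
    except ValueError as error:
        return f"Cannot proceed: {error}"
```

Swapping the dictionary for a real database means writing another adapter with the same two methods. The business rule in EmploymentService is untouched.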

Analysis patterns will be about coherence and constraint. They apply within a coherent subsystem, within one layer of the 3-tier architecture, rather than between layers. Analysis patterns must apply within the business services layer of code and help you to get the functionality of a system right.

Analysis patterns must help you integrate concerns, help you to specify the coupling between business entities, to tighten the constraints as far as possible, so that these objects remain consistent one with another.

So broadly, one might say: ‘Apply design patterns to loosen the interfaces between subsystems. Apply analysis patterns to discover and specify the constraints within a subsystem.’

More of what analysts need from patterns

We started by suggesting analysts need what we are calling analysis patterns. These will be similar to design patterns for object-oriented programming, but different in a number of specific ways. We’ve already suggested that analysts need:

• patterns for processing persistent data

• patterns that prompt questions

• patterns to do with real-world objects

• patterns that are logical

• few patterns that feature class hierarchies.

Both design and analysis patterns are concerned with smallish structures of relationships between elementary components of a system. Analysis patterns tend to be simpler than design patterns, more abstract in the sense of technology-independent, and they are perhaps more numerous.

There are a few more things to say about the use of patterns in analysis. Pointing up differences between design and analysis patterns sheds a new light on both fields of research.

Analysts need only a few patterns that feature recursion

Several of the published design patterns for object-oriented software construction feature recursive communication between instances of a class. The few analysis patterns that do feature recursion are interesting, but perhaps not so commonly used. See Footnotes.

Analysts need patterns that model business rules

Design patterns can help you build enterprise applications that are more robust in the face of changes, while analysis patterns will help you build enterprise applications that are correct in terms of applying constraints. Both can help you make the step from naive database use towards more complex database use. See Footnotes.

Analysts need patterns for object behaviour analysis

Design patterns appear in two dimensions of conceptual modelling - entity modelling and event modelling. Confusingly, the Gang of Four refer to patterns in object interactions as ‘behavioural’ patterns. We use the word ‘behaviour’ in a different dimension.

What we call the object behaviour analysis face of the conceptual modelling cube is to do with specifying the long-term behaviour of persistent objects in the form of life histories or state machines. There are many analysis patterns in state machines. This is an area in which analysis patterns work might contribute to design patterns work.

Analysts need patterns that suit database processing

Design patterns help with object-oriented programming technologies. Analysis patterns must help with systems that use database and transaction processing technologies.

But the distinction between technologies is not as fundamental as it looks. Analysis patterns may be implemented in object-oriented software. Design patterns may appear in enterprise applications.

Appendix A: very general principles

This book is largely practical. There are a few abstract principles that underlie the discussion of patterns in this book and its companion.

There is no silver bullet.

A system is composed of many small elementary things (objects, facts, types, states, events and rules) connected together in various ways. You have to get down to the bottom level. You have to define all the elementary things and the relationships between them, at a level of description that can be executed on a computer. There is no way to avoid this. There is no way to avoid the pain.

A system is composed of connected things

Everything in a system must be connected to everything else within that system, otherwise there must be two or more distinct systems. Patterns are about connecting things. There are recognisable and reusable patterns in how things are related. Patterns that connect just two or three things are the most reusable, but patterns that connect four or five things are more valuable.

The cardinality of connections is fundamental

The ‘how manyness’ of things in relation to each other is a fundamental kind of rule that has to be specified in each view of a system. You define one-to-one, one-to-one-or-zero, and one-to-many relationships not only between classes (in an entity model), but also between the concurrent objects affected by an event at a moment in time (in an event model), and between the events that affect an object over a period of time (in a state machine).

Most connections are associations

Things are naturally related to each other by association. E.g. A shoulder is related to an arm. An arm is in turn related to a hand. A husband is related to a wife. A divorce must relate to a previous wedding.

Persistence undermines compositions

Longevity turns composition relationships into associations. A composition relationship is an association, but strengthened by the rule that all the related objects live the same length of time. You can try to relate things by saying one is composed of others. E.g. A hand is composed of a palm, four fingers and a thumb. A car is composed of an engine, chassis, body, etc.

This is OK over a short time, but longevity turns composition relationships into loose associations. You might lose a finger from your hand, or replace the engine of your car by another. You would do better to say a car is associated with a number of parallel aspects, each of them potentially optional or replaceable.

Persistence undermines type classifications.

Longevity turns subtypes into states of parallel aspects. You can relate things by saying one is a subtype of another. E.g. a Man is a Human; a Woman is also a Human. Similarly, Leg, Arm, Wing, Flipper and Tentacle are all subtypes of Limb.

This is OK over a short time, but longevity turns apparently fixed types into variables or states. Under some legal systems, a Human can change Sex. You would do better to say a Human is associated with a number of parallel aspects - Sex, Job, etc.

A caterpillar turns into a butterfly. Exactly when in evolutionary history the forelegs of a monkey became the arms of an ape is an interesting question. You would do better to say a Limb has a number of optional parallel roles - Supporter, Hanger, Swimmer, Flyer, etc.

There is more than one paradigm, more than one orientation

Events coordinate interacting objects. Object-orientation and event-orientation are not in competition. They are orthogonal views of the same phenomena; equally valid and useful views.

Interaction is different from, more fundamental than, communication

An event reflects a natural phenomenon. An event model specifies the interactions between concurrent objects in a formal way. An event model is a directed graph; the event travels along each relationship in a one-way direction. But an event model does not commit you to any statement about communication.

Messages are an implementation device. You may select between a number of viable message-passing strategies. You can choose to send messages along the paths specified in the event model, or another route. The interaction is more fundamental, more objective, than the messages that make it work.

Berrisford’s law of asymmetry

Nature abhors perfect symmetry. Asymmetry tends to assert itself. If you discover two perfectly symmetrical things, you will normally destroy the symmetry by placing one over the other, or by inventing a third thing and relating both to it.

Redundancy is to be avoided - normally

If you say my shoulder is related to my arm, which is in turn related to my hand, there is no need to say my shoulder is related to my hand - this is implied. But not all redundancy is bad, since introducing redundancy into one perspective may reduce redundancy in another.

Appendix B: On the SmallTalk paradigm

Meyer discusses three technical advantages of the SmallTalk paradigm. These benefits apply largely to designers working with visual programming environments rather than business databases, and more to programmers than to analysts.

Conceptual consistency from using a single object-oriented paradigm

Only having to think in one object-oriented dimension is great for the programmer, but teaching analysts that everything in a real-world enterprise is an object doesn’t help them. We should teach and encourage analysts to consider all dimensions of the problem they are studying. They need a framework that clearly separates the different parts and orthogonal views of the problem domain they have to analyse.

Manipulation of classes at run time

This is great for the programmer, and perhaps for iterative prototyping, but positively dangerous in full enterprise application development.

Soon after an enterprise application is set live, analysts are faced with the need to change the database structure or the rules that guarantee data integrity while the running system retains its stored data.

Where run-time manipulation of rules is required, analysts should define the rules as attribute values of some kind of classification or rule entity type.
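For example, a rule might be held as an attribute value of a classification entity, as in this Python sketch. The speed_limit attribute and the check_speed function are hypothetical, chosen only to show a rule that can change while the classes stay fixed.

```python
from dataclasses import dataclass


@dataclass
class RoadType:
    """Classification entity whose attribute values carry a rule."""

    name: str
    speed_limit: int  # the rule, held as data rather than as code


@dataclass
class Road:
    name: str
    road_type: RoadType


def check_speed(road: Road, speed: int) -> bool:
    """Apply the rule by reading it from the classification entity."""
    return speed <= road.road_type.speed_limit
```

Changing the rule is then an ordinary update to stored data (for instance, setting speed_limit to a new value on one RoadType), and no class has to be altered at run time.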

Where the business entity model is to change more fundamentally, beware that the stored data is a valuable company asset. The necessary reprogramming, retesting, retraining and data conversion are expensive. Analysts need help to tackle such ‘schema evolution’ in a strictly controlled and methodical way.

Use of class-level methods alongside instance-level methods

Meyer suggests that programmers may find this a mixed blessing. Part of the art is to keep levels of abstraction apart. Analysts have two orthogonal ways to separate levels of abstraction.

Instance from type

Analysts do separate type from instance in the business services layer by one-to-many relationships between persistent classes, say:

Road Type ---< Road ---< Road Use

Programmers may later introduce class-level ‘methods’ to process any event that cascades down these one-to-many relationships.
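The sketch below shows one way this might look in Python, using class methods that operate on sets of instances. The event (withdrawing a Road Type, which closes its Roads and cancels their Road Uses) and all the attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoadUse:
    vehicle: str
    cancelled: bool = False

    @classmethod
    def cancel_all(cls, uses: List["RoadUse"]) -> int:
        """Class-level method: apply the event to a set of instances."""
        count = 0
        for use in uses:
            if not use.cancelled:
                use.cancelled = True
                count += 1
        return count


@dataclass
class Road:
    name: str
    uses: List[RoadUse] = field(default_factory=list)
    closed: bool = False

    @classmethod
    def close_all(cls, roads: List["Road"]) -> int:
        """Class-level method: close each road and cascade to its uses."""
        cancelled = 0
        for road in roads:
            road.closed = True
            cancelled += RoadUse.cancel_all(road.uses)
        return cancelled


@dataclass
class RoadType:
    name: str
    roads: List[Road] = field(default_factory=list)

    def withdraw(self) -> int:
        """Hypothetical event; returns the number of road uses cancelled."""
        return Road.close_all(self.roads)
```

The event enters at the top of the structure and cascades down each one-to-many relationship in turn.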

Class from metaclass

This is not a separation that analysts worry about, but there is a sense in which the three-tier software specification architecture separates class from metaclass. It keeps apart:

• business services layer classes, such as Road and Road Use

• UI layer classes: such as Window and Button

• data services layer classes: such as Table and Commit Unit

Might one view the data services layer classes Table and Commit Unit as metaclasses representing business entity and business event?

Appendix D: Object-oriented analysis in the UK

A tribute to the late Keith Robinson.

It is almost certainly true that the longest continuous object-oriented research and development programme in the world was started by Keith Robinson in 1977 at Infotech. After Keith’s death in 1993, the development was carried forward by John Hall of Model Systems and me (Graham Berrisford), who now work for Seer Technologies.

1977 Keith published a paper in the Computer Journal proposing an object-oriented program design method for database systems (not called that of course). Keith started from Michael Jackson’s earlier suggestion that the variables and processes of each object type could and should be encapsulated in a discrete processing module. An additional idea was to use the state variable of an object in validation of updates to that object.

1979 I helped Keith develop his proposals into a 10-day course called 'Advanced System Design' based on three techniques:

• Relational data analysis: Keith taught this as a technique to decompose the required system inputs and outputs in what we might now call the UI layer, into entity types for behaviour analysis in what we might now call the business services or data services layer.

• Life history analysis: Keith taught this as a technique to discover the behaviour of each entity type and document it in a state machine diagram. He favoured using regular expressions as the notation and called them life history diagrams after Jackson I think.

• Object interaction diagrams: Keith invented and taught these to document how objects exchange messages in order to complete the processing of an event (one event may synchronously update several objects, and/or need to be validated against the states of several objects).

Keith’s three-dimensional approach to conceptual modelling is now the norm in modern development methods. But there was a lot more to his method than notations, and some of the ideas he taught to do with schema evolution are still ahead of the game.

By the way, many years before Yourdon abandoned data flow diagrams, Keith advised against top-down decomposition.

1980 Keith’s course disappeared when his employers went into liquidation. Not long after this, Keith helped John Hall to develop an analysis and design method for the UK government. SSADM version one was built around database modelling techniques and incorporated object-based process analysis and design techniques.

Keith and John deemed object interaction diagrams impractical for use by database programmers, but included life histories as an analysis tool for discovering processes and business rules. They assumed it was obvious that each life history or state machine could be transformed into a discrete program module using Jackson’s technique of program inversion (more widely known then than now).

Unfortunately, version two of SSADM was developed by people who did not understand that life histories were a program design technique. The ground that was lost was not recovered for some years. And many still believe to this day that the main program specification technique in SSADM is data flow diagrams!

1983 Keith invented 'effect correspondence diagrams' (hereafter ‘event models’) to replace object interaction diagrams. The former are simpler than the latter, but equally formal. They suppress the detail of message-passing (which might be done in various ways) but show the essential correspondence between ‘methods’ in different objects affected by one event. The most wonderful feature of the diagrams is that they transform equally well into either object-oriented or procedural code.
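As a rough illustration of the correspondence an event model records, the sketch below renders one event in procedural style. The event (a transfer between two accounts), the class and the rules are all hypothetical; this is not the effect correspondence diagram notation, only one way the information in such a diagram might end up as code.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical business object with a simple two-state life."""
    state: str    # "open" or "closed"
    balance: int  # held in whole currency units for simplicity


def transfer(source: Account, target: Account, amount: int) -> None:
    """Process one Transfer event, which affects two objects.

    The event is first validated against the state of every object
    it touches; only if all checks pass are the corresponding
    effects applied, so a rejected event changes nothing.
    """
    # Validation phase: check the event against each affected object.
    if amount <= 0:
        raise ValueError("amount must be positive")
    if source.state != "open" or target.state != "open":
        raise ValueError("both accounts must be open")
    if source.balance < amount:
        raise ValueError("insufficient funds in source account")

    # Update phase: the two corresponding effects of the one event.
    source.balance -= amount
    target.balance += amount
```

The same correspondence could equally be coded as a method on each Account invoked by message-passing; the event model deliberately leaves that choice open.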

1986 I tested event models with Keith and John until all were confident they could be adopted by the UK government. We worked hard to develop rules for mechanically transforming the state machine view in the life histories into the object interaction view in the event models. Keith tested these transformations by developing a CASE tool.

At the same time, Keith and I also proposed separating the business services layer from the data services layer by means of a process-data interface (perhaps coded as SQL views), so that code could be generated directly from the event models, regardless of the database designer’s implementation decisions or the database management system.

All these proposals were adopted by the UK government for SSADM version 4 in 1989. But they are still not realised today in CASE tools as well as they should be.

1991 Keith worked out a way to detect and document reuse between events in state machine diagrams. The result is a network in which events invoke superevents, which may invoke other superevents and so on. This network can be generated by a CASE tool from the state machines.
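The shape of such a network can be sketched as a simple mapping from each event or superevent to the superevents it invokes. The event names below are invented, and the traversal is only an assumption about how a tool might walk the network, not a description of any actual CASE tool.

```python
# Hypothetical network: each event or superevent lists the
# superevents it invokes directly.
INVOKES = {
    "Customer Death": ["Account Closure"],
    "Customer Request to Close": ["Account Closure"],
    "Account Closure": ["Balance Settlement"],
    "Balance Settlement": [],
}


def superevents_invoked(event, network=INVOKES):
    """Return every superevent reached from `event`, directly or
    indirectly, in the order each is first visited."""
    found = []
    stack = list(reversed(network.get(event, [])))
    while stack:
        name = stack.pop()
        if name in found:
            continue  # already recorded; skip repeat visits
        found.append(name)
        stack.extend(reversed(network.get(name, [])))
    return found
```

Here two different events share the Account Closure superevent, which is where the reuse lies: the closure processing is specified once and invoked from both.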

Keith knew then that SSADM had all the armoury required to be an object-oriented method for database systems, save for two problems.

• To avoid the confusion that existed (and still exists) in object-oriented methods between UI layer objects and business services layer objects, designers needed to separate the layers of the 3-tier processing architecture.

• The representation of inheritance in state machines needed further research.

1993 Keith and I wrote the book 'Object-Oriented SSADM' (published after Keith’s death by Prentice Hall) mainly to establish two ideas: the importance of separating the layers of the 3-tier processing architecture, and the use of the superevent technique to maximise economy and reuse of code within the business services layer.

1994 I published a paper in the Computer Journal that showed how the benefits of inheritance (reuse and extendibility) can be achieved through modelling state machines for the 'parallel aspects' of a class.

1995 John Hall did most of the hard work necessary to test, demonstrate and establish the above ideas for adoption by SSADM version 4.2.

1997 This book has examined the practical application of inheritance and polymorphism in enterprise applications. The companion volume ‘Event modelling for enterprise applications’ introduces improvements in the teaching and usage of event models, e.g. to include constraint discovery and specification.

References

Ref. 1: “Software is not Hardware” in the Library at http://avancier.co.uk

Footnote 1: Creative Commons Attribution-No Derivative Works Licence 2.0

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.co.uk” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it. For more information about the licence, see http://creativecommons.org