Class hierarchies

This booklet is published under the terms of the licence summarized in footnote 1.

 

Use and abuse of generalization in an entity model

Given some features are shared by several objects, you can abstract these features into a class specification, and identify the objects as members of that class.

Given some features are shared by several classes, you can declare them as subclasses of a generalised superclass which owns the shared features. A class hierarchy shows entity types divided into subtypes.

It is all too easy to get carried away by generalisation. Questions to be addressed in this paper include:

·       How much generalisation is enough? Required? Desirable?

·       What is the difference between an enterprise model, a business rules model and a database structure?

·       What is the difference between a business class and a generic class?

·       When should we introduce generic classes into an entity model used for system specification?

Some of the conclusions may surprise you, because they are contrary to practices that are widely employed by data architects in the IT departments of large enterprises. The big conclusion is this:

Guideline: The same high-level abstractions or generic classes that are so useful to the enterprise entity modeler and analyst, often turn out, when it comes to the detailed design of a specific system, to be only minor optimizations.

Case study

The Financial Regulation Agency (FRA) requires a system to monitor how far Independent Financial Advisors influence the formation of contractual agreements between Financial Institutions and their Customers. The main business rules are:

An Account is a contractual agreement between a Financial Institution and a Customer. An Advice Contract is a contractual agreement between an Advisor and a Customer.

Some Accounts result from an Advice Contract. An Advisor may lead a Customer to create several Accounts, under the relationship defined by one Advice Contract.

The database must record the name and address of each relevant Financial Institution, Advisor and Customer, and the start and end dates of relationships between them.

Finding business classes

It is a good idea to start by naming the things the users want to monitor and perhaps control, and to use the terms the business people use. E.g.

·       Financial Institution

·       Account

·       Customer

·       Financial Advisor

·       Advice Contract

You might slowly be able to move the business towards using a new, more generic language. But it is not a good idea to start there. And you will still need to analyse and understand the specific concepts.

Finding business relationships

You go on to look for associations between pairs of classes and draw them as relationships. You can resolve any many-to-many relationship by naming the link class that is the reason for, or result of, the relationship. E.g.

·       An Account is a cross-reference between one Financial Institution and one Customer.

·       An Advice Contract is a cross-reference between one Independent Financial Advisor and one Customer.

“Interesting. I have often done this in explaining the idea of relationships to those unfamiliar with the idea.” Michael Zimmer

So, a first attempt at building an entity model might be:

           

Notice that the relationships specify constraints on associations between objects. One obvious constraint is that the system cannot relate objects in two classes that are not connected by a relationship in the model.
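The first model can also be sketched in code. The following is a minimal sketch, assuming Python dataclasses; the field names (ref_no, customer_id, account_no, contract_no) are hypothetical, chosen only for illustration. Each link class holds exactly one reference to each of the two classes it cross-references, and refuses to relate objects of any other class.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FinancialInstitution:
    ref_no: str        # hypothetical identifier
    name: str
    address: str


@dataclass(frozen=True)
class FinancialAdvisor:
    ref_no: str        # hypothetical identifier
    name: str
    address: str


@dataclass(frozen=True)
class Customer:
    customer_id: str   # hypothetical identifier
    name: str
    address: str


@dataclass(frozen=True)
class Account:
    """Link class: one Financial Institution and one Customer."""
    account_no: str
    institution: FinancialInstitution
    customer: Customer
    start_date: date
    end_date: Optional[date] = None

    def __post_init__(self):
        # The relationship is a constraint: only these two classes may be linked.
        if not isinstance(self.institution, FinancialInstitution):
            raise TypeError("an Account must belong to a Financial Institution")
        if not isinstance(self.customer, Customer):
            raise TypeError("an Account must belong to a Customer")


@dataclass(frozen=True)
class AdviceContract:
    """Link class: one Financial Advisor and one Customer."""
    contract_no: str
    advisor: FinancialAdvisor
    customer: Customer
    start_date: date
    end_date: Optional[date] = None

    def __post_init__(self):
        if not isinstance(self.advisor, FinancialAdvisor):
            raise TypeError("an Advice Contract must belong to a Financial Advisor")
        if not isinstance(self.customer, Customer):
            raise TypeError("an Advice Contract must belong to a Customer")


bank = FinancialInstitution("FI-001", "Example Bank", "1 High Street")
sally = FinancialAdvisor("FA-001", "Sally", "3 Mid Lane")
charlie = Customer("C-001", "Charlie", "2 Low Road")
account = Account("ACC-1", bank, charlie, date(2020, 1, 1))
```

Trying to create an Account between an Advisor and a Customer raises a TypeError, because no such relationship exists in the model.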

What makes a good business rules model?

A good business rules model has two features that are worth noting here.

Objects should be recognisable by users

A business rules model is built at the level users can discuss. Users should recognise the difference between the classes you define. In addition, users must be able to differentiate between objects within one class; and for this, they need a uniqueness constraint to be placed on some or all of the data attributes of the objects.

I loosely distinguish between 'firm' and 'fuzzy' classes.

'Firm classes' are composed of objects so fundamental to this or another enterprise that the objects must be labelled with a unique identifier for the business to succeed.

E.g. a laundry business attaches a numbered label to each item in a customer's pile of laundry; so it can reassemble the pile after washing. The business simply cannot succeed without these numbered labels.

E.g. a clearing system cannot work without unique codes to identify banks, their accounts and transactions on those accounts.

The best object identifiers are those that exist not only in a computer system but also in the real world. So, when a computer system is introduced, the object identifiers will be available to users on data entry.

'Fuzzy classes' are composed of objects that cannot easily be identified. This means it is easy to create duplicate objects by mistake. Of course, a computer system can generate a unique number for each object in a class, but such system-generated keys are no help unless or until they become used in the business world.

E.g., consider the five classes in our model so far. There are four firm classes and one fuzzy class. The FRA provides unique reference numbers for Financial Institutions and Advisors, who in turn provide unique reference numbers for Accounts and Advice Contracts. But how do users identify Customers?

Fuzzy classes are a big headache. It is difficult to identify objects and prevent duplicates. But let us ignore the pain for now and move on.
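The difference between firm and fuzzy classes can be illustrated with a small sketch. It assumes two hypothetical in-memory registers: one keyed by a reference number that exists in the business world, and one keyed only by a system-generated number. The class and method names are invented for illustration.

```python
import itertools


class FirmRegister:
    """Objects keyed by an identifier the business itself uses."""

    def __init__(self):
        self.by_ref = {}

    def add(self, ref_no, name):
        # The business-world identifier lets the system reject duplicates.
        if ref_no in self.by_ref:
            raise ValueError(f"duplicate reference number {ref_no}")
        self.by_ref[ref_no] = name


class FuzzyRegister:
    """Objects keyed only by a system-generated number."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self.by_key = {}

    def add(self, name, address):
        # Every call gets a fresh key, so nothing stops the same
        # real-world customer being recorded twice.
        key = next(self._next_key)
        self.by_key[key] = (name, address)
        return key


advisors = FirmRegister()
advisors.add("FA-001", "Sally Smith")

customers = FuzzyRegister()
first = customers.add("Charlie Brown", "2 Low Road")
second = customers.add("C. Brown", "2 Low Rd")   # same person, entered again
```

The generated keys 1 and 2 are both unique, yet they do nothing to reveal that the two Customer objects describe one person.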

“Even with what you call firm classes, it really is hard to identify objects and prevent duplicates. We have a whole operational unit to deal with clients of the government healthcare system, and there are still lots of data quality problems centred around identification of individuals.” Michael Zimmer

Enquiry requirements should be supported in an obvious way

Back to the requirements. How do designers list, for a given Advisor, all the Accounts that have resulted from their Advice Contracts? A logical relationship is missing from the model. Adding the missing relationship, I arrive at this model:

           

Note that putting business rules into the database structure

·       constrains system update processing as it should be, and

·       simplifies enquiry processing, but

·       may make it harder to accommodate subsequent changes.

The access path for the enquiry is shown below.

               

“I think relationally, so I don't think of navigation so much as joins. For a join, there is no particular starting point for the path.” Michael Zimmer

Ah! Not thinking about the navigation routes or access paths of processes is one of the main reasons why people fail to specify the ‘right’ relationships in entity models.
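The access path for the enquiry can be sketched in code. This is a minimal sketch, assuming the added relationship is a one-to-many relationship from Advice Contract to Account, held as an optional reference on each Account; the field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdviceContract:
    contract_no: str
    advisor_ref: str
    customer_id: str


@dataclass(frozen=True)
class Account:
    account_no: str
    institution_ref: str
    customer_id: str
    # The added relationship: the Advice Contract (if any) that led to this Account.
    advice_contract_no: Optional[str] = None


def accounts_for_advisor(advisor_ref, advice_contracts, accounts):
    """Access path: Advisor -> Advice Contracts -> Accounts."""
    contract_nos = {
        c.contract_no for c in advice_contracts if c.advisor_ref == advisor_ref
    }
    return [a for a in accounts if a.advice_contract_no in contract_nos]


contracts = [
    AdviceContract("AC-1", "FA-1", "C-1"),
    AdviceContract("AC-2", "FA-2", "C-2"),
]
accounts = [
    Account("A-1", "FI-1", "C-1", "AC-1"),
    Account("A-2", "FI-2", "C-1", "AC-1"),
    Account("A-3", "FI-1", "C-1"),            # opened without advice
    Account("A-4", "FI-1", "C-2", "AC-2"),
]
result = accounts_for_advisor("FA-1", contracts, accounts)
```

Without the added reference, the enquiry could only guess which Accounts resulted from advice, for example by matching Customers, and that would wrongly include Account A-3.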

Generic classes and relationships

The main topic of this paper is the introduction of generic classes and relationships into a model that is to be used for processing.

What is the difference between a business class and a generic class?

·       A business class should correspond to a set of objects that business people recognise as sharing similar features.

·       A generic class is a generalisation of two or more business classes, at a level higher than that talked about by business people.

Consider generalisation in the case study. The Financial Institution, Independent Financial Advisor and Customer classes share features: name and address attributes. Similarly, the Account and Advice Contract classes share features: start and end date attributes.

Therefore, you might abstract two superclasses. You have to invent terms to name them, since you have moved above the level that users talk about. The names I choose are: Party and Inter-Party Contract. Now it becomes possible to build a much more abstract model of the business requirements.

           

The paper later in this series called <Enterprise entity models> suggests this may be OK as an enterprise model, or as a highly generalised database structure but:

·       What are the keys of these classes? The keys must be artificial because these superclasses are abstractions, fuzzy classes, and not things the business people talk about or recognise.

·       And how do we support enquiries? E.g., List for a given Advisor, all the Accounts that have resulted from their Advice Contracts? The required access path through the abstract model is obscure.

               

This is a poor specification of the enquiry. The problem is that the business rules are missing. If the rules are not in the diagram, then they must be in documentation behind the scenes.

A business rules model is not just a diagram of classes and relationships; it includes the detailed specification of each class and relationship, and all the business rules contained therein.

So where do you specify the various processing constraints on how Inter-Party Contracts are created between Parties? Just a few of the constraints to be specified are:

·       an Inter-Party Contract cannot associate classes not declared as subclasses of Party,

·       an Inter-Party Contract cannot associate Financial Institutions and Advisors,

·       the only way an Inter-Party Contract can be related to another Inter-Party Contract (by means of the recursive relationship) is where the parent is an Advice Contract and the child is an Account.
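To see where the rules go when they leave the diagram, consider the following minimal sketch of the abstract model. It assumes hypothetical type codes (INSTITUTION, ADVISOR, CUSTOMER, ACCOUNT, ADVICE_CONTRACT) and a hypothetical validate function; none of these names come from the case study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Party:
    party_id: int        # artificial key
    party_type: str      # "INSTITUTION", "ADVISOR" or "CUSTOMER"
    name: str


@dataclass(frozen=True)
class InterPartyContract:
    contract_id: int     # artificial key
    contract_type: str   # "ACCOUNT" or "ADVICE_CONTRACT"
    party_1: Party
    party_2: Party
    parent: Optional["InterPartyContract"] = None   # recursive relationship


# Rules that the specific model showed as relationships now live in tables and code.
ALLOWED_PARTIES = {
    "ACCOUNT": ("INSTITUTION", "CUSTOMER"),
    "ADVICE_CONTRACT": ("ADVISOR", "CUSTOMER"),
}
ALLOWED_PARENT_CHILD = {("ADVICE_CONTRACT", "ACCOUNT")}


def validate(contract):
    """Return a list of rule violations (empty if the contract is valid)."""
    errors = []
    expected = ALLOWED_PARTIES.get(contract.contract_type)
    actual = (contract.party_1.party_type, contract.party_2.party_type)
    if expected is None:
        errors.append(f"unknown contract type {contract.contract_type}")
    elif actual != expected:
        errors.append(f"{contract.contract_type} cannot associate {actual}")
    if contract.parent is not None:
        pair = (contract.parent.contract_type, contract.contract_type)
        if pair not in ALLOWED_PARENT_CHILD:
            errors.append(f"{pair[0]} cannot be the parent of {pair[1]}")
    return errors


bank = Party(1, "INSTITUTION", "Example Bank")
sally = Party(2, "ADVISOR", "Sally")
charlie = Party(3, "CUSTOMER", "Charlie")

advice = InterPartyContract(10, "ADVICE_CONTRACT", sally, charlie)
account = InterPartyContract(11, "ACCOUNT", bank, charlie, parent=advice)
bad = InterPartyContract(12, "ACCOUNT", bank, sally)
```

Here validate(account) returns an empty list, while validate(bad) reports that an Account cannot associate an Institution with an Advisor. The rules are still enforced, but a reader of the diagram alone would never see them.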

The companion volume, The Event Modeler, introduces several ways to specify constraints. The focus here is on using an entity model to do this.

Showing generic and specific in one model

The main problem caused by drawing a class hierarchy of super and subclasses is not to do with the classes, but to do with the relationships between them. Do you need both super and subclass relationships? Three possible models might be drawn:

Model with generic & specific relationships

           

This is unacceptable, because there is redundancy. The generic relationships say nothing that is not already said by the specific relationships.

Model without specific relationships

           

Diagrams like this are often drawn. The papers on <Enterprise models> suggest they may pass as enterprise models, but they make poor system specifications, because too many business rules are missing.

Model without generic relationships

The third model below shows specific relationships but not generic relationships.

           

This seems the best model of the three in terms of specifying business requirements. This brings us to the second conclusion that may surprise you. If you draw specific subclasses, you ought to draw specific relationships. This is not an absolute rule, just a guideline.

Guideline: Where we take the trouble to specify subclasses in a business rules model, then we ought to specify the relationships down to the same level of specialisation.

MICHAEL: I think, in practice, that is what I would do.

DAVID: In your example, you divide Party into three subtypes: Financial Institution, Customer and Financial Advisor.

It is arguable whether a Financial Institution is inherently a Party, but Charlie the Customer and Sally the Financial Advisor certainly are not Parties. They were not born as a Customer or a Financial Advisor. They were Persons first and that is the nature of them as entities. Only later did they enter into relationships that provided them with those roles.

GRAHAM: Sure, but yours is a model of the real world as you see it. Mine is a model of a specific problem domain. My customer didn't know Charlie or Sally when they were born. And frankly doesn't care a jot about their role as human beings.

Specification and programming costs of generalisation

During specification, a generic class may cost more, by adding complexity to an entity model, than it saves by removing replicated class features. Not always, but often enough to be a concern.

During programming, a generic class usually makes it harder for designers to write programs.

Generic classes are often promoted as a means to save effort at amendment time. Even here the picture is far from clear cut. Let us see by way of example.

Adding a new attribute

Suppose users want to record post codes separately from addresses. The Party superclass provides the Single Point of Specification for this amendment, but this only works if the enhancement applies to all subclasses of Party.

Moreover, inserting a post code attribute into three classes is not a big deal; changes to relationships are more of a problem.

Adding a new class and relationships

Suppose the term 'Independent' means only that Financial Advisors are free to take commission from several Financial Institutions. Users now want us to record the contractual agreements (Commission Contracts) formed between Financial Institutions and Financial Advisors.

A Commission Contract is a cross-reference between a Financial Institution and an Independent Financial Advisor. A Commission Contract may be regarded as a third subclass of Inter-Party Contract.

You can easily extend the model with the new class, and draw relationships to it from the existing classes.

>> diagram <<

Does the presence or absence of the abstract superclass (Inter Party Contract) make much difference to the amendment effort? The generic class saves us specifying two attributes (Start and End Dates) for a Commission Contract and makes us specify an extra generalisation relationship. In practice, these savings and costs are too small to be of much concern.

The critical part of the amendment is the specification of new relationships from Institution and Advisor to Commission Contract, which must be done whether the superclass exists or not. The amendment might mean redrawing a diagram or updating text documentation behind the scenes. It might mean altering a specification of data items or processes, classes or operations. Whatever you do, it is likely to take about the same amount of effort.

Database costs

Amendment of a specification, or a program, is one thing. Amendment of a live database structure is another. Amendments that force a database schema change normally cost more than those that involve only rewriting and recompiling programs.

The business entity model is a specification; it may become the structure to which business services are coded, but neither of these means that it must become the physical database structure. Of course the database must be designed to contain the business classes and relationships, but it may do so using a different structure.

To reduce the costs of restructuring the database, the database designer may roll subclasses into generic database tables. Then, given the kind of software architecture described in the group papers on <Architecture definition>:

·       the user interface layer presents specific subclasses, and

·       the business services layer programs deal with specific subclasses, but

·       the data services layer stores generic database tables.

The database designer must weigh the undeniable advantage of reducing database schema evolution costs against the difficulty of maintaining more complex data abstraction processes, and the performance costs below.
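The data abstraction process mentioned above can be sketched as a pair of mapping functions. This is a minimal sketch, assuming a hypothetical generic Party table represented as plain dictionaries with a party_type discriminator column; the names are illustrative only.

```python
from dataclasses import dataclass, asdict


# Business services layer: specific subclasses.
@dataclass
class Customer:
    name: str
    address: str


@dataclass
class FinancialAdvisor:
    name: str
    address: str


# Data services layer: one generic table, with a type code on each row.
SUBCLASSES = {"CUSTOMER": Customer, "ADVISOR": FinancialAdvisor}
TYPE_CODES = {cls: code for code, cls in SUBCLASSES.items()}


def to_party_row(obj):
    """Roll a specific object up into a generic Party row."""
    row = asdict(obj)
    row["party_type"] = TYPE_CODES[type(obj)]
    return row


def from_party_row(row):
    """Rebuild the specific object from a generic Party row."""
    cls = SUBCLASSES[row["party_type"]]
    fields = {k: v for k, v in row.items() if k != "party_type"}
    return cls(**fields)


row = to_party_row(Customer("Charlie", "2 Low Road"))
restored = from_party_row(row)
```

Adding a new kind of Party then means a new entry in the mapping rather than a new table, which is the schema evolution saving. The cost is that every read and write passes through this extra translation step.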

Performance costs

Implementing a generic class as a database table may damage the overall performance of a system in several ways. It can both increase database storage requirements and slow down transaction processing.

·       Implementing a generic superclass as a database table is likely to mean the system must generate and store an additional artificial key. This is not a trivial matter in large systems.

·       System-generated keys slow down distributed data entry tasks, because every time an object is added to an abstract superclass, the distributed process must call one central server to activate the component that generates an artificial key for the new object. This component becomes a bottleneck.

·       Transactions will be slower if they have to instantiate an object by joining super and subclass tables, since this means accessing two tables instead of one table.

·       Transactions can be slower if they retrieve objects of a specific subclass via a generic relationship, because they must access many irrelevant subclass objects to find the ones they want.

·       Old-fashioned database technologies can slow multi-user access even further by locking the whole of a superclass database table when one row is accessed.
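The cost of joining super and subclass tables can be seen in a small sketch. It assumes Python's standard sqlite3 module and a hypothetical schema in which a party table holds the shared attributes and an advisor table holds the subclass attributes; the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (
        party_id   INTEGER PRIMARY KEY,   -- artificial key
        party_type TEXT NOT NULL,
        name       TEXT NOT NULL,
        address    TEXT NOT NULL
    );
    CREATE TABLE advisor (
        party_id   INTEGER PRIMARY KEY REFERENCES party(party_id),
        fra_ref_no TEXT NOT NULL UNIQUE   -- business-world identifier
    );
""")
conn.execute("INSERT INTO party VALUES (1, 'CUSTOMER', 'Charlie', '2 Low Road')")
conn.execute("INSERT INTO party VALUES (2, 'ADVISOR', 'Sally', '3 Mid Lane')")
conn.execute("INSERT INTO advisor VALUES (2, 'FA-001')")

# Instantiating one Advisor means reading two tables instead of one.
advisor_rows = conn.execute("""
    SELECT a.fra_ref_no, p.name, p.address
    FROM advisor AS a
    JOIN party AS p ON p.party_id = a.party_id
""").fetchall()
```

With a single Advisor table holding the reference number, name and address, the same object could be read from one row with no join and no artificial key.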

If several distributed locations are creating objects of a single class, then how can you guarantee they do not create duplicate primary keys?

·       Either they share one key value generator - they call into a central server when they need to create an object.

·       Or each location has its own primary key range and its own key value generator. This means objects are location-dependent; in effect, they are members of distinct classes.
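The two options can be sketched as follows. This is a minimal sketch, assuming simple in-process generators; in a real distributed system the central generator would sit behind a network call, which is where the bottleneck arises. The class names and key ranges are hypothetical.

```python
import itertools


class CentralKeyGenerator:
    """One shared generator: every location must call it for each new object."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_key(self):
        return next(self._counter)


class RangeKeyGenerator:
    """One generator per location, each confined to its own key range."""

    def __init__(self, first, last):
        self._keys = iter(range(first, last + 1))

    def next_key(self):
        try:
            return next(self._keys)
        except StopIteration:
            raise RuntimeError("key range exhausted") from None


# Option 1: both locations share one generator.
central = CentralKeyGenerator()
shared_keys = [central.next_key(), central.next_key()]

# Option 2: each location owns a disjoint range.
london = RangeKeyGenerator(1, 1000)
leeds = RangeKeyGenerator(1001, 2000)
local_keys = [london.next_key(), leeds.next_key(), london.next_key()]
```

With disjoint ranges the locations never need to communicate, but the key now reveals where an object was created, which is the location dependence noted above.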

Conclusions

The places for generic classes

Generic classes have various roles. They can be used by analysts to prompt business people to think about their business. They are helpful in enterprise-level models, where they are needed to suppress detail, and prevent the enterprise model from growing impracticably large.

Guideline: where generic and specific classes both appear in an enterprise model, then it is advisable to show the generic relationships and suppress the specific relationships.

Database designers often roll subclasses into a generic class (or rather abstract table), to reduce the number of 'joins' and reduce the cost of database restructuring on amendment, even though highly generic tables do tend to make database programming harder.

Generic classes tend not to be so helpful in system specification-level models where the business rules must be made explicit rather than hidden from view. Given the goals are to specify the business rules at a level of generalisation users can understand, and to specify the processing constraints that designers must apply to data entry, then it is advisable to show the specific classes.

Guideline: where generic and specific classes both appear in a business rules model, then it is advisable to show the specific relationships, and perhaps to suppress the generic relationships.

Note that this discussion of generalisation has said nothing about granularity, that is the size of components, classes and operations. Granularity is discussed in the Chapters <Aggregate entities> and <The ubiquitous business object>.

Rules of thumb

Finally, later chapters identify some relatively mechanical principles that help to restrict the pointless proliferation of class hierarchies in business rules models. A superclass must:

·       be non-trivial, more than a common attribute or two

·       have generic behavioral features (operations) not just structural features (attributes and relationships).

Subclasses must:

·       inherit ALL their superclass's features

·       extend their superclass with extra features

·       be mutually exclusive, not additive.

These points are explored in more detail in later chapters.

It is easy to define very high-level abstractions, and tempting to define deep class hierarchies. I believe it is almost always a mistake.

If you abstract a generic class that business people do not naturally talk about (e.g. Party), then this should not be regarded as a business class. It is more accurately regarded as a minor optimisation designed by us to avoid a minor redundancy in the specification (e.g. replication of name and address attributes in Customer and Employee classes).

Yes, a generic class can give greater adaptability to the resulting system, if carried through to design, but at some cost. If you cannot confidently predict the benefit will be realised, then don’t pay the cost!

Guideline: Very few abstract superclasses deserve a prominent place in a business rules model we use to code business services.

David Hay tells me his models always feature entity types that are general enough to apply to all businesses, but are also recognizable to all (see the earlier chapter).

A conversation with David Hay

GRAHAM: I notice in your book that Party is subtyped into Person and Organization. May I point out that until the UK tax laws changed, one-man companies were common?

DAVID: This is an interesting point. US tax law sees three kinds of businesses: corporations, partnerships, and sole proprietorships. A sole proprietorship is a company that happens to consist of only one person. The proprietorship is an organization and the owner is a person, with the two related to each other.

The model could be drawn as below, where ORGANIZATION = corporation, sole proprietorships, etc. and PARTY RELATIONSHIP TYPE = employee, owner, spouse, member etc.

PARTY <-- PERSON
PARTY <-- ORGANIZATION
PARTY (1) --< PARTY RELATIONSHIP >-- PARTY (2)
PARTY RELATIONSHIP TYPE --< PARTY RELATIONSHIP

Notations used here

TYPE <-- SUBTYPE

ONE --< MANY >-- ONE
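The model above can be sketched in code. This is a minimal sketch, assuming Python dataclasses and assuming that PARTY (1) is the owner in an "owner" relationship; the field names and the helper function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    party_id: int
    name: str


@dataclass(frozen=True)
class Person(Party):          # PARTY <-- PERSON
    pass


@dataclass(frozen=True)
class Organization(Party):    # PARTY <-- ORGANIZATION
    pass


@dataclass(frozen=True)
class PartyRelationship:      # PARTY (1) --< PARTY RELATIONSHIP >-- PARTY (2)
    relationship_type: str    # e.g. "employee", "owner", "spouse", "member"
    party_1: Party
    party_2: Party


def organizations_owned_by(person, relationships):
    """Follow 'owner' relationships from a Person to the Organizations owned."""
    return [
        r.party_2
        for r in relationships
        if r.relationship_type == "owner"
        and r.party_1 == person
        and isinstance(r.party_2, Organization)
    ]


sally = Person(1, "Sally")
sally_trading = Organization(2, "Sally Trading")   # a sole proprietorship
relationships = [PartyRelationship("owner", sally, sally_trading)]
owned = organizations_owned_by(sally, relationships)
```

In this sketch the proprietor and the proprietorship are two separate Party objects, connected only by a relationship row.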


GRAHAM: One tangible real-world person can appear in your model as two objects, one of class PERSON and one of class ORGANIZATION. I wonder, would object-oriented designers complain about this? And it does suggest to me you are modeling roles rather than entities.

DAVID: In a sense you are right, but it's a different kind of role. I don't have a problem with a person appearing twice because I believe that the person that has attributes "birthdate", "social security number", etc. is a different thing than the sole proprietorship that has attributes "establishment date", "annual revenues", etc. In this case, by looking at the thing from a different perspective, you are looking at different things.

The same might be said for "customer" and "vendor", but in this case, the role is defined not by a different view of the thing itself but rather by the relationships the thing participates in. I can define a "customer" as a Party that is in a "buyer of" relationship in an order. You can't define a sole proprietorship as a person with a relationship.

OK, maybe you can, but I am going to pretend that you can't. So there! <g>

MICHAEL: I have had a lot of experience with generalisation over the last decade, starting before I became familiar with David Hay's approach. My most recent thinking has been that you should start with a literal business perspective, and then explain the benefits of the more generic perspective. You make the generic perspective the new business perspective.

GRAHAM: That implies a long-term commitment to managing users and models, a commitment that is beyond most of the environments I have worked in. My rule is - if you cannot confidently predict the benefit will be realised, don’t pay the cost.

How much generalisation is enough? Required? Desirable? Generalisation is easily overdone. E.g. it takes me only a few moments to construct the hierarchy below.

Level 1        Level 2              Level 3        Level 4
Partnership
               N-way Partnership
               3-way Partnership
               2-way Partnership
                                    Friendship
                                    Contract
                                                   Account
                                                   Advice Contract
                                                   Commission Contract

This deep class hierarchy looks OK, and that is what is worrying, because it is not OK. Creating a class hierarchy to describe a world I imagine is not a profitable exercise.

The most useful classes and superclasses come not out of a software model builder's mind, but out of business people's understanding of their business. The best superclasses and subclasses have distinct features of interest to people running the business.

DAVID: There is a lot of bad hierarchy specification that I agree should be dispensed with. But I have ample examples of fairly deep structures that were both meaningful and well understood by my clients. For an oil company, I had to divide Real Spatial Element as shown below.

Level 1               Level 2          Level 3            Level 4          Notes
Real Spatial Element
                      Earth Volume
                      Linear Object
                      Location Point
                                       Control Point                       very important to an oil company (and anyone doing geographic enterprise applications), because the boundary of any piece of land is defined as a set of points. They want to be able to define that space.
                                       Road Landmark
                      Geographic Area
                                       Geopolitical Area
                                                          Country
                                                          State
                                                          County
                                                          City
                                                          Postal Area
                                       Management Area                     defined by organization, say marketing region.
                                       Surveyed Area                       as in the U.S. surveying system.
                                                          Township
                                                          Section
                                                          Range
                                       Natural Area
                                                          Lake Boundary
                                                          Habitat
                                                          (Oil) Reservoir
                                                          Projection

GRAHAM: I understand you can build such a hierarchy. But why? And how stable is it? Challenges to mutual exclusivity might include:

·       lake = habitat? (house boats, houses on stilts)

·       city = county? (happens in the UK)

·       postal area = city?

·       management area = county?

DAVID: Why do it? Because it accurately describes the world. Should the structure be implemented with just a few tables? Probably. But in the analysis model it was extremely helpful to be able to explore categories and sub-categories.

GRAHAM: I am sure it describes the geographical features of the problem domains your customers have been interested in so far - at least at a general level.

DAVID: And these are fundamental classifications.

GRAHAM: I guess you mean fundamental in the sense that you don't have to change the top-levels of the class hierarchy much when you move from problem domain to problem domain. But there will be considerable redundancy in the classification for some customers. Few care about CONTROL POINTS and ROAD LANDMARKS.

DAVID: It isn't that there is redundancy. It is true that some of the elements of the model are not of interest to some clients. That's ok. They don't get included in that client's model.

GRAHAM: If nothing stops you extending your class hierarchies for new customers, then I suspect your hypothesis about the classification's validity cannot be disproved, and is therefore not a scientific claim.

DAVID: To the contrary. My models are a set of facts that are specifically constructed so that they can be wrong. That is the advantage of my method of labeling relationships. It is a fact that a geographic area is not the same thing as a point or an earth volume. They sound simplistic, but they are an important basis for what is built on this model.

GRAHAM: Again, one real-world lake can appear in your model as two objects, one of class LAKE and one of class HABITAT.

DAVID: The definition of a habitat is different from the definition of a lake boundary. REAL SPATIAL RELATIONSHIP (of a REAL SPATIAL RELATIONSHIP TYPE "habitat location") links them together.

It is a fact that a city is a kind of geopolitical area and not a management area. Denver County is one political entity, while the City of Denver is another. The fact that they are congruent is an additional fact.

One of the amusing things about this model is that people are inclined to "hard code" the relationships between cities and states, states and countries, etc. In the United States, you can assert specifically that each city must be in one state (Kansas City, Kansas is a different city from Kansas City, Missouri), and that each county must be in one state. But people who haven't travelled much want to assert that each city must be in one county. They don't know that New York City consists of five counties and Atlanta has something like three. So, the workhorse is REAL SPATIAL ELEMENT RELATIONSHIP.

While the boundaries of a GEOPOLITICAL AREA may change, of course, the fact that one constitutes an occurrence of an entity type that is classified as I describe does not. A city will always be a geopolitical area, not a management area or a natural area.

GRAHAM: Why not? The management team headed by the mayor who won the last city election will surely define the city's geopolitical area as a management area!

DAVID: The definition of "city" is that it is a bounded area whose boundaries are defined by law, and which has the characteristics of a municipality. It is governed by an organization for that purpose. (The government is a different thing, by the way--an ORGANIZATION.) If Lever Brothers decides that New York City is a "marketing area", the boundaries of that area would be the same as the boundaries of the City of New York. But it is a different thing. If the City is determined to be a habitat for a rare breed of pigeon, then the boundaries of the habitat may also be the same, but a habitat is also a different thing.

Again, the question is: what is the thing (object?) you are talking about. You allude to the right problem. I suspect that object-oriented people are too focused on tangible things. Too much Aristotle. Not enough Plato.

To see two different areas occupying the same space is not to describe the space redundantly. It is to say that two different areas are intimately related to each other.

GRAHAM: Hmm... You are interested in drawing entity models to understand and explain what a business is about. I am interested in drawing entity models that work well as the database structures of enterprise applications to support known data processing requirements.

Conclusions

I have challenged David Hay about the extent to which class hierarchies dominate his entity models. David’s class hierarchies are surely very useful as analysis tools, but won't somebody have to transform them into my kind of entity model before they build an enterprise application?

This modeling practice took hold of some entity modelers when object-oriented design first became the vogue. But often, their class hierarchies were, and still are, ill-advised. And often, the handover to design is problematic. Some analysts never realise the entity models they draw for discussion with users have to be rebuilt by the database designers. Anything that results in a needless structure clash between the entity model and the database, or indeed between the object-oriented code structure and the database, seems bad news to me.

This brings me to a surprising conclusion, surprising because it is contrary to practices that are widely employed by enterprise data architects in large IT enterprises.

Guideline: Very few deep class hierarchies deserve a prominent place in a business rules model we use to code business services.

Abstraction by generalisation is a tool to be used with caution.

The same high-level abstractions or generic classes that are so useful to the enterprise entity modeler and analyst, often turn out, when it comes to the detailed design of a specific system, to be only minor optimizations of a very small part of the application to be built.

Philosophical postscript: do we model entities or roles?

DAVID: In your criteria for a good class hierarchy you should add: Both the super-class and the sub-class must be true entity types, without roles included in their definitions.

GRAHAM: Your examples tend to persuade me that you model roles all the time (how things appear to an observer), rather than real-world things, or “true entity types” (how things are).

DAVID: In a sense you are right, but it's a different kind of role. This is a philosophical question, of course. The entities I have in my book seem pretty solid to me.

GRAHAM: I am thinking a little like a philosopher; I find phenomenology more relevant than ontology. But I am thinking also of the sciences. In a psychology classroom, we learn the world is less solid than our egos let us believe. (Vanity, vanity, all is vanity.) In physics, cosmology and biology classrooms we learn that things evolve over time. Where is my grandfather's axe after my father replaced the handle and I replaced the blade?

DAVID: We are dealing with definitions of terms here. Of course these change over a long time, but in my experience, the classifications I use are pretty solid. The issue of what constitutes an occurrence is a different one. Although I am not sure that as modelers we can deal with that one. An occurrence is whatever the people putting the data in say it is.

GRAHAM: Interesting class v instance thing here. You talk of the classifications being solid, but the occurrences being a different issue. Many think the reverse: real-world objects (instances) are tangible, but our classifications of them are fragile.

DAVID: So, you are Aristotelian and I am Platonic. To you physical things are the most real. To me the "idea of the thing" is the most real.

GRAHAM: Well, I do believe the real world exists. But I believe there are many equally valid views of the things in it, so there is no one “true” classification of those things. I haven't yet managed to reconcile quantum physics with cosmology and E = mc².

Pointers to where class hierarchies are and are not helpful.

There is a wide range of software design in which programmers find the inheritance mechanism provided by an object-oriented programming language helps them to economise, by reusing code.

This has encouraged software system designers to specify class hierarchies in the system specification, looking to maximise the use of inheritance. Some system analysts have now come to believe that they should be specifying class hierarchies wherever they can.

This chapter discusses some reasons why it is a mistake to try to impose class hierarchies on the persistent entities in the business services layer of an enterprise application.

Modelling business-world entities is different from modelling computer-world entities. It is easy to specify useful class hierarchies where the objects are records, transactions, windows, menus and command buttons. It is harder to specify strict class hierarchies where the entities represent persistent ‘real-world’ entities.

Fig. a shows our notations.

 

Fig. a

I will show why what appears to be a class hierarchy of persistent entities may be better specified in a different way, often as an aggregate.

I will look in turn at: crude top-down classification, disputable divisions between subclasses, parallel rather than mutually exclusive classes, designer generalisation, and instances that change subclass.

Two themes run through all the sections below. First, the boundary between the classes of real-world objects or business entities becomes fuzzier the longer you look at them. Second, longevity turns types into states.

Crude top-down classification

Some people have proposed using stepwise refinement to develop not program structures (as Dijkstra proposed in 1972) but data structures. Their idea is to start with a few generic classes like Location, Person and Resource and then develop elaborate class hierarchies beneath them. Fig. b shows the start of this process.

Fig. b

There are many difficulties with this top-down classification approach. One is to do with how fragile the notion of mutual exclusion is. What if a Person is both a Customer and an Employee? I have already considered this in chapter 4 and we’ll come back to it later. I want here to challenge the broad ‘top-down’ approach rather than discuss semantic details.

The Chain of Christmas Trees pattern

The chief database designer on a project in the US, after a brief introduction to object-orientation, led the analysis of the system by building four or five class hierarchies and connecting the top-level super-type objects by many-to-many relationships.

Fig. c gives an idea of what was done, except that there were about a hundred sub-types in each class hierarchy. I call this pattern the ‘Chain of Christmas Trees’. It doesn’t get you very far in software specification. This kind of model is so general that it has very little information in it.

Fig. c

Where are the facts and constraints?

The specification above does not say how or why an Employee should be connected to an Office. What if there are really two meaningful relationships: ‘works at’ and ‘works for’? Fig. d shows you can do better by adding specifically named link entities between the specifically named subtype objects.

Fig. d

The two cross-references in the diagram above reveal two entirely different sets of Locations for a Person, and Persons for a Location. Once you realise this kind of analysis is necessary, the top-down development falls apart. You have to start again with a more traditional entity relationship modelling approach.
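The contrast between a generic link and specifically named link entities can be sketched in code. What follows is a minimal illustration in Python, not the notation of the figures; the class names WorksAt and WorksFor, the helper function and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str


@dataclass(frozen=True)
class Office:
    name: str


# Generic link between top-level types: it says that a Person is
# connected to a Location, but not how or why.
@dataclass(frozen=True)
class PersonLocation:
    person: Employee
    location: Office


# Specifically named link entities: each records a distinct fact.
@dataclass(frozen=True)
class WorksAt:
    employee: Employee
    office: Office


@dataclass(frozen=True)
class WorksFor:
    employee: Employee
    office: Office


def offices(links, employee):
    """Return the Offices tied to an Employee by one named relationship."""
    return {link.office for link in links if link.employee == employee}


ann = Employee("Ann")
leeds = Office("Leeds")
york = Office("York")

works_at = [WorksAt(ann, leeds)]    # where Ann is physically based
works_for = [WorksFor(ann, york)]   # which office Ann reports to

located_at = offices(works_at, ann)
reports_to = offices(works_for, ann)
```

The two queries return different sets of Offices for the same Employee, which is information the single generic PersonLocation link cannot express.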

Disputable divisions between subclasses

In the world of mathematics, a circle and a square are different by virtue of their definition by mathematicians. In the world of computers, a file and a record are different by virtue of their definition by software designers. But in the natural world, things are not so sharply defined.

At least one author has claimed that class hierarchies in software design reflect the nature of the real world. Fig. e shows a favourite example, the biologists’ hierarchical classification of species.

Fig. e

In fact, there is probably no such thing as a hierarchy in nature. The eminent biologist Richard Dawkins has pointed out that the higher levels of this classification are artificial, a man-made construction imposed on the fuzziness of nature.

You might hope there is certainty at the bottom level. Yet one species gradually (by tiny incremental changes) divides into two or more species, or evolves to become another. Obviously, the classification is transient over time. Less obviously, it follows that the edges of the classes are uncertain at any moment in time.

Even Darwin regarded the term ‘species’ as a ‘mere useless abstraction’ and ‘arbitrarily given, for the sake of convenience’.

A class hierarchy with uncertain boundaries between subclasses is a difficult thing to manage. People can find it hard to assign objects to a class. People will want to revise the class hierarchy in the light of new thinking and the discovery of new objects. Revising a class hierarchy is made more difficult when objects must persist from one version of the class hierarchy to the next.

Parallel rather than mutually exclusive classes

It is tempting to specify a class hierarchy where all but a very few objects belong to either one subclass or another. However, if the mutual exclusion is not an absolute rule, merely a tendency, then enforcing it will prevent the system from working properly.

Fig. f shows an example I came across in real system design. The Security class was specified as either Stock (earning interest) or Share (attracting a dividend).

Fig. f

I soon discovered a few Securities that are both Stocks and Shares. (I might have guessed this sooner, on finding that the business gives every Security a unique number drawn from a single range, rather than drawing on separate ranges of numbers for Stocks and Shares.)

Users would be irritated by a system that made them record a Security as either Stock or Share. Some would have to enter the exceptional Securities twice. Some would fail to find all the information they want about such a Security because it has been split into two. Some would find that statistical reports are inaccurate.

Mutual exclusion rules that don’t last

It is tempting to specify a class hierarchy if the mutual exclusion rule holds at specification time. However, the rule may break down; what seem to be disjoint subclasses when first analysed become parallel aspects soon after implementation.

A business might know that all its Vehicles are either Cars or Trucks, and all its People are either Men or Women. But a thoughtful analyst should predict that one day the system will have to cope with a Vehicle that has the properties of both Car and Truck, or a Person who exhibits the properties of both Man and Woman.

Again, if you don’t drop the mutual exclusion rule, the exceptional cases will have to be recorded twice in the system, or else processed outside the system. Fig. g shows the two data structures redrawn as aggregates to accommodate exceptions.

Fig. g
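One way to picture the aggregate form is as a single class that owns optional aspects. The sketch below is an assumption about how this might look in Python; the attribute names (registration, passenger_seats, payload_kg) are invented for illustration and do not come from the figure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CarAspect:
    passenger_seats: int  # hypothetical Car-specific attribute


@dataclass
class TruckAspect:
    payload_kg: int  # hypothetical Truck-specific attribute


@dataclass
class Vehicle:
    """Aggregate: one Vehicle, recorded once, with zero or more aspects."""
    registration: str
    car: Optional[CarAspect] = None
    truck: Optional[TruckAspect] = None

    def kinds(self) -> List[str]:
        found = []
        if self.car is not None:
            found.append("Car")
        if self.truck is not None:
            found.append("Truck")
        return found


saloon = Vehicle("A1", car=CarAspect(passenger_seats=5))
lorry = Vehicle("B2", truck=TruckAspect(payload_kg=12000))
# The exceptional case is stored once, not entered twice.
pickup = Vehicle("C3", car=CarAspect(passenger_seats=4),
                 truck=TruckAspect(payload_kg=900))
```

Here the mutual exclusion rule has simply been dropped. If the business later wants to enforce it for most Vehicles, that becomes a validation rule on the aggregate rather than a constraint built into the class structure.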

Designer generalisation

A real-world entity can play many roles at once. A business contact can be a customer and a supplier. A person can be a doctor and a patient.

The fact that these ‘class hierarchies’ are non-disjoint, because some organisations are both customers and suppliers and some people are both doctors and patients, is not the issue here.

The point here is that many businesses, much of the time, are perfectly content to record the distinct roles as distinct entities. The real-world entity is simply not important or valuable as a business entity.

 

Generalisation to track the real-world entities behind their roles

Designers sometimes create a class hierarchy in an attempt to keep track of the real-world entities that lie behind several business entities.

Suppose Healthcare administrators want their system to avoid printing out two Christmas cards to the same Person in the role of Patient and Doctor. The designer might specify Patient and Doctor as subclasses of Person, thinking this will help.

In general, such requirements are difficult to meet unless the users have a business identifier by which they can recognise instances of the real-world entity. The difficulty is that the business deals with the roles played by real-world entities, not the entities themselves.

If the users have no way of identifying the real-world entity that lies behind several roles in the system, defining a class hierarchy is not going to be helpful. Fig. h shows you could instead specify the real-world entity as a kind of link class, related to its roles as business entities by associative relationships.

Fig. h

If the objects of the link class are visible to users, distinct from their roles, the link class should be given a meaningful key, even if this is a compound of all its attributes.
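A link class of this kind might be sketched as follows. This is an illustration only: it assumes users can supply a name and date of birth to recognise the Person behind each role, and the key fields, identifiers and sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Patient:
    patient_no: str
    name: str


@dataclass(frozen=True)
class Doctor:
    doctor_no: str
    name: str


@dataclass(frozen=True)
class PersonKey:
    # Compound key made up of the link class's attributes (an assumption).
    name: str
    date_of_birth: str


@dataclass
class Person:
    """Link class: related to its roles by association, not inheritance."""
    key: PersonKey
    patients: List[Patient] = field(default_factory=list)
    doctors: List[Doctor] = field(default_factory=list)


def christmas_card_names(people):
    """One card per Person, however many roles that Person plays."""
    return [p.key.name for p in people]


jo = Person(PersonKey("Jo Hill", "1970-02-01"))
jo.patients.append(Patient("P17", "Jo Hill"))
jo.doctors.append(Doctor("D03", "Dr J Hill"))

sam = Person(PersonKey("Sam Roe", "1982-11-30"))
sam.patients.append(Patient("P18", "Sam Roe"))

cards = christmas_card_names([jo, sam])
```

Patient and Doctor remain independent business entities with their own identifiers. The Person link exists only where somebody has been able to identify the real-world entity behind the roles.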

Generalisation to optimise design

Designers sometimes specify a class hierarchy to avoid duplication of specification and code. Fig. i shows an example where designers specified a class called Organisation to hold address details and a telephone number.

Fig. i

But the business deals with Customers and Suppliers, not Organisations. Users are quite happy to record details twice for the small number of Organisations that are both Customer and Supplier.

Indeed, users may want to record different addresses and telephone numbers for each role. They may even want to record several addresses for a Customer or a Supplier.

Where design optimisation is the motivation, you might respecify the superclass as a link class, related to the business entities by associative relationships.

Instances that change subclass

Persistent objects can hang around for a long time. In many of the class hierarchies people try to impose on the real-world, an object instance may change from one subclass to another. Fig. j shows two mistakes I have come across in real system design.

Fig. j

You might put the model right by specifying the superclass as a master, related to ‘period of time’ as a detail, and then subdivide ‘period of time’ into the different classes.

However, in practice, this is one of many cases where the subtyping is better specified as a cycle of states within a class than in the entity model. State machines are a better specification vehicle than entity models for capturing business rules of this kind. See chapter 8 for further discussion.
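To illustrate the idea of a cycle of states within one class, here is a minimal Python sketch. The Member class, its three states and the event names are hypothetical; they are not the examples drawn in the figure.

```python
from enum import Enum


class MemberState(Enum):
    PROSPECT = "prospect"
    ACTIVE = "active"
    LAPSED = "lapsed"


# Permitted transitions: event name -> {state before: state after}.
TRANSITIONS = {
    "join": {MemberState.PROSPECT: MemberState.ACTIVE},
    "lapse": {MemberState.ACTIVE: MemberState.LAPSED},
    "renew": {MemberState.LAPSED: MemberState.ACTIVE},
}


class Member:
    """One persistent class; what looked like subclasses are states."""

    def __init__(self, member_no):
        self.member_no = member_no
        self.state = MemberState.PROSPECT

    def apply(self, event):
        allowed = TRANSITIONS.get(event, {})
        if self.state not in allowed:
            raise ValueError(
                f"{event} is not allowed in state {self.state.name}")
        self.state = allowed[self.state]


m = Member("M1")
for event in ("join", "lapse", "renew"):
    m.apply(event)
```

The object keeps its identity throughout; only its state attribute changes. Modelling ActiveMember and LapsedMember as subclasses would instead force the object to be destroyed and recreated on every transition.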

Where class hierarchies are useful

Class hierarchies are most useful where the hierarchy:

• is firm not fuzzy, has strictly disjoint subclasses

• persists as a definition as long as the lives of objects it defines.

Where in enterprise applications are these true? Fig. k is a picture that divides an enterprise application into three layers and helps us to show where class hierarchies are most useful.

Fig. k

Generally speaking, it is easier and safer to define class hierarchies of computer-world objects than of business entities. So class hierarchies are more useful in the technological layers of design than in the business services layer of the three-tier architecture.

This is not a popular view with those object-oriented designers who want to prove the value of inheritance in the business services layer. But it is a fact of life.

Computer-world objects in the UI and data services layers

When you are constructing UI layer components, you have the power to order and classify them as you like.

Fig. l

To a lesser extent the same is true of the data services layer. It might make sense to look for inheritance trees in the data services layer or the data abstraction layer. (You can restructure the classes in the data services layer whenever you like, if you can bear the costs of data migration.)

Transient event classes in the business services layer

Since class hierarchies are much more common where the objects are short-lived, they are more likely to be found among transient event classes than persistent object classes.

Robinson and Berrisford (1994) suggested that in enterprise application development you can normally make greater reuse of superevents (transient business objects) than superentities (persistent business objects).

Model transformation for schema integration

Schema integration is a one-off exercise. If you plan to merge two schemas, you do need to make the current rules (terms, facts and constraints) fully explicit. You can be confident that the range of a type, the instances of a class, the rules of the business, will not change while you are working. You might well convert any range of subtypes into a class hierarchy, just for the purpose of comparing schemas, as discussed in the volume ‘Introduction to rules and patterns’.

Data item definition

A class hierarchy may be used as a device for structuring the contents of a data dictionary, where all that is required is to allocate the various data items as attributes of entities. Fig. m arranges the data items involved in a hospital system using the notion of a class hierarchy.

Fig. m

This model is not good enough for building an enterprise application, since the relationships between entities are specified incorrectly. For system design, the model must be redrawn after considering:

• a person may be both a doctor and a patient

• a doctor may swap between working in a hospital and family practice

• a person can be a patient on many occasions.

Specifying the classes in the form of a class hierarchy fails to capture any of these rules.

It is much better to specify the ‘subclasses’ as ‘parallel aspects’ connected to a ‘basic aspect’ by associative relationships. Not only will this reduce schema evolution problems if the system ever has to record the history of a person over time, it is a more accurate specification of the real world even if this day never arrives.
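The three rules listed above can be accommodated by parallel aspects that carry their own history. The following Python sketch is one possible reading, with assumed names (DoctorPost, PatientEpisode, current_setting) and invented dates.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DoctorPost:
    setting: str                 # "hospital" or "family practice"
    started: date
    ended: Optional[date] = None


@dataclass
class PatientEpisode:
    admitted: date
    discharged: Optional[date] = None


@dataclass
class Person:
    """Basic aspect, associated with any number of parallel aspects."""
    name: str
    doctor_posts: List[DoctorPost] = field(default_factory=list)
    patient_episodes: List[PatientEpisode] = field(default_factory=list)

    def current_setting(self) -> Optional[str]:
        # The open-ended post, if any, is where the doctor works now.
        for post in self.doctor_posts:
            if post.ended is None:
                return post.setting
        return None


kim = Person("Kim")
# A doctor who swapped from hospital work to family practice...
kim.doctor_posts.append(
    DoctorPost("hospital", date(2001, 1, 1), date(2005, 6, 30)))
kim.doctor_posts.append(DoctorPost("family practice", date(2005, 7, 1)))
# ...and who has also been a patient on two occasions.
kim.patient_episodes.append(
    PatientEpisode(date(2003, 3, 1), date(2003, 3, 9)))
kim.patient_episodes.append(
    PatientEpisode(date(2010, 8, 2), date(2010, 8, 4)))
```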

Generic domain classes

As soon as a type of information is identified as generally reusable in lots of businesses, technology vendors can make a profit by selling it to business system developers.

There is some scope for defining class hierarchies among universal domain classes such as text, number and date.

It is hard to imagine how a business can gain a competitive advantage by focusing its efforts on specifying hierarchies of classes that it shares with other businesses. Technology vendors can supply generic class definitions. Packages should take over here. System developers will always be left to define the business-specific classes, the ones that cannot be reused so widely.

Generic business entities?

While class hierarchies are useful in specifying transient objects in the more technology-bound layers of the three-tier software architecture, it is hard to find useful class hierarchies where the objects represent persistent external real-world entities.

If you have a large business information database that has more than a small class hierarchy in a corner of it - please show it to us.

The only convincing examples I have seen to date are drawn from financial systems. The superclasses are things like Account, Transaction and Deal. These are frequently presented in books and talks.

These are highly generic entities - not specific to one business. They are also highly abstract - not nearly as concrete as classes like Employee, Building, Ship and Engine.

Summary

Modelling real-world objects is different from modelling computer-world objects. It is easy to specify useful class hierarchies where the objects are computer-world entities such as records, transactions, windows and command buttons. But I am convinced that class hierarchies are of very limited value in the business services layer of database transaction processing systems.

Grady Booch has said to us ‘It is a mistake to search too aggressively for class hierarchies and inheritance in the entity model of a business. Sometimes it’s there and it pops out at you. More often there aren’t many class hierarchies to be found. Many things that look like mutually-exclusive subclasses turn out to be what you call parallel aspects. Where you do define a superclass, it must have common behaviour as well as common data attributes.’

It turns out that in specifying the essential processing requirements and constraints in the business services layer of an enterprise application, more reuse can be achieved in other ways.

There is more reuse to be found by associative relationships than by inheritance relationships. To discover and specify this kind of reuse, object-oriented analysts need to think in terms of parallel aspects rather than class hierarchies of mutually exclusive subclasses.

There is also more reuse to be found between transient events than between persistent objects. Transient event class hierarchies are more useful than persistent object class hierarchies. To discover and specify event class hierarchies, object-oriented analysts need to add an event-oriented perspective to their existing object-oriented perspective.

In short

Object-oriented technologies that support inheritance do help designers working in the technology-bound UI and data services layers, but the same principles do not provide analysts with the ‘leap forward’ that people have hoped for in the business services layer.

The persistence of objects and the fuzziness of the real world make it harder to specify strict class hierarchies where the objects are models of external real-world entities. More reuse can be achieved in other ways. Object-oriented theorists need to take these other ways on board. A complete object-oriented systems development approach must help us to recognise and model the events as well as the objects.

Polymorphism is a powerful programming tool; but it is easy to cheat, to create an abstract class where there isn’t a natural class hierarchy.

Meyer (1988) says ‘uncontrolled polymorphism would be incompatible with the concept of type’.

It is possible that some object-oriented designers apply the ideas of type and polymorphism outside of the proper context, because they lack the concept of an event. We need a design method and a language for talking about events as well as objects. A helpful definition is:

An event is a minimum unit of consistent change to a system, a set of effects on one or more objects which must succeed or fail as a whole.

Given this definition, an event may affect several different objects of different classes. It is a good idea to record all these effects in an event model (object interaction diagram or use case) for the event.

When type should be state

The State design pattern might be used to make the effects of one event look like a polymorphic method. Say the Death event has different effects on a Person depending on whether the Person is employed or not. The designer might describe these as polymorphic methods depending on the type of an object. To the analyst they are optional effects of an event, depending on the state of the object.
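The analyst's view, in which the Death event has optional effects selected by state, might be sketched like this in Python. The Employment states, the method name and the logged effects are assumptions made for illustration.

```python
from enum import Enum


class Employment(Enum):
    EMPLOYED = "employed"
    NOT_EMPLOYED = "not employed"


class Person:
    def __init__(self, name, employment):
        self.name = name
        self.employment = employment
        self.alive = True
        self.effects = []  # record of what the event did, for inspection

    def death(self):
        """Effect of the Death event on this Person."""
        if not self.alive:
            raise ValueError("Death event already recorded")
        self.alive = False
        self.effects.append("close personal record")
        # Optional effect, selected by the object's state, not its type.
        if self.employment is Employment.EMPLOYED:
            self.effects.append("terminate employment")
            self.employment = Employment.NOT_EMPLOYED


ann = Person("Ann", Employment.EMPLOYED)
bob = Person("Bob", Employment.NOT_EMPLOYED)
ann.death()
bob.death()
```

There is one Person class and one death method. A designer could split the class into EmployedPerson and UnemployedPerson and override the method in each, but a Person would then change subclass every time they took or left a job.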

When type should be role

Say the Death event has different effects on the Person (dead) and the Employer. Again, these effects are not best described as polymorphic methods depending on the type of an object. They are effects of an event on objects of one class playing concurrent roles with respect to the event, depending on the identity of the object.

When abstract class should be event manager

An event model implies a weak kind of polymorphism. You can (we may say should) name all the methods in an event model after the event that fires them. The methods share the same name, but not the same effect, and not necessarily the same interface. In some cases you may be able to define a common interface for all methods fired by an event, perhaps:

            input: event name and parameters

            output: OK or Error code.

You might then create an abstract class for the event type, which is a supertype of all the classes that appear in the event model. Designers do indeed define such event classes in processing transient objects in the UI layer. But processing objects in the business services and database layers is different.

Classes of transient events appear in enterprise applications in the guise of ‘event managers’. An event manager is a transient process that handles one event instance. It is a convenient home for things like transaction management and error handling. However, analysts do not normally include event managers in the entity model that specifies the data structure of persistent objects.
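A rough Python sketch of an event manager follows. It assumes duck typing rather than an abstract event class: every participant has a method named after the event, taking the event parameters and returning OK or an error code. The error codes are invented, and the in-memory snapshot stands in for the transaction management that a database would normally supply.

```python
import copy

OK = "OK"


class Person:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def death(self, params):
        if not self.alive:
            return "E_ALREADY_DEAD"    # hypothetical error code
        self.alive = False
        return OK


class Employer:
    def __init__(self, staff):
        self.staff = set(staff)

    def death(self, params):
        if params["person"] not in self.staff:
            return "E_NOT_ON_STAFF"    # hypothetical error code
        self.staff.remove(params["person"])
        return OK


class EventManager:
    """Transient process that handles one event instance."""

    def __init__(self, event_name, participants):
        self.event_name = event_name
        self.participants = participants

    def fire(self, params):
        # Snapshot every participant so the event can fail as a whole.
        saved = [copy.deepcopy(p.__dict__) for p in self.participants]
        for p in self.participants:
            result = getattr(p, self.event_name)(params)
            if result != OK:
                # Roll back all effects applied so far.
                for q, state in zip(self.participants, saved):
                    q.__dict__.clear()
                    q.__dict__.update(state)
                return result
        return OK


ann = Person("Ann")
wrong_employer = Employer(staff=["Bob"])
failed = EventManager("death", [ann, wrong_employer]).fire({"person": "Ann"})
alive_after_failure = ann.alive

right_employer = Employer(staff=["Ann"])
succeeded = EventManager("death", [ann, right_employer]).fire({"person": "Ann"})
```

In the first call the Employer rejects the event, so the effect already applied to the Person is undone. In the second call both effects succeed together. The manager itself holds no persistent data, which is why it would not normally appear in the entity model.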

We need a richer theory, one that accommodates type and state, objects and events, inheritance and polymorphism on the one hand, event managers and multiple event effects on the other.

 

 

 

References

Ref. 1:   “Software is not Hardware” in the Library at http://avancier.co.uk

 

Footnote 1: Creative Commons Attribution-No Derivative Works Licence 2.0

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.co.uk” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it. For more information about the licence, see  http://creativecommons.org