Generalisation: What’s wrong with inheritance?

Copyright Avancier Limited. All rights reserved.

 

The paper on abstraction included the table below. It shows four scales in which abstraction works from the bottom up. This paper is about only one of those scales, the one called generalisation.

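| Abstraction by | Omission  | Composition              | Generalisation   | Idealisation      |
|----------------|-----------|--------------------------|------------------|-------------------|
| Abstract       | Vacuous   | Coarse-grained composite | Universal        | Logical Concept   |
|                | Sketchy   | Mid-grained composite    | Fairly generic   |                   |
|                | Elaborate | Fine-grained composite   | Fairly specific  |                   |
| Concrete       | Complete  | Elementary part          | Uniquely bespoke | Physical Material |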

 

Generalisation-specialisation (section repeated from paper on abstraction)

An instance of a specialisation is at the same time an instance of all generalisations above it. A specialisation contains its generalisation in some sense. So the full description of a specialisation must be longer than the description of its generalisation. The specialised concept inherits from or extends the generalised concept.

 

For example, the two subtypes below extend the definition of their super type:

  • Investments - earn income.
    • Bonds - (earn income and more particularly) bear interest.
    • Equities - (earn income and more particularly) bear dividends.
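The investment example can be sketched in code. This is a minimal illustration in Python, not something taken from the paper; the class names and the describe() method are assumptions made for the sketch. It shows that each subtype says everything its super type says plus something more, and that an instance of a subtype is at the same time an instance of the super type.

```python
class Investment:
    """Generalisation: every investment earns income."""

    def describe(self) -> list:
        return ["earns income"]


class Bond(Investment):
    """Specialisation: earns income and, more particularly, bears interest."""

    def describe(self) -> list:
        return super().describe() + ["bears interest"]


class Equity(Investment):
    """Specialisation: earns income and, more particularly, bears dividends."""

    def describe(self) -> list:
        return super().describe() + ["bears dividends"]


bond = Bond()
print(isinstance(bond, Investment))  # True
print(bond.describe())               # ['earns income', 'bears interest']
```

Note that the description of each subtype is longer than the description of its super type, as the text above says it must be.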

 

Similarly, the two subtypes below extend the definition of their super type:

  • Animalia - consume other organisms.
    • Chordata - (consume other organisms and) have nerve cords.
    • Mollusca - (consume other organisms and) have thin shells.

 

A generalisation is often an abstract group of things, rather than a concrete composite. There may be instances of the generalisation in the real world; but you cannot address them as a group, they are not co-located, they have no manager, no owner, no physical shell around them.

 

Generalisation or specialisation can be done repeatedly to create a multi-level hierarchical structure. For example, consider the Linnaean classification of living things.

  • The animal kingdom is composed of phyla, which are composed of classes, which are composed of orders, which are composed of families, which are composed of genera, which are composed of species.

 

This is an abstract composition hierarchy. A higher category in the classification is a composite of lower categories. But more importantly, every category is a generalisation; it defines those properties that are shared by all the specific types below it. Its main purpose is to generalise.

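| Taxonomic group | Particular general types |            |
|-----------------|--------------------------|------------|
| Kingdom         | Animalia                 |            |
| Phylum          | Chordata                 | Mollusca   |
| Class           | Mammalia                 | Gastropoda |
| Order           | Primates                 | Pulmonata  |
| Family          | Hominidae                | Arionidae  |
| Genus           | Homo                     |            |
| Species         | Homo sapiens             | Black slug |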

 

Generalisation is fundamental to the highest level of enterprise architecture, where enterprise architects define:

  • General architecture principles, policies and standards.
  • Generic organisation structures, process structures and data structures.

 

Enterprise architects look to impose such generalisations on the different divisions of an enterprise, for the sake of strategic goals to do with command, control and interoperability.

 

Generalisation in enterprise architecture can be overdone. It can yield vacuous abstractions - structures that are highly reusable yet of little value in each place they are used. The more general the structure, the less it helps in designing a specific solution and the more work it leaves to the designers. Also, in IT architecture, generalisation is often the enemy of performance. So use it with caution.

 

The concept of a generic type is also fundamental at the lowest level of data processing system design, where software architects define:

  • In data definition: the data types of data items, where one data type may specialise another.
  • In object-oriented programming: the types or classes, where one type or class may specialise another.

 

Generalisation in detailed software design can also be overdone. See below.

On sets and types

One might say:

  • A set is a composite: it is a group of things that share properties, if only a common manager or owner.
  • A type is a generalisation: it is a form or structure common to several things; a property or constraint shared by all instances of a given kind.

 

Why is differentiating composition and generalisation so tricky? Is it because the concepts of composite and generalisation relate to the confusingly overlapping concepts of set and type? Wikipedia tells us:

 

“Set theory is the branch of mathematics that studies sets, which are collections of objects. Set theory, formalized using first-order logic, is the most common foundational system for mathematics. The language of set theory is used in the definitions of nearly all mathematical objects, such as functions. Elementary facts about sets and set membership can be introduced in primary school, along with Venn diagrams, to study collections of commonplace physical objects. Elementary operations such as set union and intersection can be studied in this context. More advanced concepts such as cardinality are a standard part of the undergraduate mathematics curriculum.”

 

“Type theory is any of several formal systems that can serve as alternatives to naive set theory. In programming language theory, type theory can refer to the design, analysis and study of type systems.… Alonzo Church, inventor of the lambda calculus, developed a higher-order logic commonly called Church's Theory of Types. Church's type theory is … a typed lambda calculus. …In typed lambda calculi, types play a role similar to that of sets in set theory.”

 

Although Wikipedia tells us that “types play a role similar to that of sets in set theory”, I gather that set theory and type theory are somehow in competition with each other.

On types in data processing

The concept of a type is fundamental to and ubiquitous in data processing system design.

 

Data type: A type that defines the properties shared by instances of a data item. (E.g. integer, floating-point number (decimal), and alphanumeric string.) A data type constrains the values of a data item. It also defines the processes that can be performed on a data item or larger data structure. Thus, a data type is an interface to a component, a kind of service contract.

 

“A type system divides values into sets called types — this is called a type assignment — and makes certain program behaviors illegal on the basis of the types that are thus assigned. For example, a type system may classify the value "hello" as a string and the value 5 as a number, and prohibit the programmer from adding "hello" to 5 based on that type assignment. In this type system, the program.” Wikipedia
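The prohibition described in the quotation can be observed directly. The sketch below uses Python, which checks types at run time rather than at compile time, so the illegal operation is rejected when it is attempted rather than before the program runs. The helper name add() is an assumption made for illustration.

```python
def add(a, b):
    """Return a + b, or a message if the types make the operation illegal."""
    try:
        return a + b
    except TypeError as err:
        # Python refuses to add a string to a number.
        return f"illegal: {err}"


print(add(2, 5))        # 7
print(add("hello", 5))  # an "illegal: ..." message, not a result
```

A statically typed language would report the same mismatch earlier, at compile time; the type assignment is doing the same job in both cases.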

 

However, some (Chris Date for one) argue that type theory applies only to programming language design, at the small scale of universal data types, and that for enterprise database applications set theory provides a better basis for software design. This seems to be another example of how you have to change your view of things as you move up from the small to the large.

 

In the loosest sense of the term, types appear in many guises as:

  • a data type – defining the properties of a data item.
  • a set of instances in database design.
  • a base class in a class hierarchy in object-oriented software design.
  • a subroutine or shared service of some kind.
  • a generic process structure - with generic steps like capture, validate, approve, execute, close.
  • a generic data structure - with generic entities like place, party, product, property and process.

On the evolution of types

Types in mathematics are fixed concepts. A study of physics and chemistry will yield some fixed types. And in biology, the hierarchical tree of evolution is a fixed structure, describable in a cladogram. Elsewhere, types are transient: classifications of the natural and business worlds are not fixed. Biologists know the Linnaean hierarchical classification of species is only an imperfect or approximate description of the structure of the biosphere as it is today, and they still tinker with it. Business managers frequently change their minds about how to classify their products and their customers, not so much tinkering as rethinking.

So what’s wrong with inheritance? (for OO programmers only)

Inheritance is the mechanism in object-oriented programming by which one component type (or class) includes the properties of another. Inheritance is used in two different ways that correspond to two of our four kinds of abstraction.

 

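| Abstraction type              | Base class | Inheritor class | Inheritance is used to                                                    |
|-------------------------------|------------|-----------------|---------------------------------------------------------------------------|
| Idealisation-realisation      | Ideal      | Real            | Implement: provide concrete operations to implement an abstract interface |
| Generalisation-specialisation | Generic    | Specific        | Reuse: add further specific operations to the generic operations          |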
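The two uses of inheritance can be contrasted in a short sketch. It is written in Python for brevity, and every class name in it (Shape, Square, Account, SavingsAccount) is a hypothetical example rather than anything from the paper. The first pair shows inheritance used to implement an abstract interface; the second shows inheritance used to reuse and extend generic operations.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Ideal: an abstract interface with no implementation."""

    @abstractmethod
    def area(self) -> float:
        ...


class Square(Shape):
    """Real: supplies the concrete operation the interface demands."""

    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Account:
    """Generic: operations that every account shares."""

    def __init__(self) -> None:
        self.balance = 0.0

    def deposit(self, amount: float) -> None:
        self.balance += amount


class SavingsAccount(Account):
    """Specific: reuses deposit() and adds an operation of its own."""

    def add_interest(self, rate: float) -> None:
        self.balance += self.balance * rate
```

In the first pair the base class contributes no behaviour, only a contract. In the second the base class contributes working code, which is where the difficulties discussed below arise.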

 

Around 1990, using inheritance to increase reuse was all the rage. Around 1995, OO gurus were tending to the view that using inheritance to extend generic types was less important than using it to reify abstract interfaces. However, this paper is about generalisation rather than idealisation. Generalisation runs into some difficulties.

 

A reader writes:

 

Single inheritance among programming language classes offers a poor simulation of abstraction.

 

Part of the problem is that in real abstraction, you are just reducing the number of assertions you're making about things. If you always boil any type description down to a number of Boolean assertions, you can tell whether A is an abstraction of B by looking to see whether all the statements of B imply statements of A. In the simplest case, A's assertions are a subset of B's.
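The simplest case the reader describes, where A's assertions are a subset of B's, can be sketched as a set comparison. This is an illustrative Python sketch; the function name and the assertion strings are assumptions, and it does not cover the more general case of logical implication between differently worded statements.

```python
def is_abstraction_of(a: frozenset, b: frozenset) -> bool:
    """True if every assertion made by type A is also made by type B."""
    return a <= b  # subset test


investment = frozenset({"earns income"})
bond = frozenset({"earns income", "bears interest"})
equity = frozenset({"earns income", "bears dividends"})

print(is_abstraction_of(investment, bond))  # True
print(is_abstraction_of(bond, equity))      # False
```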

 

But in any practical programming language, there are assertions you could mistakenly make about a superclass that aren't true of its subclasses. For example if in Java, class Investment has a method GetReturn(), it would be a mistake to assert that all instances that are members of Investment have that particular GetReturn() method - there might be an override in a subclass. On the other hand, it would be correct (with Java) to say that all instances have some sort of GetReturn() method - you just can't say much about the results.
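The reader's point about overriding can be shown in a few lines. The sketch uses Python rather than Java, with the method renamed get_return() to follow Python naming; the JunkBond subclass and the figures returned are hypothetical.

```python
class Investment:
    def get_return(self) -> float:
        """Base behaviour: a flat 2% return (an arbitrary illustrative figure)."""
        return 0.02


class JunkBond(Investment):
    def get_return(self) -> float:
        # Override: same method name, entirely different behaviour.
        return -1.0


portfolio = [Investment(), JunkBond()]
# Every instance has *some* get_return(), but not necessarily the base version.
print([inv.get_return() for inv in portfolio])  # [0.02, -1.0]
```

So the only safe assertion about members of Investment is that the method exists; nothing about its result carries over to the subclasses unless the programmer takes care to preserve it.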

 

On the other hand, in a good discipline of testing, especially in an agile context where functionality is being monotonically added in successive sprints, it should normally be possible to add tests gradually without changing them. So the earlier deliverables conform to an abstraction of the spec conformed to by the later ones.

 

The author replies: Thanks for your observations on the risk of defining a class hierarchy that isn't a proper generalisation-specialisation structure. My concerns about inheritance from a generalisation are more to do with time (evolution), size and the fuzziness of the real world.

 

  • Time changes generic types. Inheritance works well where a component type is inherently stable, so that the operations of a component instance (object) are fixed for its lifetime. Inheritance works less well where classes represent entity types in nature or in business, since these continually evolve beyond the control of the programmer.

 

  • Size makes generic types less reusable. Inheritance works at the level of very small data types. Inheritance between components encapsulating large data structures is difficult, if not impossible. You simply don’t want another component to inherit all those operations.

 

  • The fuzziness of the real world undermines the validity of generalisations. Inheritance works well between computer-specific types (like GUI controls), where every instance must conform to its type. In the natural world, there are often exceptions to the rule. You might propose that nationality and sex are types of individual, then find people with dual nationalities and people with atypical chromosome make-ups.

 

I spent much of the 1990s telling my clients that if they want reuse – certainly in the domain where most enterprise and solution architects work - reuse by delegation of work to shared services is more useful than reuse by inheritance. I wrote several papers on this theme and co-authored a book on it. Google will find them if you type in “Graham Berrisford”.
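Reuse by delegation can be sketched as follows. This is a hedged Python illustration, not a design from the papers mentioned; AuditLog, OrderProcessor and PaymentProcessor are hypothetical names. Two unrelated components reuse the same service by holding a reference to it, and neither inherits from it.

```python
class AuditLog:
    """A shared service: records events on behalf of any client."""

    def __init__(self) -> None:
        self.entries = []

    def record(self, message: str) -> None:
        self.entries.append(message)


class OrderProcessor:
    """Reuses AuditLog by delegation: it holds a reference, not a base class."""

    def __init__(self, log: AuditLog) -> None:
        self._log = log

    def place_order(self, item: str) -> None:
        self._log.record(f"order placed: {item}")


class PaymentProcessor:
    """A second, unrelated client of the same shared service."""

    def __init__(self, log: AuditLog) -> None:
        self._log = log

    def pay(self, amount: float) -> None:
        self._log.record(f"payment taken: {amount:.2f}")


shared_log = AuditLog()
OrderProcessor(shared_log).place_order("widget")
PaymentProcessor(shared_log).pay(9.5)
print(shared_log.entries)  # ['order placed: widget', 'payment taken: 9.50']
```

Neither client acquires operations it does not need, and the service can change its internals without disturbing a class hierarchy.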

 

I suspect the set theory versus type theory debate underlies what surfaces in software engineering as the OO-Relational paradigm clash. And I suspect the latter is rather too often a self-inflicted wound, caused by people over-engineering class hierarchies into their OO software design, where a more ‘relational’ or set-based structure would prove a more economical and flexible design.
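As a rough illustration of what a set-based structure might look like, the sketch below models nationality as a relation (a set of pairs, much like a link table) instead of as a subclass per nationality. It is written in Python, and the names and data are invented for the example.

```python
# A relation: a set of (person, nationality) pairs, as in a link table.
holds_nationality = {
    ("Ada", "British"),
    ("Ada", "French"),   # dual nationality needs no new class
    ("Bo", "Swedish"),
}


def nationalities_of(person: str) -> set:
    """All nationalities held by one person."""
    return {nat for (who, nat) in holds_nationality if who == person}


def people_with(nationality: str) -> set:
    """All people holding one nationality."""
    return {who for (who, nat) in holds_nationality if nat == nationality}


print(sorted(nationalities_of("Ada")))  # ['British', 'French']
print(sorted(people_with("Swedish")))   # ['Bo']
```

Exceptions to the rule, such as a person with two nationalities or none, are just more or fewer rows in the relation; no class hierarchy has to be reworked to admit them.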