Principles for defining a controlled vocabulary (or reference model)

This page is published under the terms of the licence summarized in the footnote.

This paper is one of several papers on description and reality, on things, thoughts and words.

It sets out principles for the vocabulary used in a reference model for a domain of knowledge (like enterprise and solution architecture).

It also draws attention to how misunderstandings inevitably arise from natural ambiguities.

Removing all ambiguities from a text might be possible in theory, but is extremely difficult in practice, and doing it can make the text unreadable.

Background

A question arose as to how best to minimise ambiguity in the vocabulary we use for architect examination purposes.

Known sources of ambiguity to be considered included synonyms and homonyms, and terms that can refer to the type of a thing, an instance of a thing, or both.

I set out to write these Vocabulary Definition Principles – 2,000 words on basic principles for vocabulary definition.

Research into these principles spread into mathematics, metaphysics and linguistics.

It turned out that one issue was the difference between a collection of things (or set) and a kind of thing (a type).

It proved necessary to answer questions addressed in A Type Theory for EA, resulting in several thousand words on type-instance relations and related ambiguities.

In doing that, I found my understanding of set theory to be wrong, so had to explore A Set Theory for EA.

This led to several thousand more words on how, in our domain, we focus on dynamic sets/types, rather than the static sets/classes of classic system theory.

Curiously, the last, the most abstract technical view, is the one from which a few general principles of EA emerge.

Acknowledgements

Thank you to those who have reviewed papers, especially a professor of logic and my brother, who both brought a mathematician’s perspective.

Contents

On building a controlled vocabulary

Three warrants for choosing terms

Avoiding synonyms and homonyms

Avoiding tautological definitions

Separating definitive and illustrative sentences

Ensuring internal consistency

Intensional definition by genus and difference

Countering abstraction by example

On generalisation and composition hierarchies

On ambiguity between types and instances

Conclusions and remarks

On building a controlled vocabulary

Statements made in natural language are often sloppy and ambiguous.

The more the reader knows about what the writer means, the less of a barrier that is to understanding.

In some situations, ambiguity is less tolerable: e.g. examinations in enterprise and solution architecture.

How to organize the knowledge in a syllabus to ensure examination setters and sitters are clear?

· In a natural language vocabulary, there is no restriction on the vocabulary.

· In a controlled vocabulary, predefined words and phrases are used to tag units of information so they may be retrieved by a search.

· In a taxonomy, concepts are related by “is a kind of” or subtype relations.

· In a true ontology, concepts are defined by and related to other concepts using a variety of relation types.

A vocabulary creator ought to understand a little about taxonomies and ontologies.

For an introduction to relevant ontological concepts, read the related papers.

The reference model adopts some ontological conventions, without being a fully formal ontology.

Three warrants for choosing terms

A vocabulary creator must choose terms carefully, compromising between the principles of:

· user warrant (what users are likely to use),

· literary warrant (what the literature says), and

· structural warrant (what helps to make the structure and content of the vocabulary clear and consistent).

Sometimes, the need for consistency means avoiding terms that users consider natural.

Avoiding synonyms and homonyms

In defining a vocabulary, it helps to avoid:

· synonyms (several words for one concept) and

· homonyms (several concepts for one word).

Again, this can mean avoiding terms that users consider natural.
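The two checks above can be sketched mechanically. This is a minimal illustration only: it assumes each vocabulary entry has been reduced to a (term, concept identifier) pair, and the entries and identifiers shown are invented for the example rather than taken from any reference model.

```python
from collections import defaultdict


def find_synonyms_and_homonyms(entries):
    """entries: iterable of (term, concept_id) pairs.

    Returns (synonyms, homonyms):
      synonyms maps a concept to the several terms that name it;
      homonyms maps a term to the several concepts it names.
    """
    terms_by_concept = defaultdict(set)
    concepts_by_term = defaultdict(set)
    for term, concept in entries:
        key = term.strip().lower()  # treat "Service" and "service" as one term
        terms_by_concept[concept].add(key)
        concepts_by_term[key].add(concept)
    synonyms = {c: sorted(t) for c, t in terms_by_concept.items() if len(t) > 1}
    homonyms = {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}
    return synonyms, homonyms


# Hypothetical entries, invented for illustration only.
entries = [
    ("client", "C1"),
    ("customer", "C1"),   # second word for concept C1: a synonym
    ("service", "C2"),
    ("service", "C3"),    # second concept for one word: a homonym
    ("requirement", "C4"),
]
synonyms, homonyms = find_synonyms_and_homonyms(entries)
```

A check like this only reports candidates; deciding which term to keep still depends on the three warrants.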

Avoiding tautological definitions

In defining the meaning of a term, the general rule is to define one word using different words.

· “A currency unit is a unit in a system of money” is tautological.

· Better to say “A currency unit is a countable item in a system of money.”

However, you may reasonably re-introduce the defined word in a later sentence, especially an illustrative one, as below.
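As a rough sketch, the rule can be tested by looking for words of the term that reappear in the first sentence of its definition. The stop-word list and the naive sentence split on a full stop are assumptions made for this example; a real check would need to handle abbreviations and word inflections.

```python
import re

# Small, assumed stop-word list; extend as needed.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "to", "for", "and", "or"}


def words(text):
    """Lower-cased content words of a text, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def reused_words(term, definition):
    """Words of the term that reappear in the first sentence of its definition."""
    first_sentence = definition.split(".")[0]
    return sorted(words(term) & words(first_sentence))


tautological = reused_words("currency unit", "A unit in a system of money.")
better = reused_words(
    "currency unit",
    "A countable item in a system of money. "
    "Currency units are used in exchange rates.",
)
```

Only the first sentence is checked, so the defined word may still be re-introduced in a later sentence.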

Separating definitive and illustrative sentences

Normally, the first sentence is the most definitive. Later sentences may be illustrative.

· “A currency unit is a countable item in a system of money, used in exchange rates.” implies the second phrase is definitive.

· “A currency unit is a countable item in a system of money. Currency units are used in exchange rates.” implies the second sentence is illustrative.

Ensuring internal consistency

A convention is to use defining words that are in turn defined elsewhere in the same dictionary:

· A nation state is a state and a nation.

· A state is a political and geopolitical entity.

· A nation is a cultural and/or ethnic entity.

· An entity is a type with an identifier, or an instance with an identity.

Notice the ambiguity, the two optional definitions, in the last. Types and instances are discussed later.
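The convention can be checked mechanically. The sketch below assumes each entry has been reduced by hand to the list of defining words it relies on; which words count as "defining" is a judgement made for this example, not something the definitions themselves state.

```python
# Each term maps to the defining words its definition relies on
# (selected by hand from the four definitions above).
glossary = {
    "nation state": ["state", "nation"],
    "state": ["entity"],
    "nation": ["entity"],
    "entity": ["type", "instance"],
}


def undefined_defining_terms(glossary):
    """Return {term: [defining words with no entry of their own]}."""
    missing = {}
    for term, defining_terms in glossary.items():
        absent = [d for d in defining_terms if d not in glossary]
        if absent:
            missing[term] = absent
    return missing


gaps = undefined_defining_terms(glossary)
```

Here the check reports that "type" and "instance" still need entries of their own.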

Intensional definition by genus and difference

Real life dictionaries inevitably have circles of definitions, because they have to define all words of the language.

That does not prevent them from being useful in practice, since they are not intended for learning a first language from scratch.

However, in a reference model designed for teaching a subject, it is not good practice to say:

· A currency is a system of money.

· A system of money is a currency.

Since words define words, and defining words can be synonyms of words defined, it can be hard to avoid circular definitions and ambiguity.

How to avoid, or at least minimise, reciprocal definitions?

A useful approach is intensional definition by genus and difference.

E.g. to define an even number, first state a general type (integer) to which the concept belongs, then add specific properties (the double of another integer).

· Yellow [a colour] between orange and green in the rainbow spectrum.

· Honolulu [a city] on the island of Oahu in the Pacific Ocean.

· A spade [a tool] used for digging.

The hope is that the reader already understands the more common, embracing or generic word.

If not, the brackets show the reader they can look it up.

Avoiding circular definitions implies imposing a hierarchy on the defining [general type] words.

This type hierarchy may be called a taxonomy.
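Whether the defining words really form a hierarchy can be tested by following each term's genus upwards and looking for a loop. This is a minimal sketch, assuming each term has at most one genus and that the glossary is held as a simple mapping; the function name and data layout are illustrative.

```python
def find_definition_cycle(genus_of):
    """genus_of maps each term to the genus term used to define it
    (None for a top-level term).

    Returns a list of terms forming a cycle, or None if the genus
    links form a hierarchy.
    """
    for start in genus_of:
        path = []
        seen = set()
        term = start
        # Walk upwards until we run out of defined terms or revisit one.
        while term is not None and term in genus_of:
            if term in seen:
                return path[path.index(term):]
            seen.add(term)
            path.append(term)
            term = genus_of[term]
    return None


# The reciprocal pair of definitions criticised above.
circular = {"currency": "system of money", "system of money": "currency"}

# Genus-and-difference entries with the bracketed word as the genus.
hierarchical = {"yellow": "colour", "spade": "tool", "colour": None, "tool": None}

cycle = find_definition_cycle(circular)
no_cycle = find_definition_cycle(hierarchical)
```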

Countering abstraction by example

Beware the abstraction involved in constructing a hierarchical taxonomy in which every word is defined by a "higher" word.

Eventually you reach a few highly abstract words/concepts such as “thing”, “description”, and “idea”.

Abstracting a definition from physical to logical, or from specific to general, tends to remove detail and meaning.

It can lead to loss of intelligibility and practical usefulness.

Sometimes it helps to move to the more concrete, by giving an example.

On generalisation and composition hierarchies

The BCS reference model uses the term "aim hierarchy" in relation to goals, objectives and requirements.

It indicates that goal, objective and requirement are three subtypes of “aim” with common properties such as priority and deadline.

The definitions form a two-level type hierarchy.

It also says that, for one system, there can be many instances of goals, objectives and requirements, which may be arranged in a hierarchical structure.

From the top-down, goals are decomposed into objectives. From the bottom-up, requirements for a given effort are traced to objectives.

This makes for a multi-level composition or delegation hierarchy of aim instances. (Cf. delegation in a “balanced scorecard” system.)

The term "aim hierarchy" might be read as referring to the two-level type hierarchy.

But it was in fact written with reference to the multi-level composition or delegation hierarchy of aim instances.

Both meanings are possible, and valid in their different ways.

The wordiness needed to avoid ambiguity is laborious for speakers and listeners; people just don’t do it, they rely on the context to make things clear.

So again, this paper is partly about being clear, and partly about the inevitability of such ambiguities.
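The two readings of "aim hierarchy" can be set side by side in code. This is a sketch under stated assumptions: the class and attribute names are illustrative, the example aims and dates are invented, and nesting one objective under another is an assumption made to show more than two instance levels.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Aim:
    """General type: properties shared by every kind of aim."""
    name: str
    priority: int
    deadline: str
    parent: Optional["Aim"] = None
    children: List["Aim"] = field(default_factory=list)

    def add(self, child: "Aim") -> "Aim":
        """Compose: place a child aim instance under this one."""
        child.parent = self
        self.children.append(child)
        return child

    def depth(self) -> int:
        """Number of composition steps between this aim and the top."""
        return 0 if self.parent is None else 1 + self.parent.depth()


# Reading 1: a two-level type (generalisation) hierarchy.
class Goal(Aim): ...
class Objective(Aim): ...
class Requirement(Aim): ...


# Reading 2: a multi-level composition hierarchy of aim instances.
goal = Goal("Grow online sales", priority=1, deadline="2026-12-31")
objective = goal.add(Objective("Shorten checkout", 1, "2026-06-30"))
sub_objective = objective.add(Objective("Speed up price lookup", 2, "2026-03-31"))
requirement = sub_objective.add(Requirement("Price enquiry throughput", 2, "2026-02-28"))

type_levels = Requirement.__mro__.index(Aim) + 1   # Aim, then its subtypes
instance_levels = requirement.depth() + 1          # goal down to requirement
```

The type hierarchy is fixed at two levels by the definitions, while the instance hierarchy is as deep as a given system's aims happen to be.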

On ambiguity between types and instances

This section is amplified at length in a related paper.

In any domain we study or describe, there are things (individuals, instances or objects) of interest.

A type is an ideal, an abstraction from one or more things we are interested in.

The type gives us at least a partial idea, model or description of something we are interested in.

The distinction between types and instances is often unclear in natural language.

E.g. A gardener tells you he has been cataloguing the rose bushes in his garden.

Does he mean rose bush instances? Or rose bush types?

E.g. a sentence saying that events affect entities can be read to refer to either types or instances (of event and entity).

In this example, it probably doesn’t matter which meaning is taken. In other cases, it can matter.

What is a system requirement?

Is it a requirement type, a kind of need, a general concern, that is applicable to many systems, many transactions?

· Throughput

o Throughput definition: A concern about transaction volume, measured as volume over time.

o Throughput measure: Transactions per second.

· Response time

o Etc.

Or is it a requirement instance, a particular need, a specific interest, for one system?

· Requirement id: 9998

o Transaction: Price enquiry

o Throughput measure: 10,000 transactions per second.

· Requirement id: 9999

o Transaction: Price change

o Throughput measure: 1,000 transactions per second.

Notice the attribute type of the requirement type is instantiated with an attribute value in each requirement instance.
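The same distinction can be written down as two data structures, one for the requirement type and one for the requirement instance. This is a minimal sketch: the class and field names are assumptions chosen for the example, and only the values come from the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequirementType:
    """A general concern, applicable to many systems and transactions."""
    name: str
    definition: str
    measure: str  # attribute type: the unit in which values are expressed


@dataclass(frozen=True)
class RequirementInstance:
    """A particular need for one transaction of one system."""
    requirement_id: int
    requirement_type: RequirementType
    transaction: str
    value: int  # attribute value, expressed in the type's measure

    def describe(self) -> str:
        return (
            f"Requirement {self.requirement_id}: {self.transaction}, "
            f"{self.requirement_type.name} {self.value:,} "
            f"{self.requirement_type.measure.lower()}"
        )


throughput = RequirementType(
    name="Throughput",
    definition="A concern about transaction volume, measured as volume over time.",
    measure="Transactions per second",
)
price_enquiry = RequirementInstance(9998, throughput, "Price enquiry", 10_000)
price_change = RequirementInstance(9999, throughput, "Price change", 1_000)
```

Both instances share one type object; only the attribute values differ.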

Beware that our interests span not only real and concrete things, but also imaginary and abstract things.

And looked at from different perspectives, a thing can be an instance or a type.

Conclusions and remarks

The goal of the reference model is to provide a controlled vocabulary for training and examinations.

The reference model ought to be unambiguous and internally consistent.

The current reference model is a very loosely controlled vocabulary, for several reasons, not least the risk that more pedantic wording might make it unreadable.

However, research suggests it is possible, and would be a good idea, to recast all 400 reference model entries in this more taxonomical style.

· Spade [a tool] used for digging. Further explanation...

This format helps to reduce the possibility of tautological and circular definitions.

It can be adopted in the reference model in the style below.

Description: an abstraction that communicates features of the thing described.

Type [a description] that defines what instances share by way of a form, structure or properties. E.g. a “business application” has a user interface and access to stored data.

Property [a type] an attribute type such as version number.
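Entries written in this style are regular enough to be split into term, genus and difference automatically. The pattern below is a sketch based only on the examples shown here; it assumes one pair of square brackets per entry and an optional "a" or "an" before the genus.

```python
import re

# Term, then "[a genus]" or "[an genus]", then the differentiating text.
ENTRY = re.compile(
    r"^(?P<term>[^\[\]]+?)\s*"
    r"\[(?:an?\s+)?(?P<genus>[^\]]+)\]\s*"
    r"(?P<difference>.+)$"
)


def parse_entry(line):
    """Split 'Term [a genus] difference' into its three parts.

    Returns None for an entry with no bracketed genus.
    """
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


spade = parse_entry("Spade [a tool] used for digging.")
prop = parse_entry("Property [a type] an attribute type such as version number.")
top = parse_entry(
    "Description: an abstraction that communicates features of the thing described."
)
```

An entry with no bracketed genus, such as "Description" here, sits at the top of the hierarchy and is returned as None.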

Several further principles for a reference model should be clarified and enforced.

1. The RM entries do not say all that could be said, since they are intended to limit what examiners can set questions on and candidates have to learn. Trainers are free to explain any entry in depth if they wish to.

2. The RM lists concepts to be understood, not all terms that might be used.

3. The RM is not a general-purpose dictionary: it minimises synonyms (several words for one concept) and homonyms (several concepts for one word), since alternative terms and definitions undermine the goal of a controlled examination vocabulary.

4. RM terms are chosen by compromising between the principles of user warrant (what users are likely to use), literary warrant (what the literature says), and structural warrant (what helps to make the structure and content of the vocabulary clear and consistent).

5. The RM does not invent terms, except where justified by the principle of structural warrant.

6. The RM focuses on discrete elementary concepts rather than aggregates of them (since the contents of aggregates are more disputable).

7. The RM is not a standard; it describes architecture deliverables and techniques but does not mandate any of them or set any rules for what architects should do.

8. The RM reflects the scope of current architecture frameworks; it does not speculate about future trends.

9. The RM takes a best-of-breed approach (drawing, for example, from TOGAF, ArchiMate, ISO 42010, ITIL and PRINCE). There is minimal commentary on such sources – only a few remarks on where discrepancies may need to be recognised.

Footnote: Creative Commons Attribution-No Derivative Works Licence 2.0 16/12/2015 23:17

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.co.uk” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it.

For more information about the licence, see http://creativecommons.org