Avancier Methods

Information/Data Architecture (with TOGAF & ArchiMate artefacts)

One of more than 200 papers at http://avancier.website. Copyright Graham Berrisford. Last updated 21/05/2017 22:44

 

Click here for illustrations of the diagrams mentioned below.

 

Contents

Mainstream EA

Data stores – data at rest

Data flows – data in motion

Footnotes

 

Mainstream EA

EA emerged in the 1980s to address the need for enterprise-wide analysis and re-architecting of data and processes.

Today, mainstream EA remains about business activities that involve the creation and use of information.

It is about the design and planning of changes to those business activities, and/or to the capture and provision of that information.

Moreover, enterprise architecture (rather than solution architecture) is about doing this at a strategic and cross-organisational level.

To digitise business operations that create and use information, architects must attend to modern standards and technologies.

 

What follows is the information/data architecture approach in Avancier Methods.

It maps information/data architecture to business architecture and applications architecture.

It groups TOGAF artefacts into views, and includes suggested mappings to ArchiMate.

Bear in mind that ArchiMate is not designed for professional business process modellers or data modellers.

 

Do "information" and "data” mean the same thing? If not, what is the distinction?

Various distinctions are explained and analysed in this paper “Data and Information”.

Maintaining any of the terminology distinctions (in discussing or writing) is very difficult.

This paper uses “data” as a catch all; you can replace some or all by “information” as you see fit.

 

Modern data architects speak of being concerned with:

·         Data at rest - data stores – their locations, contents and synchronisation

·         Data in motion - data flows – their sources, destinations and contents

·         Data qualities – data types, standards, confidentiality, integrity and availability.

Data stores – data at rest

Businesses have to store information for future use, in persistent data stores.

TOGAF uses the term data component rather than data store.

There are two interpretations of “data component”:

·         A passive data structure, contained in some kind of data store.

·         An active data server (an application component) that provides read/write access to a data structure.

Either way, the data component contains a data structure that can be described in terms of inter-related entities.

 

Data store view

Enterprise architecture is concerned with actors and activities that create and use stored data/information.

 

Information System View | TOGAF artefacts | ArchiMate viewpoints
Data store view | Conceptual Data diagram (Business Data Model) | Information Structure – conceptual level
 | Data Entity/Business Function matrix |
 | Logical Data diagram |
 | Application/Data matrix |
 | Data Entity/Data Component catalog |
 | Data Dissemination diagram |
 | Data Security diagram |
 | Data Lifecycle diagram |
Migration view | Data Migration diagram |

 

 

Define the core entities and events that the business must remember in order to complete its business processes and provide business services.

The word “core” usually implies data that is central to the conduct of business processes: e.g. Customer, Order, and Product Type.

This data is often duplicated in different data stores.

A Conceptual Data diagram (Business Data Model) may be drawn to show relationships between the core entities.

Some find it more practical simply to list the core entities in a business data entity catalogue.

Few if any attributes are specified in this kind of model, which is primarily used to identify data duplication.
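
As an illustration only (the entries are invented), such a catalogue can be kept as simple structured data, with each core entity given a one-line business definition:

    # A minimal sketch of a business data entity catalogue: core entities with business definitions,
    # and no attributes beyond the name that identifies each entity.
    entity_catalogue = {
        "Customer": "A person or organisation that has ordered, or may order, our products.",
        "Order": "A customer's request for one or more products, to be fulfilled and invoiced.",
        "Product Type": "A kind of product offered for sale.",
    }

    for name, definition in sorted(entity_catalogue.items()):
        print(f"{name}: {definition}")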

 

Map data entities (in the diagram above) to business functions that create and use them.

You can cluster activities in a Data Entity/Business Function matrix, for example by data created.

The North West corner method sorts the rows and columns of a matrix by clustering them on a shared cell entry, such as “create”.
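
A rough sketch of the idea in Python, using an invented matrix: order the functions by the entity each one creates, so that the “create” entries cluster towards the north-west corner.

    # A minimal sketch of North West corner clustering (invented functions and entities).
    # The matrix maps business functions to the data entities they Create (C) or Read (R).
    matrix = {
        "Sales":      {"Order": "C", "Customer": "R", "Product Type": "R"},
        "Marketing":  {"Customer": "C", "Product Type": "R"},
        "Production": {"Product Type": "C", "Order": "R"},
    }

    entities = sorted({e for row in matrix.values() for e in row})

    def created_entity(function):
        # The entity this function creates (each function creates exactly one entity here).
        return next(e for e, op in matrix[function].items() if op == "C")

    # Order functions by the entity they create, and list entities in that same order,
    # so the "C" cells gather along the diagonal from the top-left (north-west) corner.
    functions = sorted(matrix, key=lambda f: entities.index(created_entity(f)))
    ordered_entities = [created_entity(f) for f in functions]
    ordered_entities += [e for e in entities if e not in ordered_entities]

    print("Function".ljust(12), *[e.ljust(14) for e in ordered_entities])
    for f in functions:
        print(f.ljust(12), *[matrix[f].get(e, "-").ljust(14) for e in ordered_entities])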

 

Define the entities and events that an application must remember in order to provide services to other applications and/or business users.

A Logical Data diagram details information to be stored, usually in one database and/or to enable one application.

A logical data model includes not only entities and relationships, but also each entity type’s primary key and other attributes.

Foreign keys may identify the relationship between different entities.

This model is usually normalised so as to minimise duplication of information.

It defines terms and concepts used in a particular business domain.
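
As a sketch only (the entity and attribute names are invented), a fragment of a logical data model can itself be recorded as structured data, with the primary keys and foreign-key relationships made explicit so they can be checked:

    # A minimal sketch of a logical data model held as plain data: entities, attributes,
    # primary keys, and foreign keys that identify relationships between entities.
    logical_model = {
        "Customer": {
            "attributes": {"customer_id": "identifier", "name": "text", "email": "text"},
            "primary_key": ["customer_id"],
        },
        "Order": {
            "attributes": {"order_id": "identifier", "customer_id": "identifier", "order_date": "date"},
            "primary_key": ["order_id"],
            "foreign_keys": {"customer_id": "Customer"},   # the relationship Order -> Customer
        },
    }

    # A simple integrity check: every foreign key must refer to a defined entity.
    for entity, definition in logical_model.items():
        for attribute, target in definition.get("foreign_keys", {}).items():
            assert target in logical_model, f"{entity}.{attribute} refers to undefined entity {target}"
    print("Logical model is internally consistent")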

 

Map data entities to the applications that create and use them.

An Application/Data matrix can reveal overlaps between data maintained by different applications.

 

Map data entities (in the diagram above) to the data components that hold them, in a Data Entity/Data Component catalog.

 

Map data entities to the applications that maintain them, or to the data components that hold them.

Where a Data Dissemination diagram shows duplication, look to define a data mastering policy (master and copies) for the baseline or target application portfolio.
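
As an illustration only (the applications and entities below are invented), a mastering policy can be recorded and checked against an Application/Data matrix held as data:

    # A minimal sketch: which applications hold which entities, and which application is master.
    app_data_matrix = {
        "CRM":       ["Customer", "Product Type"],
        "Ordering":  ["Customer", "Order", "Product Type"],
        "Catalogue": ["Product Type"],
    }

    # The chosen master for each entity (a design decision, recorded here as data).
    masters = {"Customer": "CRM", "Order": "Ordering", "Product Type": "Catalogue"}

    for entity, master in masters.items():
        holders = [app for app, entities in app_data_matrix.items() if entity in entities]
        copies = [app for app in holders if app != master]
        print(f"{entity}: master = {master}; copies to be synchronised = {copies or 'none'}")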

 

Data Security diagram

 

Data Lifecycle diagram

 

Data Migration diagram

 

A physical data model specifies the schema to be used in a particular database.

It may be denormalized to speed up storage or retrieval.

It may refer to features available in the chosen database management system.
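
A minimal sketch of those two points, using Python's built-in sqlite3 module and invented tables: a denormalised reporting table (the customer name is repeated on every row to avoid a join at read time) plus an index, a feature of the chosen DBMS, to suit the retrieval pattern.

    import sqlite3

    ddl = """
    -- Denormalised: customer name is repeated on every order line to speed up retrieval.
    CREATE TABLE order_report (
        order_id      INTEGER,
        customer_name TEXT,
        product_code  TEXT,
        quantity      INTEGER,
        order_date    TEXT
    );
    -- A DBMS-specific feature (an index) chosen to suit the expected query pattern.
    CREATE INDEX idx_order_report_date ON order_report (order_date);
    """

    with sqlite3.connect(":memory:") as conn:
        conn.executescript(ddl)
        conn.execute("INSERT INTO order_report VALUES (1, 'Acme Ltd', 'PT-42', 3, '2017-05-21')")
        print(conn.execute("SELECT customer_name, quantity FROM order_report "
                           "WHERE order_date = '2017-05-21'").fetchall())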

 

Physical data store forms

Data architects are concerned with the forms of matter and energy in which information is stored.

In theory, data architects can design non-digital stores; in practice most focus on digital ones.

Digital data store forms are changing; magnetic disks are currently being replaced by flash storage.

But then, flash is optimised for the asymmetric use cases of mobile devices, where data is written a few times and read many times.

So, if you want to find out what may replace flash memory, try this:

http://www.computerweekly.com/feature/Whats-wrong-with-flash-storage-And-what-will-come-after

Architects have to research physical data storage forms as the need arises.

 

Data store schema standards

Data architects define the locations and contents of data stores.

TOGAF’s “physical data component” is a vendor/technology specific realisation of a logical data component.

It could be a database, data warehouse, document store, web information server or transaction log.

It has a technology-specific data schema, designed to suit its purpose, for example:

·         Transactional database

·         Data warehouse

·         Document store

·         Big data store

 

This table maps a purely logical data model to some data store schema varieties.

Logical data component | Physical data components
Logical data model | CODASYL database schema | Relational database schema | XML schema (footnote 2) | OData-compliant web information server
Entities | Records | Tables | Complex types | Entities
Attributes | Fields | Columns | Contained elements | Properties
Relationships | Address pointers | Foreign keys | Contained elements | Navigation properties

 

Logical data model

TOGAF’s “logical data component” is a logical definition of the data in a data store.

It can be documented as a logical data model; that is, an entity-attribute-relationship model (which can include gen-spec relationships).

Modelling information/data in this way can be done independently of computing altogether.

A logical information/data model is a purely logical declaration of business terms and concepts without consideration of any database schema.

 

You might use a UML tool to draw an information/data model, but to call it a class diagram/model is misleading.

UML class diagrams are for modelling objects that have behaviour.

 

OData - the data access protocol for a web-based information server.

A modern way to realise logical data components as physical data components is in XML schema, accessed using the OData protocol.

This provides a generic way to organize and describe the data structure of any remote data store as a logical data model.

 

·         An Entity Type (Customer, Employee, etc.) is a data structure type consisting of named and typed Properties and with a key.

·         An Entity is an instance of an Entity Type.

·         An Entity Key (CustomerId, OrderId etc.) is formed from a subset of Properties of the Entity Type.

·         An Association defines a relationship between instances of Entity Types (for example, Employee WorksFor Department).

·         An Association can be 1-to-1 or 1-to-many, uni-directional or bi-directional.

·         A Navigation Property is a property of an Entity Type, bound to a specific Association, which can be used to refer to the associations of an entity.

 

Microsoft and SAP now expose their data using the OData protocol.

Any client (even a human) can retrieve a logical entity-attribute-relationship model from a web data store using HTTP, then proceed to invoke operations on it using HTTP.

The physical data structure of a remote data server is its own business.

All that matters to a client is that the data server returns a logical data model in reply to a metadata request.

The client can then proceed to invoke create, read, update and delete operations on entities in that data model.
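
A rough sketch of such a client in Python, using the third-party requests library. The service root below is the public sample Northwind service published at odata.org; that URL and its entity and property names are assumptions here, so substitute your own OData service root.

    # A minimal sketch of an OData client: get the metadata (the logical data model),
    # then read entities, without knowing anything about the server's physical data structure.
    import requests

    SERVICE_ROOT = "https://services.odata.org/V4/Northwind/Northwind.svc/"  # assumed sample service

    # "Get metadata": the service describes its entity types, keys, properties and associations.
    metadata = requests.get(SERVICE_ROOT + "$metadata")
    print(metadata.text[:300])   # an XML (CSDL) description of the entity-attribute-relationship model

    # Read entities from an entity set; the server's storage schema is its own business.
    customers = requests.get(SERVICE_ROOT + "Customers",
                             params={"$top": "2"},
                             headers={"Accept": "application/json"})
    for customer in customers.json()["value"]:
        print(customer["CustomerID"], customer["CompanyName"])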

 

Centralised and distributed data storage

Data architects are much concerned with the distribution or duplication of data in different data stores.

 

The choice between hierarchy and anarchy is central to much discussion of sociology and politics.

It is closely related to the choice between centralisation and distribution, which appears also in business, software and data architecture.

“It is not hard to speculate about, if not realize, very large, very complex systems implementations, extending in scope and complexity to encompass an entire enterprise.” John Zachman, 1987

This might be interpreted to imply consolidation of an enterprise’s business data into one large database.

(And it appears SAP pursued this strategy for many years.)

 

A current fashion is to "distribute data management" as Martin Fowler puts it.

So-called “micro services” (better-called “micro apps”) are based on small data stores.

The idea is to integrate small information systems rather than consolidate them around one data store.

This has advantages and disadvantages, but whether data storage is centralised or distributed, the vision of EA remains the same.

That vision is to integrate business activities through sharing of the data they create and use.

Data flows – data in motion

Businesses have to move information from one place to another, between business actors and data stores.

Data architects are concerned with the capture and transport of information in data structures.

 

Data flow view

Enterprise architecture is concerned with actors and activities that send and receive data/information.

The view relates applications to data flows (which can include messages, files and reports) and to data components.

 

Information System View | TOGAF artefacts | ArchiMate viewpoints
Data flow view | Application Interaction matrix | Application Cooperation
 | Application Communication diagram | Application Cooperation
 | Interface catalog |

 

 

Application Interaction matrix and Application Communication diagram

See “Applications Architecture”.

 

Interface catalog

Catalog the data flows that pass between applications, and between human roles and applications.

 

Human actors do convey much critical business information informally - in ad hoc speech, gestures and drawings.

But architects cannot model ad hoc information; they can only name messages that appear in regular communications.

Data architects can name messages created and used in regular business processes (e.g. enquiry, response, order, invoice, payment).

 

A Data Flow Catalogue (Interface Catalogue in TOGAF)

Functional attributes | Flow name | Enquiry | Response | Order
 | Trigger | | Enquiry |
 | Source | Customer | Sales | Customer
 | Destination | Sales | Customer | Sales
 | Information | Unstructured | Unstructured | Order details (tbd)
Non-functional attributes | Frequency | 1,000/day | 1,000/day | 30/day
 | Volume | | 500K |
 | Confidentiality | High | High | High
 | Integrity | Medium | Medium | High
 | Availability | 24/7 | 09.00-18.00 | 24/7
Transport mechanisms | Technology | Web | Telephone | Web
 | Protocol | HTTP | | HTTPS

 

Data flow definers may name the data groups and items in a flow’s data structure (e.g. the From and To addresses in an email header).

They can name so-called “unstructured” data items to hold ad hoc information (e.g. the message in the body of an email).

 

Like many such illustrations, this table shows what could be documented rather than what most actually document.

But understanding what is possible in theory is a precursor to deciding what to do in practice.
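
For illustration, the invented flows above could equally be held as structured data rather than as a document, so the catalogue can be queried and checked; a minimal Python sketch:

    # A data flow (interface) catalogue held as data, using the invented flows from the table above.
    flows = [
        {"name": "Enquiry",  "source": "Customer", "destination": "Sales",
         "frequency_per_day": 1000, "confidentiality": "High", "availability": "24/7"},
        {"name": "Response", "source": "Sales", "destination": "Customer", "trigger": "Enquiry",
         "frequency_per_day": 1000, "confidentiality": "High", "availability": "09.00-18.00"},
        {"name": "Order",    "source": "Customer", "destination": "Sales",
         "frequency_per_day": 30, "confidentiality": "High", "availability": "24/7"},
    ]

    # Example checks an architect might run against the catalogue.
    high_confidentiality = [f["name"] for f in flows if f["confidentiality"] == "High"]
    missing_trigger = [f["name"] for f in flows if "trigger" not in f]
    print("High confidentiality flows:", high_confidentiality)
    print("Flows with no documented trigger:", missing_trigger)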

 

Physical data flow forms

Data architects are concerned with the forms of matter and energy in which business actors convey information.

At the bottom-most level, physical forms include wires, microwaves and sound waves (human speech).

In theory, data architects can design non-digital data flows.

In practice, data architects mostly focus on business systems in which business information is to be digitised.

 

Data flow format standards

Data architects are concerned to ensure senders and receivers can create and read data structures.

There are many standard data flow formats, covering:

·         Digital audio data, image data, and video data

·         Documentation and scripts

·         Geospatial data; vector and raster data

·         Qualitative data, textual

·         Quantitative tabular data, with or without metadata

Data architects have to research standard data formats as the need arises. See footnote 1 for more detail.
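
As a small illustration (the record is invented), the same tabular record can be written in two of the formats listed in footnote 1, JSON and CSV, using only Python's standard library:

    import csv
    import io
    import json

    order = {"order_id": 1, "customer": "Acme Ltd", "product_code": "PT-42", "quantity": 3}

    # JSON: a popular modern format for data flows between applications.
    json_flow = json.dumps(order)

    # CSV: quantitative tabular data with minimal metadata (column names only).
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=order.keys())
    writer.writeheader()
    writer.writerow(order)
    csv_flow = buffer.getvalue()

    print(json_flow)
    print(csv_flow)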

 

Semantic interoperability

The information found in the structure of a data flow or data store is a matter of perspective.

So “semantic interoperability” is a major concern of enterprise data architecture.

Data architects work to ensure the creators and users of a data structure share the same understanding of its contents.

Business data can be structured according to many domain-specific languages – bespoke or standard.

So, data architects have to research standard “canonical data models” as the need arises.
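
A rough sketch of the idea, with invented field names: each application's local message format is mapped to a shared canonical model, and any field with no agreed meaning is flagged rather than silently passed on.

    # A minimal sketch of translating application-specific messages to a canonical data model.
    # Each application's local field names mapped to the shared canonical names.
    CRM_TO_CANONICAL = {"custName": "customer_name", "ordDate": "order_date", "amt": "total_amount"}
    ERP_TO_CANONICAL = {"client": "customer_name", "date_of_order": "order_date", "total": "total_amount"}

    def to_canonical(message, mapping):
        # Translate a local message into canonical form, flagging any unmapped fields.
        unknown = set(message) - set(mapping)
        if unknown:
            raise ValueError(f"Fields with no agreed meaning: {unknown}")
        return {mapping[k]: v for k, v in message.items()}

    crm_message = {"custName": "Acme Ltd", "ordDate": "2017-05-21", "amt": 120.00}
    print(to_canonical(crm_message, CRM_TO_CANONICAL))
    # The receiving application can read the result using its own mapping in reverse.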

 

Where are input and output data/information flows in TOGAF?

Business input and output flows are identified at the start of the Business Architecture phase B.

These flows can convey materials and/or information.

The B-to-C and B-to-B information content is conveyed through Human-Computer Interfaces and APIs, and in non-digital forms like paper.

The flows are documented in Business Service contracts in the Architecture Requirements Specification, and may also appear in a Business Service/Function Catalogue and/or Process/Event/Control/Product Catalogue.

 

Application input and output flows are identified at the start of IS Architecture phase C.

These A-to-B and A-to-A flows can convey information only.

The information content is conveyed through Human-Computer Interfaces and APIs.

The flows are documented in IS Service contracts in the Architecture Requirements Specification, and may also appear in an Interface Catalogue and/or Application Use Case Descriptions.

 

Where are input and output data/information flows in ArchiMate?

Architects are taught to define systems from out to in, starting with the input/output boundary.

They define the external view of a system - hiding details of internal behaviours and structures.

Then, they divide the system into layers and/or subsystems and define each in the same way.

 

Architects define each system and subsystem (building block or component) by defining its interface(s).

An interface is a collection of services that a system or subsystem makes available to clients.

An interface encapsulates the internal actors/components and processes that implement or realise services.

 

The ArchiMate modelling language classifies these ideas as shown in the table below.

ArchiMate | Behaviour elements | Active structure elements
External view | Services | Interfaces
Internal view | Processes | Actors/Components

 

Services are discrete behaviours that clients can request of a system.

Service contracts encapsulate (hide) the necessary internal process flows and actors/components.

Services consume and produce input/output flows that contain data and/or materials.

So, input and output data flows can be named (and detailed if need be) in service contracts.
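
A minimal sketch of the idea in Python, with invented names: a service contract whose input and output data flows are named as typed message structures, while the processes and components that realise the service stay hidden behind it.

    # A service contract: the external view names the service and its input/output data flows.
    from dataclasses import dataclass
    from typing import Protocol

    @dataclass
    class OrderRequest:          # input data flow: the Order message from the customer
        customer_id: int
        product_code: str
        quantity: int

    @dataclass
    class OrderConfirmation:     # output data flow: the confirmation returned to the customer
        order_id: int
        expected_delivery_date: str

    class OrderService(Protocol):
        # Clients depend only on this interface; internal actors/components and processes are hidden.
        def place_order(self, request: OrderRequest) -> OrderConfirmation: ...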

Footnotes

Footnote 1: Data flow format standards

The table below shows a selection of formats from the list at https://library.uoregon.edu/datamanagement/fileformats.html.

It is drawn from UK Data Archive documentation; some of the data formats may be receding into history.

Popular modern formats include JSON for data flows, and OData for the description of web-accessible data stores.

 

Digital image data: TIFF version 6 uncompressed (.tif); JPEG (.jpeg, .jpg); PDF (.pdf)

Digital video data: MPEG-4 High Profile (.mp4); JPEG 2000 (.mj2)

Digital audio data: Free Lossless Audio Codec (FLAC) (.flac); Waveform Audio Format (WAV) (.wav); MPEG-1 Audio Layer 3 (.mp3)

Qualitative data, textual: eXtensible Mark-up Language (XML) text according to a Document Type Definition (DTD) or schema (.xml); Rich Text Format (.rtf); plain text data, ASCII (.txt); Hypertext Mark-up Language (HTML) (.html); widely-used proprietary formats, e.g. MS Word (.doc/.docx)

Documentation and scripts: Open Document Text (.odt); Rich Text Format (.rtf); HTML (.htm, .html); plain text (.txt); widely-used proprietary formats, e.g. MS Word (.doc/.docx) or MS Excel (.xls/.xlsx); XML marked-up text (.xml) to a DTD or schema, e.g. XHTML 1.0; PDF (.pdf)

Quantitative tabular data with extensive metadata: SPSS portable format (.por); delimited text and command ('setup') file (SPSS, Stata, SAS, etc.) containing metadata information; structured text or mark-up file containing metadata information, e.g. DDI XML file; MS Access (.mdb/.accdb)

Geospatial data, vector and raster: ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn); geo-referenced TIFF (.tif, .tfw); CAD data (.dwg); tabular GIS attribute data

Quantitative tabular data with minimal metadata: comma-separated values (CSV) file (.csv); tab-delimited file (.tab) including delimited text of given character set with SQL data definition statements where appropriate; delimited text of given character set, using only characters not present in the data as delimiters (.txt); widely-used formats, e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf) and OpenDocument Spreadsheet (.ods)

 

Footnote 2: Logical data model to physical XML schema mappings

The table below is edited from a table on the IBM web site.

http://www.ibm.com/support/knowledgecenter/SS9UM9_8.1.0/com.ibm.datatools.transform.ldm.xsd.doc/topics/rldm2xsd_map.html

 

Logical data model | Physical XML schema
Schema | SchemaLocation (XSD file name); TargetNamespace (unless set in the Properties page)
Atomic Domain | Simple Type
Atomic Domain - Name | Name
Domain Constraint | Facet (FractionDigits, TotalDigits, MaxLength, MinLength, Length, MaxExclusive, MinExclusive, MaxInclusive, MinInclusive, Enumeration, Pattern)
Entity | Complex Type and Element
Entity - Name | Name
Entity - Documentation | Documentation
Entity - Supertype of Generalization | BaseType of Complex Type
Entity - Primary Key | Key of Element
Generalization | See Entity
Generalization Set | See Entity (with all applicable properties of the generalization set)
Attribute | Contained Element with Simple Type
Attribute - Name | Name
Attribute - Documentation | Documentation
Attribute - Data Type, Length/Precision, Scale | Type
Attribute - Primary Key | Key field of containing Element
Attribute - Entity | Owning Complex Type
Relationship | Contained Element with Complex Type
RelationshipEnd | Contained Element with Complex Type
RelationshipEnd - VerbPhrase | Name
RelationshipEnd - Cardinality | MinOccurs / MaxOccurs
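
For illustration, a short Python sketch (with an invented entity) of the Entity-to-Complex-Type row in the table above: a logical entity with typed attributes rendered as an XML schema fragment.

    # Render a logical entity as an XSD complex type and element, as in the mapping table above.
    entity = {
        "name": "Customer",
        "attributes": [("CustomerId", "xsd:integer"), ("Name", "xsd:string"), ("Email", "xsd:string")],
    }

    elements = "\n".join(
        f'  <xsd:element name="{name}" type="{xsd_type}"/>'
        for name, xsd_type in entity["attributes"]
    )
    xsd = f"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:complexType name="{entity['name']}Type">
    <xsd:sequence>
    {elements}
    </xsd:sequence>
    </xsd:complexType>
    <xsd:element name="{entity['name']}" type="{entity['name']}Type"/>
    </xsd:schema>"""

    print(xsd)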