Generic data model

Generic data models are generalizations of conventional data models. They define standardised general relation types, together with the kinds of things that may be related by such a relation type.

Overview

The definition of a generic data model is similar to the definition of a natural language. For example, a generic data model may define relation types such as a 'classification relation', a binary relation between an individual thing and a kind of thing (a class), and a 'part-whole relation', a binary relation between two things, one with the role of part and the other with the role of whole, regardless of the kinds of things that are related. Given an extensible list of classes, this allows any individual thing to be classified and part-whole relations to be specified for any individual object. By standardising an extensible list of relation types, a generic data model enables the expression of an unlimited number of kinds of facts and approaches the capabilities of natural languages. Conventional data models, on the other hand, have a fixed and limited domain scope, because an instantiation (usage) of such a model only allows the expression of kinds of facts that are predefined in the model.
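As a rough illustration of this idea, the following Python sketch stores facts as (left, relation type, right) triples and keeps the relation types in an extensible set rather than fixing them in the schema. The names `FactStore`, `add_relation_type`, and `express`, and the sample identifiers, are hypothetical and are not taken from any standard.

```python
from dataclasses import dataclass, field


@dataclass
class FactStore:
    """Holds facts as (left, relation_type, right) triples."""
    relation_types: set[str] = field(default_factory=set)
    facts: list[tuple[str, str, str]] = field(default_factory=list)

    def add_relation_type(self, name: str) -> None:
        # Extending the vocabulary is a data change, not a schema change.
        self.relation_types.add(name)

    def express(self, left: str, relation_type: str, right: str) -> None:
        # Only relation types that have been registered may be used.
        if relation_type not in self.relation_types:
            raise ValueError(f"unknown relation type: {relation_type}")
        self.facts.append((left, relation_type, right))


store = FactStore()
store.add_relation_type("is classified as")  # individual thing -> class
store.add_relation_type("is part of")        # part -> whole

store.express("wheel-1", "is classified as", "wheel")
store.express("car-7", "is classified as", "car")
store.express("wheel-1", "is part of", "car-7")

# A new kind of fact needs only a new relation type, added as data.
store.add_relation_type("is owned by")
store.express("car-7", "is owned by", "person-3")
```

A conventional model would instead need a new table or attribute for the ownership fact, which is the fixed domain scope described above.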

History

Generic data models were developed as an approach to solving some shortcomings of conventional data models. For example, different modelers usually produce different conventional data models of the same domain. This can make it difficult to bring the models of different people together and is an obstacle to data exchange and data integration. Invariably, however, this difference is attributable to different levels of abstraction in the models and to differences in the kinds of facts that can be instantiated (the semantic expression capabilities of the models). The modelers need to communicate and agree on which elements are to be rendered more concretely, in order to make the differences less significant.

Generic data model topics

Generic patterns

There are generic patterns that can be used to advantage when modeling a business. These include entity types for PARTY (with included PERSON and ORGANIZATION), PRODUCT TYPE, PRODUCT INSTANCE, ACTIVITY TYPE, ACTIVITY INSTANCE, CONTRACT, GEOGRAPHIC AREA, and SITE. A model that explicitly includes versions of these entity classes will be both reasonably robust and reasonably easy to understand.
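A minimal sketch of a few of these entity types, written as Python dataclasses, might look as follows. Only the entity names come from the pattern list above; the field names (`name`, `serial_number`, `buyer`, `seller`, `item`) and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Party:
    """PARTY: supertype of PERSON and ORGANIZATION."""
    name: str


@dataclass
class Person(Party):
    pass


@dataclass
class Organization(Party):
    pass


@dataclass
class ProductType:
    """PRODUCT TYPE: a kind of product, such as a catalogue entry."""
    name: str


@dataclass
class ProductInstance:
    """PRODUCT INSTANCE: one physical item of a given product type."""
    serial_number: str
    product_type: ProductType


@dataclass
class Contract:
    """CONTRACT: refers to parties in general, not to persons or
    organizations specifically (hypothetical fields)."""
    buyer: Party
    seller: Party
    item: ProductInstance


# Illustrative data only.
acme = Organization("Acme Ltd")
alice = Person("Alice")
pump = ProductInstance("SN-001", ProductType("pump"))
contract = Contract(buyer=alice, seller=acme, item=pump)
```

Because `Contract` refers to `Party`, the same structure covers contracts between any mix of persons and organizations.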

More abstract models are suitable for general purpose tools, and consist of variations on THING and THING TYPE, with all actual data being instances of these. Such abstract models are on one hand more difficult to manage, since they are not very expressive of real world things, but on the other hand they have a much wider applicability, especially if they are accompanied by a standardised dictionary. More concrete and specific data models will risk having to change as the scope or environment changes.

Approach to generic data modeling

One approach to generic data modeling has the following characteristics:

A generic data model shall consist of generic entity types, such as 'individual thing', 'class', 'relationship', and possibly a number of their subtypes.
Every individual thing is an instance of a generic entity called 'individual thing' or one of its subtypes.
Every individual thing is explicitly classified by a kind of thing ('class') using an explicit classification relationship.
The classes used for that classification are separately defined as standard instances of the entity 'class' or one of its subtypes, such as 'class of relationship'. These standard classes are usually called 'reference data'. This means that domain-specific knowledge is captured in those standard instances and not as entity types. For example, concepts such as car, wheel, building, and ship, as well as temperature, length, etc., are standard instances. Standard types of relationship, such as 'is composed of' and 'is involved in', can also be defined as standard instances.

This way of modeling allows the addition of standard classes and standard relation types as data (instances), which makes the data model flexible and prevents data model changes when the scope of the application changes.
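The characteristics listed above can be sketched in Python roughly as follows. The generic entity types become a handful of classes, while domain concepts such as car and wheel appear only as instances (reference data). All identifiers here (`ThingClass`, `ClassOfRelationship`, `IndividualThing`, `Relationship`, `classes_of`) and the sample data are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThingClass:
    """Generic entity type 'class': a kind of thing."""
    name: str


@dataclass(frozen=True)
class ClassOfRelationship(ThingClass):
    """Subtype of 'class' whose instances are kinds of relationship."""


@dataclass(frozen=True)
class IndividualThing:
    """Generic entity type 'individual thing'."""
    name: str


@dataclass(frozen=True)
class Relationship:
    """Generic entity type 'relationship' between two things."""
    kind: ClassOfRelationship
    left: object
    right: object


# Reference data: domain knowledge held as instances, not as entity types.
CAR = ThingClass("car")
WHEEL = ThingClass("wheel")
IS_CLASSIFIED_AS = ClassOfRelationship("is classified as")
IS_COMPOSED_OF = ClassOfRelationship("is composed of")

# Application data: individuals, each explicitly classified.
my_car = IndividualThing("car-7")
my_wheel = IndividualThing("wheel-1")
facts = [
    Relationship(IS_CLASSIFIED_AS, my_car, CAR),
    Relationship(IS_CLASSIFIED_AS, my_wheel, WHEEL),
    Relationship(IS_COMPOSED_OF, my_car, my_wheel),
]

# Widening the scope adds instances only; no class definition changes.
SHIP = ThingClass("ship")
facts.append(Relationship(IS_CLASSIFIED_AS, IndividualThing("ship-2"), SHIP))


def classes_of(thing: IndividualThing) -> list[str]:
    """Return the names of the classes that classify the given thing."""
    return [f.right.name for f in facts
            if f.kind == IS_CLASSIFIED_AS and f.left == thing]
```

Adding `SHIP` required no new Python class, which mirrors the claim that a change of scope is handled by adding data rather than by changing the data model.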

Generic data model rules

A generic data model obeys the following rules:[2]

Candidate attributes are treated as representing relationships to other entity types.
Entity types represent, and are named after, the underlying nature of a thing, not the role it plays in a particular context. Because entity types are chosen on this principle, any occurrence of an entity type will belong to it from the time it is created to the time it is destroyed, not just whilst it is of interest. This is important when managing the underlying data, rather than the views on it used by applications. Entity types that conform to this principle are called generic entity types.
Entities have a local identifier within a database or exchange file. These should be artificial and managed to be unique. Relationships are not used as part of the local identifier.
Activities, relationships and event-effects are represented by entity types (not attributes).
Entity types are part of a sub-type/super-type hierarchy of entity types, in order to define a universal context for the model. As types of relationships are also entity types, they are also arranged in a sub-type/super-type hierarchy of types of relationship.
Types of relationships are defined at a high (generic) level, being the highest level at which the type of relationship is still valid. For example, a composition relationship (indicated by the phrase 'is composed of') is defined as a relationship between an 'individual thing' and another 'individual thing' (and not just between, for example, an order and an order line). This generic level means that the type of relationship may in principle be applied between any individual thing and any other individual thing. Additional constraints are defined in the 'reference data', as standard instances of relationships between kinds of things.
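Some of these rules can be illustrated with a short Python sketch: artificial local identifiers, a subtype/supertype hierarchy held as data, a relationship type defined at the generic level, and constraints kept as reference data. The dictionaries and functions below (`SUPERTYPE`, `ALLOWED`, `create`, `is_a`, `may_relate`) are hypothetical and only one of many possible arrangements.

```python
from itertools import count

# Artificial local identifiers, independent of any relationship.
_next_id = count(1)

# Subtype/supertype hierarchy held as data: kind -> supertype.
SUPERTYPE = {
    "order": "individual thing",
    "order line": "individual thing",
    "car": "individual thing",
    "wheel": "individual thing",
}

# Reference-data constraints: (kind of whole, relation type, kind of part).
ALLOWED = {
    ("order", "is composed of", "order line"),
    ("car", "is composed of", "wheel"),
}

things = {}  # local identifier -> kind of thing


def create(kind):
    """Register a new individual thing and return its local identifier."""
    thing_id = next(_next_id)
    things[thing_id] = kind
    return thing_id


def is_a(kind, ancestor):
    """True if kind equals ancestor or is one of its subtypes."""
    while kind is not None:
        if kind == ancestor:
            return True
        kind = SUPERTYPE.get(kind)
    return False


def may_relate(left_id, relation_type, right_id):
    """Check a proposed relationship against the reference-data constraints."""
    return any(
        rel == relation_type
        and is_a(things[left_id], left_kind)
        and is_a(things[right_id], right_kind)
        for left_kind, rel, right_kind in ALLOWED
    )


order = create("order")
line = create("order line")
wheel = create("wheel")

print(may_relate(order, "is composed of", line))   # True
print(may_relate(order, "is composed of", wheel))  # False
```

The structure itself would accept 'is composed of' between any two individual things; the restriction to sensible pairs lives in `ALLOWED`, so it can be extended without altering the model.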

Examples

Examples of generic data models are:

ISO 10303-221,
ISO 15926,
Gellish or Gellish English,
the models found in Data Model Patterns: Conventions of Thought by David C. Hay (1995), and
the models found in Enterprise Model Patterns: Describing the World by David C. Hay (2011).

References

[1] Amnon Shabo (2006). Clinical genomics data standards for pharmacogenetics and pharmacogenomics. Archived 2011-10-18 at the Wayback Machine.
[2] Matthew West and Julian Fowler (1999). Developing High Quality Data Models. Archived 2020-09-09 at the Wayback Machine. The European Process Industries STEP Technical Liaison Executive (EPISTLE).

1. David C. Hay. 1995. Data Model Patterns: Conventions of Thought. (New York: Dorset House).

2. David C. Hay. 2011. Enterprise Model Patterns: Describing the World. (Bradley Beach, New Jersey: Technics Publications).

3. Matthew West. 2011. Developing High Quality Data Models. (Morgan Kaufmann).

External links

Data Flow Diagram
Gellish English and the Gellish Dictionary and documents about Gellish
