Send message to: Dr. T.William Olle or nell@nist.gov, (Jim Nell) workshop convener and secretary.

Fundamentals of Data and Process Modelling

Dr. T.William Olle
2 Ashley Park Road
Walton on Thames
Surrey KT12 1JU, England
Phone +44-1932-221224; Fax +44-1932-221227
email 100010.3176@CompuServe.com

PAPER SUBMITTED FOR PRESENTATION AT THE JTC1 WORKSHOP IN SEATTLE, 9-12 SEPTEMBER 1996

28 May 1996

ABSTRACT

This paper addresses both data modelling and process modelling, with the emphasis on the former. The paper notes that there are numerous approaches to data modelling and to process modelling in widespread use. The differences between alternative approaches are firstly the representation form used and secondly the extra kinds of constraints expressible with more advanced approaches to data modelling.

The concept of a representation form is elaborated as being not merely diagramming techniques, but also including formalized languages such as SQL92 and Express, as well as the more mathematically oriented forms such as predicate logic.

The paper attempts to establish that the principles underlying most if not all approaches to data modelling are basically the same and gives a taxonomy for these principles illustrating how selected types of constraint (the more simple of which are usually thought of as relationships) can be handled in different representation forms.

The paper distinguishes carefully between analytic modelling (modelling what is there in what is variously referred to as a business area, a problem area and a universe of discourse) and prescriptive modelling (modelling what is needed in a problem area).

A three point spectrum (enterprise, computer human interface and computer system) is identified, in each part of which modelling has a role to play.

The paper discusses typical uses of modelling such as information systems methodologies, enterprise modelling, conceptual modelling, and data base design. It is suggested that the data/process modelling approach is actually a very fundamental discipline which could profitably be applied far more widely than is currently the case. Examples of such possible applications are in the analysis of and consensus building for complex situations, and in the definition of standards typically written in narrative prose (such as quality assurance).

Finally, the paper addresses the following four questions of relevance to the workshop and to international standards. Firstly, what international standards are needed for data and process modelling? Secondly, how can a modelling approach be used in formally defining other international standards? Thirdly, how can a modelling approach mitigate the problem of products based on standards prepared at different times by different groups? Fourthly, how can a modelling approach assist in meeting the main ISO goal for standards of achieving interoperability?

1 Basic concepts in data modelling

Data modelling is a well established pseudo-discipline. Most people in the information technology field are familiar with some kind of data structure diagram (sometimes referred to more specifically as entity relationship diagrams, or by the name of some particular variant such as SSADM, IDEF1X, or NIAM). There are innumerable software products each of which supports the preparation of one or more variants of data modelling and sometimes additionally some other kind of modelling, such as data flow diagrams.

The term "data modelling" is usually interpreted as implying a diagramming technique. There are many such diagramming techniques in use worldwide. A recent analysis [1] listed some 14 alternative ways of representing the most basic and widely used data modelling construct (namely the one to many relationship to be discussed later in this paper).

2 Representation forms

To regard data modelling as diagramming is to take a narrow view of what it is all about. It is proposed in this paper that one should distinguish the concept of data modelling from the concept of a representation form for a data model. Since there are so many alternative diagrammatic approaches, it is useful to refer to one of the categories of representation form as "diagrammatic".

Another class of representation form is the language based form where an alphabet, based on a standard character set and a set of reserved words, is used to define the data model declaratively. A data model is clearly not a computational procedure. It is a set of declarations or assertions about the data which the data is required to satisfy. This category of representation form is referred to in this paper as "language based" (even though some would argue that all representation forms are based on some form of language and that diagrams, for example, are also a form of language).

Two other possible representation forms are the mathematically based form of predicate logic and any kind of narrative prose.

A given data model can be represented using any or all of these categories of representation form. Each category has its advantages and disadvantages. Diagramming builds on the old adage that a "picture is worth a thousand words". However, as will be established later in this paper, diagramming suffers from an inherent lack of expressive power. The more one tries to overcome this shortcoming, the more one erodes the advantages which diagramming techniques offer as a means of conveying the principal concepts quickly in a readily assimilated form.

Predicate logic attracts the formalists but has the disadvantage that a data model expressed in it is not easily assimilated by the everyday practitioner. There is a need for a representation form comprehensible to "subject area experts", namely those who are experts in the area being modelled but who are not necessarily trained in the skill of data modelling. (Data modelling is analogous to music. A rare few can compose music. Some can play it. Some merely appreciate it!)

3 Data modelling and its origins

Data modelling is a general purpose technique which can be applied to many problem areas (or subject areas or business areas or even Universes of Discourse). Data modelling is derived unquestionably from the early use of database technology [2]. The need for modelling data and indeed the need for designing databases was recognized as early as about 1963, which was some years after the recognition of the need for programming languages. As a basic skill which should be taught to all as one of the basic aspects of what is now called information technology, data modelling has never quite caught up with programming.

One interesting similarity with programming languages is the following. It used to be argued that a programming language was not complete unless it was possible to write a compiler for that language in itself. With data modelling facilities, it is important to note that one can apply a data modelling technique to the definition of a data modelling technique. One could argue that a data modelling technique is not complete unless it can be applied to defining itself.

4 Data model - a source of confusion

The term "data model" has been the source of confusion. It is most widely used to refer to a model for a specific business area (order processing, insurance claims, airline seat reservations) prepared using a data modelling technique. Unfortunately, the term "data model" was hijacked in the early seventies and used in the sense of "the network data model, the relational data model and the hierarchical data model".

This use of the term has been widely taught and causes confusion whenever one is in a group which needs to reference both interpretations. The problem was avoided in the Reference Model of Data Management [3] by using the terms data modelling facility and application data model. These terms are used in this paper in the following sense. A data modelling facility may be used to develop an application data model for a subject area. If the subject area happens to be data modelling, then the data modelling facility is being used to model a data modelling facility. For completeness, data modelling is the activity of producing an application data model using a data modelling facility.
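The distinction between a data modelling facility and an application data model can be sketched in code. The following Python fragment is an illustration only: the classes `EntityType` and `Attribute`, the two functions and the sample entity types are hypothetical names chosen for this sketch and are not taken from [3]. The two classes play the part of a (very small) data modelling facility. The facility is used first to produce an application data model for an order processing area, and then to model the facility itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    """A construct of the hypothetical facility: a named attribute."""
    name: str


@dataclass
class EntityType:
    """A construct of the hypothetical facility: an entity type and its attributes."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


def application_model() -> List[EntityType]:
    """An application data model for an illustrative order processing area."""
    return [
        EntityType("CUSTOMER", [Attribute("customer number")]),
        EntityType("ORDER", [Attribute("order number")]),
    ]


def facility_model() -> List[EntityType]:
    """The facility applied to itself: its own constructs become entity types."""
    return [
        EntityType("ENTITY TYPE", [Attribute("name")]),
        EntityType("ATTRIBUTE", [Attribute("name")]),
    ]


order_processing = application_model()
meta = facility_model()
```

In both cases the activity is data modelling in the sense used here; only the subject area differs.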

5 Positioning data modelling in a three part spectrum

Data modelling is a versatile tool and can be used in a number of ways. To start with, it is useful to identify the following three part spectrum:

  A. the enterprise
  B. the computer human interface
  C. the computerized information system

The enterprise is what is sometimes referred to as the "real world". It may be an insurance company or a bank. It may be a university department or a government agency. It may be a design of an automobile or ship. As indicated earlier, it may even be a modelling facility.

The computer human interface (CHI) is the interface between the enterprise and the computerized information system. It is where the user, who is part of the enterprise, is able to see what the computerized information system is able to do for the user. In this day and age, in the world of "point and click", the user has a view of a computerized information system which is very different from the view of the computerized information system itself.

The computerized information system is the term used here for what used to be called the "internals" of the software. The qualifier "computerized" is used to emphasize that the concept of "information system" should not be limited to the computerized part. Designers of an information system are negligent in their task if they do not consider the manual procedures which have to be defined and followed in the user environment.

Data modelling can be used at any of these three points in the spectrum. The choice of representation form exists but the data model would typically need to be more precise for the computerized information system than for the enterprise. There are problems of emphasis at each point in the spectrum. A data model of an enterprise is usually, but not necessarily, analytic. In other words, it represents an interpretation of what data the enterprise uses rather than a prescription for what it should use. This distinction between the analytic and the prescriptive is an important one. It does not call for a different data modelling facility to be used, but rather for a different way of thinking (in German, a "weltanschauung") for the person or team carrying out the modelling.

For the computer human interface and the computerized information system, there are different data modelling problems and there is a different distinction which it is useful to make. When designing a computerized information system, it is useful to start by factoring out all aspects of construction and performance. This means that the designer produces a specification of what data the computerized information system will support and what tasks a user of the system needs to be able to "point and click" on when using the system. This specification is essentially the view of the data and the associated user tasks at the computer human interface. The role of an application data model at the CHI is indeed prescriptive but it should also be independent of which database management system (DBMS) will be used (if any) in the construction of the system.

For the computerized information system, one has to take into account the construction tool to be used and the performance considerations. Database design with one vendor's DBMS is still different from database design with another vendor's DBMS (even if both claim, for example, to be SQL conformant).

The enterprise data model is usually expressed in the form of a diagram. There is a growing recognition that an enterprise model should capture all of the "business rules". Many (but never all) of these business rules can be expressed in a data model. Many business rules are "procedural" and need to be expressed in some kind of process model. Diagram techniques are in any case limited in their ability to represent business rules.

The application data model for a computerized information system is usually represented in some kind of language based representation form (such as SQL). It is vital that this representation of the application data model is relatable to (preferably derivable from or even generatable from) the enterprise data model for the enterprise to be supported by the computerized information system.

6 Snapshot modelling, dynamic modelling and business rules

The most common kind of data modelling is unfortunately snapshot modelling. This means that an application data model is a snapshot of the data in an enterprise (or in a computerized information system) at a specific point in time. This is the traditional way of thinking, based on the way of thinking about databases which was all that was technologically possible and feasible in the early days of database technology.

A dynamic data model is an application data model which is capable of representing the data in a business area at any point in a time continuum (including the future). Reservation systems (for hotel rooms, flights, car hire) represent a class of systems for which dynamic data modelling is needed. If such a system is designed and built on the basis of a snapshot model, then the obvious dynamics of the business area have to be captured and represented elsewhere - typically in the application programs.
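As a minimal sketch of what a dynamic data model implies for a reservation system, the following Python fragment (using the standard `sqlite3` module) records the period to which each reservation applies, so that the model can be questioned about any date, past or future. The table `room_reservation`, its columns, the sample dates and the function `rooms_reserved_on` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE room_reservation (
        room_no     INTEGER NOT NULL,
        first_night TEXT NOT NULL,   -- ISO dates (YYYY-MM-DD) compare correctly as text
        last_night  TEXT NOT NULL,
        CHECK (first_night <= last_night)
    )
    """
)
conn.executemany(
    "INSERT INTO room_reservation VALUES (?, ?, ?)",
    [(101, "1996-09-09", "1996-09-12"), (102, "1996-09-11", "1996-09-11")],
)


def rooms_reserved_on(day: str) -> list:
    """Return the rooms reserved on the given day, which may lie in the future."""
    rows = conn.execute(
        "SELECT room_no FROM room_reservation "
        "WHERE first_night <= ? AND ? <= last_night ORDER BY room_no",
        (day, day),
    )
    return [room_no for (room_no,) in rows]
```

A snapshot model would hold only the current occupancy of each room, leaving the time dimension to be handled in application programs.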

The need to capture "business rules" is widely accepted. Representing these rules in application programs is the traditional way - typically because database management systems have not allowed any other approach. Designers have therefore been trained to use a DBMS to do what it can do, namely transfer data between high speed memory and disc. Representation and subsequent enforcement of business rules is something which therefore has to be built into application programs.

The problem with embedding rules in application programs is as follows. As and when these business rules change, then changing the application programs is time consuming and expensive. One solution to this problem is to express the business rules declaratively - not only in the enterprise modelling, but also in the computerized information system.

7 Data modelling terminology

Each data modelling facility has a degree of expressiveness, some being more limited than others. As indicated earlier, diagrammatic techniques are, by their very nature, more limited in this respect than language based approaches and more formalized categories of representation.

In order to have some basis for comparison, it is useful to introduce the concept of a data modelling construct as a "unit of functionality" which a data modelling facility either supports or does not support.

It is convenient to partition the constructs of data modelling into constraint classes. This taxonomy is predicated on the widespread recognition and acceptance in data modelling of the "type instance" dichotomy and also the distinction between two classes of concept as illustrated in the following table:

Alpha concept             Beta concept
record type               field
entity type               attribute
table                     column
non-lexical object type   lexical object type

Each of these pairs has its supporters and many other examples could be cited. The feature that all have in common is that each beta concept relates to one corresponding alpha concept and each alpha concept is related to one or more beta concepts. (This assertion is often refuted by those who would claim, for example, that the same attribute can belong to two or more different entity types. It all depends how one interprets the beta concepts.)

As far as establishing the principles of data modelling is concerned, each of these pairs is equally useful. The choice of a pair to use in the rest of the paper (in preference to the abstract and obscure terms "alpha concept" and "beta concept") is arbitrary. The traditional terms "entity type" and "attribute" are therefore selected.

8 Relationship and constraints

Another term widely used in data modelling is the traditional (that is "long standing") term "relationship". When it was first recognized that one could establish a relationship between the entity type SUPPLIER and the entity type PURCHASE ORDER, the thinking (in the early days of database technology) was that the advantage of this relationship was that it made it easy to access any of the purchase orders sent to a given supplier.

While this advantage still holds, the relational way of thinking has brought out the recognition that the main advantage lies elsewhere. When new PURCHASE ORDERS are added to a system, the existence of a relationship with SUPPLIER ensures that each PURCHASE ORDER is in fact associated with a SUPPLIER which is "known to the system".

To say that a PURCHASE ORDER must be associated with a SUPPLIER is an example of citing a business rule. It can in addition be a business rule that a SUPPLIER is only of interest to the enterprise if it has been sent at least one PURCHASE ORDER. This is a subtly different rule from that which allows a SUPPLIER with no PURCHASE ORDERS to exist. Both scenarios are perfectly feasible. It depends on the way the enterprise does business. This is a simple example of the kind of analysis which is inherent in enterprise data modelling.

The reason for this discussion is to emphasize that the important aspect of the concept of "relationship" has rightly changed from the early days of database. Aspects of relationships which offer the potential for improved performance should not be of interest to data modellers. Aspects of relationships which represent business rules are very definitely of interest to data modellers.

The term "constraint" can be introduced at this point as a way of thinking which is to be preferred to that of merely considering "relationships". The term "constraint" is much broader than the term "relationship". Relationships have established their useful role in the less formal and less precise approaches to data modelling, such as those carried out exclusively with diagramming techniques. As indicated above, a relationship typically implies one or two constraints, such as the following two examples:

  - each PURCHASE ORDER is related to one and only one SUPPLIER
  - each SUPPLIER must have at least one PURCHASE ORDER

both of which are constraints. As a pair, these are an example of a "one to one or more" relationship.

To assert that--each SUPPLIER may have zero, one or more PURCHASE ORDERS--is an example of an assertion which is non-constraining. It is always true and it does not help to record the assertion (except to note that there is a need to be able to locate all the PURCHASE ORDERS associated with a SUPPLIER).

If the business rule states that it is important to know about SUPPLIERS whether or not they have been sent a PURCHASE ORDER, then the constraint--each SUPPLIER must have at least one PURCHASE ORDER--would not be valid and it is necessary to replace it with the non-constraining assertion--each SUPPLIER may have zero, one or more PURCHASE ORDERS.

The other assertion, namely--each PURCHASE ORDER is related to one and only one SUPPLIER--is the same in both cases, even though the kind of relationship between SUPPLIER and PURCHASE ORDER is different, namely a one-to-zero, one-or-more relationship.
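The assertions discussed in this section can be written in a language based representation form. The following sketch uses SQL embedded in Python (standard `sqlite3` module). The table and column names and the two helper functions are assumptions made for illustration. The assertion that each PURCHASE ORDER is related to one and only one SUPPLIER is declared directly. The optional constraint that each SUPPLIER must have at least one PURCHASE ORDER is not declared in this sketch; it is merely checked by a query, so that either business rule can be examined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only when enabled
conn.executescript(
    """
    CREATE TABLE supplier (supplier_no INTEGER PRIMARY KEY);
    CREATE TABLE purchase_order (
        order_no    INTEGER PRIMARY KEY,
        -- NOT NULL plus the reference: one and only one SUPPLIER per PURCHASE ORDER
        supplier_no INTEGER NOT NULL REFERENCES supplier (supplier_no)
    );
    INSERT INTO supplier VALUES (1);
    INSERT INTO supplier VALUES (2);
    INSERT INTO purchase_order VALUES (10, 1);
    """
)


def order_for_unknown_supplier_rejected() -> bool:
    """Try to add a PURCHASE ORDER for a SUPPLIER not known to the system."""
    try:
        conn.execute("INSERT INTO purchase_order VALUES (11, 99)")
    except sqlite3.IntegrityError:
        return True
    return False


def suppliers_without_orders() -> list:
    """SUPPLIERS that would violate 'at least one PURCHASE ORDER' if that rule applied."""
    rows = conn.execute(
        "SELECT supplier_no FROM supplier WHERE supplier_no NOT IN "
        "(SELECT supplier_no FROM purchase_order) ORDER BY supplier_no"
    )
    return [supplier_no for (supplier_no,) in rows]
```

Under the rule that a SUPPLIER may have zero, one or more PURCHASE ORDERS, the result of the second function is simply information; under the stricter rule it is a list of violations.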

9 Kinds of relationship

There are three different classes of relationship: unary, binary and N-ary.

A unary relationship involves only one entity type. A binary relationship involves two entity types. An N-ary relationship involves three or more entity types. Opinions vary about the value and merit of N-ary relationships. They will not be considered further in this paper.

Binary relationships are the workhorses of data modelling and it is useful to identify ten different classes of binary relationship as follows:

  1. One to many, where many means zero, one or more
  2. Zero or one to many, where many means zero, one or more
  3. One to many, where many means one or more
  4. Zero or one to one
  5. Zero or one to zero or one
  6. Zero or one to many, where many means one or more
  7. Many (one or more) to many (one or more)
  8. Many (zero, one or more) to many (one or more)
  9. One to one
  10. Many (zero, one or more) to many (zero, one or more)

The sequence here is based on an evaluation of usefulness. Number 10 is last because it does not represent any kind of constraint in either direction and is therefore always true. Number 9 is of possible value in a snapshot model but of very questionable value in a dynamic model. Users of one to one relationships are usually involved in the preparation of a snapshot data model and admit that they actually mean to use number 4 in the above list.

Numbers 6, 7 and 8 have also been found to be of limited value. This leaves numbers 1 to 5 as being useful in practice, with number 1 being the mainstay of data modelling of any kind. Numbers 1 and 3 are illustrated earlier in this paper (section 8) in terms of the alternative possible relationships between SUPPLIER and PURCHASE ORDER.

A unary relationship is a relationship which involves one entity type. It expresses a relationship between some entities of an entity type and other entities of the same entity type. The most common example is the unary equivalent of binary relationship class 2, which can be used to model a homogeneous tree.
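A unary relationship of this kind can be sketched in a language based form as a table which refers to itself. In the following Python fragment (standard `sqlite3` module) the entity type, the column names and the sample data are illustrative assumptions. Each unit has zero or one parent and zero, one or more children. The declaration on its own does not rule out cycles, which a true tree would also forbid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only when enabled
conn.execute(
    """
    CREATE TABLE organization_unit (
        unit_no   INTEGER PRIMARY KEY,
        -- nullable self-reference: zero or one parent of the same entity type
        parent_no INTEGER REFERENCES organization_unit (unit_no)
    )
    """
)
conn.executemany(
    "INSERT INTO organization_unit VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2)],
)


def children_of(unit_no: int) -> list:
    """Return the units whose parent is the given unit."""
    rows = conn.execute(
        "SELECT unit_no FROM organization_unit WHERE parent_no = ? ORDER BY unit_no",
        (unit_no,),
    )
    return [child for (child,) in rows]
```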

10 Categorization schemes for binary relationships

Designers of data modelling facilities choose to categorize the different kinds of relationships in different ways. For example, it is quite common to see a breakdown into the following three categories

If the categorization scheme goes on to distinguish between optional and mandatory participation, then the net result is the same as the breakdown into 10 shown in section 9. If the categorization scheme does not make this distinction, then the data modelling technique based on this categorization scheme will of necessity be superficial in terms of capturing semantic constraints.
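One way to see where the figure of 10 comes from is to assume that each end of a binary relationship is described by one of four options (one; zero or one; one or more; zero, one or more) and that the order of the two ends is immaterial. This framing is an assumption made for the following sketch rather than a statement of how any particular facility defines its categories. Counting the unordered pairs of options then gives the ten classes.

```python
from itertools import combinations_with_replacement

# The four options assumed for each end of a binary relationship.
ENDS = ["one", "zero or one", "one or more", "zero, one or more"]

# Unordered pairs of ends, a pair of identical ends being allowed.
classes = list(combinations_with_replacement(ENDS, 2))

# The only class which constrains neither end.
non_constraining = [
    c for c in classes if c == ("zero, one or more", "zero, one or more")
]
```

The count is 4 pairs with identical ends plus 6 pairs with different ends, which gives 10.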

11 Exclusivity constraints

Relationships are very important in diagrammatic modelling techniques. Another kind of constraint which is often included in diagrammatic representations is the exclusivity constraint. Again there are many classes of exclusivity constraints. They can also be thought of as constraints on constraints.

The following is an example of an exclusivity constraint. ORGANIZATION UNITS and EXTERNAL UNITS each perform zero, one or more ACTIVITIES. However, an ACTIVITY may be performed either by an ORGANIZATION UNIT or by an EXTERNAL UNIT, or by neither, but explicitly not by both. (This means that the entity type ACTIVITY refers to a single "time stampable" occurrence of a performance. The data model is not concerned with the ability of an ORGANIZATION UNIT or an EXTERNAL UNIT to perform ACTIVITIES, but with representing the fact that they have done so or are planned to do so. The data model must reflect the event of an activity being performed or being planned, and not the ability of either an EXTERNAL UNIT or an ORGANIZATION UNIT to perform an activity.)

The entity type ACTIVITY is constrained first by the fact that it may relate to no more than one EXTERNAL UNIT. An ACTIVITY may also relate to no more than one ORGANIZATION UNIT. However, if it relates to one of these two, it may not relate to the other and vice versa. In other words, the two constraints on ACTIVITY are mutually exclusive. This is a fairly common kind of exclusivity constraint but there are several other kinds.
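This exclusivity constraint can be expressed declaratively in a language based form. The following sketch uses SQL embedded in Python (standard `sqlite3` module). The table and column names and the helper `try_insert` are assumptions made for illustration. The two nullable references state that an ACTIVITY relates to no more than one unit of each kind, and the CHECK clause states that it may not relate to both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only when enabled
conn.executescript(
    """
    CREATE TABLE organization_unit (unit_no INTEGER PRIMARY KEY);
    CREATE TABLE external_unit (unit_no INTEGER PRIMARY KEY);
    CREATE TABLE activity (
        activity_no      INTEGER PRIMARY KEY,
        org_unit_no      INTEGER REFERENCES organization_unit (unit_no),
        external_unit_no INTEGER REFERENCES external_unit (unit_no),
        -- exclusivity: one kind of unit, or neither, but never both
        CHECK (org_unit_no IS NULL OR external_unit_no IS NULL)
    );
    INSERT INTO organization_unit VALUES (1);
    INSERT INTO external_unit VALUES (7);
    """
)


def try_insert(activity_no, org_unit_no, external_unit_no) -> bool:
    """Return True if the ACTIVITY is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO activity VALUES (?, ?, ?)",
            (activity_no, org_unit_no, external_unit_no),
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

An ACTIVITY performed by unit 1 alone, by external unit 7 alone, or by neither is accepted; one naming both is rejected.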

12 Constraints on attributes

There are constraints on attributes which can be the subject of some debate concerning their role in data modelling. For instance, there is the question of data types. Many data modellers would argue that specifying that an attribute is numeric or character is not part of data modelling. Certainly this is not the kind of constraint which one would seek to capture in a diagrammatic representation form but, in a more fully specified data model, this kind of information is needed. However, if an attribute is a latitude or a longitude, then this must be seen as a business rule and there is an argument for capturing that fact as soon as possible.

The fact that an attribute may be allowed to have a null value is another topic for debate when trying to define data modelling as a skill distinct from database design. To refer back to the preceding sections 8 and 9 of this paper, nulls come into the picture very quickly when considering relationships such as binary relationship class 2 in section 9.

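One can define a relationship of this class between PURCHASE ORDER and PRICE QUOTATION which would be interpreted (in the style of section 8) as follows:

  - each PURCHASE ORDER may be related to zero or one PRICE QUOTATION
  - each PRICE QUOTATION may have zero, one or more PURCHASE ORDERS

The first of these two assertions is constraining, but not the second.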

Regarding the first assertion, it is clear that if a PURCHASE ORDER is indeed based on (and therefore related to) a PRICE QUOTATION, then it is necessary to know which PRICE QUOTATION.

Some data modelling facilities include attributes in their purview; others do not. Sometimes they may be included in diagrams; sometimes not. Whether or not the inclusion of attributes is regarded as part of what is designated as a data modelling facility is not the issue here. At some point in the overall information systems life cycle process, it will be necessary to decide on the attributes. The question at issue is - how does one designate the value of an attribute such as "Price Quotation Number" for a PURCHASE ORDER which is not based on a PRICE QUOTATION?

There are a number of approaches to this question. One is for the system designer to choose a special representation of such values. Another is to accept the general concept of "nulls".

In the second case, the problem is created that sometimes it is necessary to assert for an attribute that it may be null and sometimes it is necessary to assert that it may not be null. If nulls are generally allowed, then the constraining assertion is that nulls are not allowed for an attribute.
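The second approach can be sketched in SQL embedded in Python (standard `sqlite3` module), where nulls are generally allowed and the constraining assertion is NOT NULL. The tables, the columns (including `order_date`, which is introduced purely for illustration) and the helper functions are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only when enabled
conn.executescript(
    """
    CREATE TABLE price_quotation (quotation_no INTEGER PRIMARY KEY);
    CREATE TABLE purchase_order (
        order_no     INTEGER PRIMARY KEY,
        order_date   TEXT NOT NULL,   -- constraining assertion: null not allowed
        -- no NOT NULL here: an order need not be based on a quotation
        quotation_no INTEGER REFERENCES price_quotation (quotation_no)
    );
    INSERT INTO price_quotation VALUES (500);
    INSERT INTO purchase_order VALUES (1, '1996-05-28', 500);
    INSERT INTO purchase_order VALUES (2, '1996-05-28', NULL);
    """
)


def orders_without_quotation() -> list:
    """PURCHASE ORDERS which are not based on a PRICE QUOTATION."""
    rows = conn.execute(
        "SELECT order_no FROM purchase_order "
        "WHERE quotation_no IS NULL ORDER BY order_no"
    )
    return [order_no for (order_no,) in rows]


def null_order_date_rejected() -> bool:
    """Try to record a PURCHASE ORDER with a null in a NOT NULL attribute."""
    try:
        conn.execute("INSERT INTO purchase_order VALUES (3, NULL, NULL)")
    except sqlite3.IntegrityError:
        return True
    return False
```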

13 Uniqueness constraints, keys and identifiers

The need for what have traditionally been called by names such as keys and identifiers is well established in modelling practice. As soon as one starts to designate the attributes of an entity type, it is useful to pick out one or more attributes to serve as the identifier or key of the entities of a given entity type.

There is some confusion among the three concepts of uniqueness constraint, key and identifier.

To designate a uniqueness constraint on a set of one or more attributes is to make a rule which says that each set of values for the set of attributes is "different". (One has to be careful about the scope of the difference.) This uniqueness constraint may or may not reflect a business rule. Such a rule may be that each combination of values of the two attributes "department number" and "employee number" must be unique within the company.

To designate that a set of attributes is a "key" typically implies that these attributes can be used to access a record (row, entity, tuple or whatever). Designating keys is a typical activity associated with improving the performance of a computerized system and hence not included in the overall activity of modelling.

To designate that a set of attributes is an "identifier" is to designate a label which can be used to distinguish one row in a table (or record in a file or entity of a given entity type) from another. The term "identifier" is not quite as "hard" as "key" and one does not feel tempted to think in terms of access mechanisms as one does with the term "key".

On the modelling level, the preferable term appears to be "uniqueness constraint". For a given collection of records (rows, entities or tuples), there may be one or more uniqueness constraints. It is one of the more basic tenets of relational theory that there should be at least one for each table (which is one of the main reasons why the ISO SQL standard does not claim to adhere to relational theory).
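The uniqueness constraint on "department number" and "employee number" mentioned above can be declared as follows. This is a sketch in SQL embedded in Python (standard `sqlite3` module); the table name, the extra `name` column, the sample rows and the helper `try_insert` are illustrative assumptions. The scope of the difference is the combination of the two values, so the same employee number may recur in another department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        department_number INTEGER NOT NULL,
        employee_number   INTEGER NOT NULL,
        name              TEXT,
        -- uniqueness constraint over the pair of attributes
        UNIQUE (department_number, employee_number)
    )
    """
)


def try_insert(department_number: int, employee_number: int, name: str) -> bool:
    """Return True if the row is accepted, False if the uniqueness constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?)",
            (department_number, employee_number, name),
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

Nothing in the declaration says how rows are to be accessed, which is the sense in which "uniqueness constraint" is a modelling term and "key" is not.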

14 Constraints of potentially infinite Boolean complexity

This paper has deliberately segued from talking about relationships as a fundamental modelling concept to talking about constraints. The association between the two can be summarized as follows:

It is important to note that the categorization of various types of constraint discussed so far in this paper is not complete. The categories discussed are as follows:

It is category C (see section 11) which needs to be considered further. Exclusivity constraints have been developed in the context of diagrammatic representations. In retrospect, they can be seen as a means of capturing a few more kinds of constraints in diagrammatic form. In practice they represent no more than the tip of an iceberg.

When one analyzes the kinds of business rules which exist in practice, and attempts to model these rules using the conventions supported in so many diagramming techniques, the shortcomings of diagramming techniques become apparent.

Business rules exist (and may be formulated) which contain multiple levels of Boolean complexity. There is not a lot of experience available with representing such rules declaratively, as the typical approach taken in practice is to embed such rules in procedural code. This approach has been taken because at the system design and construction level there has been no alternative.
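As a hedged illustration of what a declarative representation of such a rule might look like, the following Python sketch holds a rule as nested data and applies one general purpose evaluator to it. The rule format, the function `holds`, the rule `RELEASE_RULE` and the fact names are all hypothetical and are not drawn from any existing facility. The point is only that the rule is held as data, separately from the procedural code which evaluates it.

```python
from typing import Any, Dict, Tuple

# A rule is nested data: ("and", r1, r2, ...), ("or", r1, r2, ...),
# ("not", r) or ("fact", name). This format is an assumption of the sketch.
Rule = Tuple[Any, ...]


def holds(rule: Rule, facts: Dict[str, bool]) -> bool:
    """Evaluate a nested Boolean rule against a set of named facts."""
    op = rule[0]
    if op == "fact":
        return bool(facts[rule[1]])
    if op == "not":
        return not holds(rule[1], facts)
    if op == "and":
        return all(holds(r, facts) for r in rule[1:])
    if op == "or":
        return any(holds(r, facts) for r in rule[1:])
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical business rule with three levels of Boolean nesting:
# an order may be released if it has a supplier AND
# (it is based on a quotation OR (it is approved AND NOT over budget)).
RELEASE_RULE: Rule = (
    "and",
    ("fact", "has_supplier"),
    (
        "or",
        ("fact", "has_quotation"),
        ("and", ("fact", "approved"), ("not", ("fact", "over_budget"))),
    ),
)
```

If the business rule changes, only the data held in `RELEASE_RULE` needs to change; the evaluator is untouched.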

15 The 100% principle

The discussion in the previous section leads to a major enigma in data and process modelling. Where is the borderline between data and process modelling and system design and construction? The famous "orange book" of 1982 [4] introduced what it referred to as the 100% principle. The principle stated the following:

All relevant static and dynamic aspects, i.e. all rules and laws, etc. of the universe of discourse should be described in the conceptual schema. The information system cannot and should not be held responsible for not meeting those described elsewhere, including in particular those in application programs.

The clarification in the second sentence is clearer than the statement of the principle in the first sentence. The 100% principle has been paraphrased [5] as follows:

The business rules applicable to a given subject area must be specified declaratively. The place where they are captured is called a conceptual schema. The business rules should not be embedded in application programs.

Whether or not one refers to the collective representation of the business rules as a "conceptual schema" is merely a terminological issue. The term has been adopted for many uses less precise and less clearly focussed than that included in [4] and the very fact of such adoption has led inevitably to the corruption of the concept.

The underlying principle, however, as paraphrased above is felt to be important in the context of data and process modelling. The underlying principle is concerned with the declarative approach to representing business rules as opposed to embedding such rules in procedural language code.

The preference for the declarative approach is predicated entirely on the ever present requirement for "designing for change" (a catch phrase of the sixties). Businesses are always changing and business rules will always need to change to reflect the changes in business. Systems should be designed and constructed in such a way that even unpredictable kinds of change can be accommodated.

16 Uses for data modelling

The most obvious and typical use for a data modelling technique is during the information system life cycle. There are many different interpretations of the life cycle concept and indeed it is often referred to by some other name - such as system development life cycle. One can argue that the introduction of the term "development" is unnecessarily constraining.

One example of a possible stage breakdown for an information system life cycle is based on that given in [1] as follows:

Data modelling can be used, albeit in different ways, at each and every one of these stages. At the planning stage, a data model need not be rigorous. At the system design stage, it must be "complete" in the sense that all business rules that can be captured need to be captured. (There are usually some that need to be captured in a process model.)

A data model developed during the analysis stage needs to be an analytic data model (see section 5 of this paper). In some senses this stage is difficult to put into context in terms of completeness. Typically, the analysis of a business area is influenced strongly by the use of some legacy system which is due to be replaced. How much human resource should be invested in analyzing the old way of doing things at a time when the emphasis should be on "getting things right" this time around?

Apart from the use of data modelling in connection with computerized information systems, the basic technique is valuable in a more general context. Given a complex situation which involves a number of inter-related concepts on which a group of people are trying to agree, the preparation of an analytic data model showing how these concepts seem to be related is a good basis for reaching consensus on how they should be related.

17 Process modelling

Although this paper has concentrated so far on data modelling, the importance of process modelling cannot be overstated. The symbiosis between data modelling and process modelling is difficult to formalize. The view one has of process modelling when regarding it as a starting point is very different from the view one has when regarding it as an activity which builds on data modelling.

To review the implications of process modelling, it is useful to restate the three part spectrum from section 5 of this paper:

A. Enterprise
B. Computer human interface
C. Computerized system

Early work on process modelling was strongly influenced by the requirements in part C of this spectrum. The computerized system was being designed to perform certain processes (or procedures) and it was necessary to identify the processes and how they were inter-related. The fact that each process needed input data and in many cases generated output data was to a large extent secondary.

The need to be able to model the activities being performed in an enterprise as a necessary preamble to designing a computerized system came much later. The recognition that the way a process (or activity) was performed in an enterprise was not necessarily the best way for that process to be performed in a computer was also somewhat slow to be accepted.

The fact that most computerized systems nowadays provide an interface to the users through a screen and a keyboard means that a system designer has to give major consideration to the processes which can be initiated by a user at a terminal.

The perceptions of a process as performed in the enterprise, the process initiated at a terminal (in the world of point and click) and a process as performed inside a computerized system are all different. While the three perceptions clearly need to be related, the packaging of processes in each case is governed by very different requirements.

When performing an enterprise activity analysis (by whatever name one calls it), it is advisable not to limit the analysis to activities which clearly need to be computerized. "Personnel management" is a typical high level activity in most enterprises. Some parts of it, such as payroll, may be completely computerized, while other parts, such as recruiting, may be computer aided, and yet other parts, such as personnel evaluation, may be computerized only to the extent that scheduling and recording are computerized.

When specifying the processes or tasks which may be initiated by a user of the system at a terminal, one has to take into account the user's work patterns, which often involve non-computerized or even uncomputerizable tasks. One has to evaluate the trade-off between many small simple tasks and fewer more complex tasks. Packaging and partitioning of such tasks is the major problem in this part of the spectrum.

When carrying out the life cycle stage referred to in the previous section as "construction design", the factors influencing the process modelling are very different from those in the other parts of the three part spectrum. Typically, the decision will be influenced by the approach to construction which has been selected. Even for a given approach, the decision may be influenced by the specific construction tool which has been chosen (or imposed).

One must be careful when trying to articulate the differences among the three parts of the three part spectrum as regards process modelling. To suggest that such differences exist is basically to suggest that the packaging of the processes and apportionment of the processes to packages is different for each part of the spectrum. The same techniques may or may not be used for defining these processes.

18 Data modelling in standards

As discussed earlier in this paper, a data modelling facility is a means of specifying the semantics of a collection of inter-related data in some problem area. The problem area may be in one of the traditional industry domains such as finance or manufacturing. An application data model is a model of the data in a problem area. An application data model may have several representation forms. However, an application data model must be defined using a data modelling facility.

Information technology standards in the ISO JTC1 context have traditionally been concerned with technology which is independent of traditional industry domains. One could safely assert that any standardization of data modelling facilities and their associated representation forms should be carried out as part of ISO JTC1. One can also assert that the development of application data models is a task for the Technical Committee concerned with a given industry domain.

The question of representation forms is more difficult. This paper has tried to emphasize the distinction between a data modelling facility and possible associated representation forms. It has also tried to emphasize the role and limitations of graphic representation forms.

The most widely used data modelling facility and associated representation form is that supported in the series of ISO SQL standards [6,7,8]. It is a widespread practice in the industry for CASE tools to be used to iterate through successive diagrammatic representations of a data model and for such CASE tools to be able to generate some form of SQL syntax once the iterations have been completed.

The power of SQL as a data modelling facility tends to be viewed in terms of the power available in the earlier versions. In fact the modelling power supported in what is generally referred to as SQL92 is a considerable advance on that in earlier versions. Those who feel that database design is concerned with defining how data is represented in storage tend to feel that database design and data modelling should be kept far apart. In fact, database design as supported in the relational model and indeed in SQL is concerned with a declarative representation of the semantics of the data.
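
By way of illustration, the following sketch declares a simple relationship as a referential constraint. The tables `department` and `employee` are hypothetical, and SQLite (through Python's `sqlite3` module) is assumed merely as a convenient engine; the `PRAGMA` statement is specific to SQLite and is not part of SQL92. The declarations say what the data mean (every employee belongs to exactly one existing department) and nothing about how the data is held in storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific switch, not part of SQL92: without it SQLite parses
# REFERENCES clauses but does not enforce them.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical application data model: two tables and one relationship,
# expressed entirely as declarative constraints.
conn.executescript("""
CREATE TABLE department (
    dept_code TEXT PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_no    INTEGER PRIMARY KEY,
    dept_code TEXT NOT NULL REFERENCES department (dept_code)
);
""")

conn.execute("INSERT INTO department VALUES ('FIN', 'Finance')")
conn.execute("INSERT INTO employee VALUES (1, 'FIN')")

try:
    # 'XYZ' is not a known department, so the constraint is violated.
    conn.execute("INSERT INTO employee VALUES (2, 'XYZ')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
```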

The next version of SQL which is currently under development will include capabilities for handling user defined data types (somewhat misleadingly referred to in many circles as "abstract" data types). This will enhance the ability to tailor SQL for use in specific application domains which require special types of data such as latitudes and longitudes, azimuths, and so forth.
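
The idea of a user defined data type can be sketched outside SQL. The example below is written in Python and does not show the syntax of the forthcoming SQL version, which is not reproduced in this paper. The type names `Latitude`, `Longitude` and `Position`, and the convention of signed decimal degrees, are assumptions made for the illustration. The point is that the permitted values are part of the definition of the type, so that every use of the type carries the rule with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latitude:
    """Hypothetical user defined type: signed decimal degrees, -90 to 90."""
    degrees: float

    def __post_init__(self):
        if not -90.0 <= self.degrees <= 90.0:
            raise ValueError(f"latitude out of range: {self.degrees}")


@dataclass(frozen=True)
class Longitude:
    """Hypothetical user defined type: signed decimal degrees, -180 to 180."""
    degrees: float

    def __post_init__(self):
        if not -180.0 <= self.degrees <= 180.0:
            raise ValueError(f"longitude out of range: {self.degrees}")


@dataclass(frozen=True)
class Position:
    """A type composed from the two types above."""
    lat: Latitude
    lon: Longitude


def is_valid_latitude(value):
    """Report whether a value is acceptable as a Latitude."""
    try:
        Latitude(value)
        return True
    except ValueError:
        return False


# Approximate coordinates, used only to exercise the types.
seattle = Position(Latitude(47.6), Longitude(-122.3))
```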

There are fortunately only very few people who wish to initiate the development of an application data model by means of writing SQL syntax statements. The preferred approach is to start with some diagrammatic representation or other. However, there is a tendency to feel that because a diagrammatic representation is a good way to start it is therefore a good way to finish.

As indicated earlier in this paper, it is important for an application data model to capture all the business rules - and it is only in the simplest of problem areas (such as those used for teaching purposes in tertiary education) that this is possible using a diagrammatic representation alone.

Regarding standardization for representation forms other than that described earlier in section 2 of this paper as "language based form", there would surely be value in a standard which defined the basic data modelling constructs which can usefully be captured diagrammatically. Whether there is merit in standardizing the precise representation forms as has been done in the national standards for IDEF1X [9] and SSADM [10] is open to question.

The questions regarding standards for process modelling are more subtle. There are those who begin their analysis of an enterprise or even their prescriptive design of a computerized information system using some kind of process modelling technique, typically a variant of data flow diagramming. There are so many variants of this class of technique in use that it would be difficult if not impossible to agree on an international standard.

Those who begin their analysis of an enterprise and/or prescriptive design of an information system using some kind of data modelling technique face a different problem when it comes to standardization. The process modelling technique--for whichever of the three parts of the three part spectrum it is to be used--must relate to and be based on the data modelling technique.

One can argue that, for the part of the three part spectrum concerning the computerized information system, SQL92 provides a suitable data modelling technique. By the same token, the SQL92 data manipulation statements provide an associated process modelling technique. The SQL92 data manipulation statements are not intended to serve as a process modelling technique for an enterprise or for a computer human interface. Consideration of a suitable standard for each of these two would be worthwhile.
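
A sketch of what this can mean in practice follows. The `account` table and the `transfer` process are hypothetical, and SQLite (through Python's `sqlite3` module) is assumed only as a convenient engine. The process is packaged as two data manipulation statements in one transaction, and it relies on the business rule declared in the data model (a balance may not become negative) rather than restating that rule procedurally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data model with one declared business rule.
conn.executescript("""
CREATE TABLE account (
    acct_no INTEGER PRIMARY KEY,
    balance NUMERIC NOT NULL CHECK (balance >= 0)
);
INSERT INTO account VALUES (1, 100);
INSERT INTO account VALUES (2, 50);
""")


def transfer(conn, source, target, amount):
    """One process, expressed as two UPDATE statements in one transaction."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acct_no = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acct_no = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The declared CHECK constraint rejected the resulting state.
        return False


def balances(conn):
    """Return the current balances keyed by account number."""
    return dict(conn.execute("SELECT acct_no, balance FROM account"))
```

A call such as `transfer(conn, 1, 2, 30)` moves the amount between the two accounts. A call which would overdraw the source account is rejected by the declared constraint and the whole transaction is rolled back, leaving both balances unchanged.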

19 Role of standard application data models in achieving interoperability

There are inevitably different kinds of interoperability, but consideration here is limited to that associated with Electronic Data Interchange (EDI). There is an enormous amount of work in progress to define standard message formats for use in EDI. So many such formats have already been defined that there is now work in progress in ISO to develop a Basic Semantic Repository to serve as a kind of clearing house to specify how the numerous message types and their content are inter-related.

There appears to be some debate concerning the role of data modelling in the context of EDI. However, it seems apparent that when developing a standard for a message type, it is not sufficient to standardize the format. Unfortunately, it is also necessary to standardize the semantics. It is all too easy for the transmitter of a message to conform to the format of a message and to ignore the issue of the meaning which is to be ascribed to the fields in the message. Mapping from data modelled using one data modelling technique to data modelled using another data modelling technique is always possible - at a price. If the two techniques used are both based on international standards then one can in theory improve interoperability by developing an associated mapping standard. Interoperability would be cheaper and more effective if the same data modelling standard were used for both persistent and transient data.
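
The distinction between format and semantics can be illustrated with a deliberately small sketch. The message layout, the field names and the two declared interpretations of the `weight` field are all hypothetical and are not drawn from any EDI standard. Both the message as sent and the message as mapped conform to the format; only the explicit declaration of meaning makes a correct mapping possible.

```python
# Hypothetical message format: a field name and a Python type per field.
MESSAGE_FORMAT = {"part_no": str, "quantity": int, "weight": float}


def conforms_to_format(message):
    """Check field names and types only; this says nothing about meaning."""
    return (
        message.keys() == MESSAGE_FORMAT.keys()
        and all(isinstance(message[f], t) for f, t in MESSAGE_FORMAT.items())
    )


# Hypothetical semantic declarations: sender and receiver each record
# the unit in which they interpret the "weight" field.
SENDER_SEMANTICS = {"weight": "lb"}
RECEIVER_SEMANTICS = {"weight": "kg"}

KG_PER_LB = 0.45359237  # international avoirdupois pound, by definition


def map_message(message, source, target):
    """Map a message from one declared interpretation to another."""
    mapped = dict(message)
    if source["weight"] == "lb" and target["weight"] == "kg":
        mapped["weight"] = message["weight"] * KG_PER_LB
    elif source["weight"] == "kg" and target["weight"] == "lb":
        mapped["weight"] = message["weight"] / KG_PER_LB
    return mapped


sent = {"part_no": "A-100", "quantity": 12, "weight": 10.0}
received = map_message(sent, SENDER_SEMANTICS, RECEIVER_SEMANTICS)
```

Without the two semantic declarations, a receiver checking only `conforms_to_format` would accept the message as sent and misread the weight.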

REFERENCES

[1] Olle, T.W., Hagelstein, J., Macdonald, I.G., Rolland, C., Sol, H.G., Van Assche, F.J.M., Verrijn-Stuart, A.A. Information Systems Methodologies - A Framework for Understanding. Second edition. Published by Addison-Wesley. 1991.

[2] Bachman, C.W. Data Structure Diagrams. Database (journal of ACM SIGBDP) 1, No. 2 (Summer 1969).

[3] ISO 10027:1993. Reference Model of Data Management

[4] ISO TR 9007. Concepts and Facilities for a Conceptual Schema Language.

[5] Olle, T.W. Data Modelling and Conceptual Modelling: A Comparative Analysis of Functionality and Roles. Australian Journal of Information Systems. Issue No. 1, September 1993.

[6] ISO 9075:1986. Database Language SQL

[7] ISO 9075:1989. Database Language SQL

[8] ISO 9075:1992. Database Language SQL

[9] US FIPS PUB 184 IDEF1X

[10] BS 7738 Structured Systems Analysis and Design Method

