Towards Knowledge Representation: The State of Data and Processing Modelling Standards
An Expert Presentation to the ISO Joint Workshop on Data and Process Modelling, September 9-12, 1996, Bellevue, WA, U.S.A.
Anthony K. Sarris
Ontek Corporation
ANSI ASC X3 Technical Committee T2
tony@ontek.com
Abstract
The
subject of this workshop--primarily approaches, techniques and tools for
describing, integrating and utilizing enterprise models--is complex exactly
because it represents the intersection of several different standards and
technologies for information management. From ways to conceptually model
enterprise semantics in the form of objects, processes, relations and
rules, to languages and fundamental constructs for integrating
heterogeneous models, to CASE and data dictionary/repository technologies
for effective model management and reuse, each challenge emphasizes a
different aspect of the same underlying problem: knowledge representation
(KR). Artificial intelligence is not a panacea for knowledge representation
problems in the sense that some claimed in the 1980s. But advancements in
modelling require a concerted effort to focus resources on flexible,
expressive KR languages and plug-and-play sets of domain conceptual
models--'ontologies' in the current terminology. Adaptable, integrated
toolkits are required to support such efforts.
Introduction
This paper uses several
informal categories as a framework for discussing a number of ISO and
related standards projects addressing aspects of knowledge representation
(KR). KR encompasses, but is not limited to, data and process modelling, or
more generally, enterprise modelling. First, the categories themselves are
identified and described. They are subsequently used to discuss existing
standards projects and to make some observations about the scope, approach
and results of those projects. Some recommendations are made with respect
to existing standards, as well as possible areas for future standards work
related to KR. The author has been actively participating in the ISO/IEC
JTC1/SC21 WG3 Conceptual Schema Modelling Facilities (CSMF) project since
its initial planning stages in 1991, and has been involved to a lesser
extent with a number of the other standards projects that are also
addressing aspects of KR.
Categories for Stocktaking of Current Standards
The five categories identified and described in this section
are used informally as a framework for discussing a number of ISO and other
standards related to KR. The author is not proposing that these categories
provide a formal way in which to organize KR issues; rather, they are used
simply as a convenient way to present the key points of this paper.
Although the categories are used informally, the reader needs to
understand what each category refers to in order to follow the
descriptions of the various standards projects, which are given in terms
of these categories.
Heterogeneity of Modelling Forms
Heterogeneity of modelling or KR methods includes both the
nature of what can be modelled using the particular method (i.e., the
semantic aspect) and the expressive form of the method (i.e., the
syntactical aspect). Because the former presupposes a certain set of
fundamental constructs (depending, for example, on whether the method is
intended for data modelling or process modelling), some heterogeneity among
enterprise modelling methods (indeed, among KR methods in general) is
inherent. The underlying reason for needing certain fundamental
conceptual constructs to describe, for example, enterprise processes is a
question of the ontological stance reflected in the method and is described
in the section on ontology below. Suffice it to say at this point that some
level of basic semantic heterogeneity must be accounted for. There are many
methods that address data/information only and many that address
processes only, yet enterprise modelling often needs to relate these two
fundamental kinds of models.
Even within
certain broad semantic categories like data/information modelling,
heterogeneity of modelling forms is rampant. This heterogeneity arises from
the overall modelling paradigm and from the specific modelling language
used. For example, static semantics (i.e., the content of data and
information models) can be modelled using entity-attribute-relationship
(E-A-R) approaches, binary-and-elementary-n-ary-relationship (BENR)
approaches and the object class aspects of object-oriented modelling
approaches. Additionally, there are many 'flavors' of E-A-R, such as Chen's
E-A-R, Ross' Extended E-A-R, Navathe's Extended E-A-R and IDEF1X. That a
particular modelling method may have both a linear (text) form and a
graphical form of its basic syntax is yet another issue, which is
discussed in the section on Knowledge Acquisition and Presentation.
ISO/IEC TR9007, 1987 discusses heterogeneous modelling methods
in more detail, as do Sarris, 1992, which provides an overview
of some 30 commonly used modelling methods, and ANSI X3/TR-14 Part 2,
1995, which analyses nine modelling methods. Other studies, such as
The Object Agency, Inc., 1995 and X3H7, 1996, compare dozens of
methods within the object-oriented paradigm alone.
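To illustrate how two paradigms can divide up the same content differently, the following Python sketch restates a small E-A-R model as elementary binary fact types, in the spirit of the BENR approaches. The model contents (Employee, Department and their attributes) and the mapping rule (each attribute and each binary relationship becomes one binary fact type) are assumptions made for this example. They are not a normative mapping taken from any of the methods or studies named above.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An E-A-R entity with the names of its attributes."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A binary E-A-R relationship between two named entities."""
    name: str
    source: str
    target: str


@dataclass(frozen=True)
class BinaryFactType:
    """An elementary binary fact type: two object types joined by a reading."""
    first: str
    reading: str
    second: str


def ear_to_binary(entities, relationships):
    """Restate an E-A-R model as elementary binary fact types.

    Assumed mapping: each attribute becomes a fact type between its entity
    and a value type named after the attribute; each relationship maps
    directly to a fact type between the two entities.
    """
    facts = []
    for entity in entities:
        for attr in entity.attributes:
            facts.append(BinaryFactType(entity.name, "has", attr.capitalize()))
    for rel in relationships:
        facts.append(BinaryFactType(rel.source, rel.name, rel.target))
    return facts


entities = [Entity("Employee", ["name", "salary"]), Entity("Department", ["code"])]
relationships = [Relationship("works for", "Employee", "Department")]
for fact in ear_to_binary(entities, relationships):
    print(fact.first, fact.reading, fact.second)
```

In the binary form, the distinction between attributes and relationships disappears: 'Employee has Salary' and 'Employee works for Department' become the same kind of fact. Differences of this sort, rather than notation alone, are what make interchange between paradigms non-trivial.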
Distribution
When the first large-scale enterprise
modelling projects were performed in the late 1970s and early 1980s (this
deliberately excludes natural language system descriptions and other less
structured 'models', which have of course been produced for many years
as part of almost every systems analysis and development project of any
size), the models were often on paper: everything from
butcher-paper sheets taped to conference room walls to model print-outs
that resembled a set of encyclopedias. With the arrival of computer-based
modelling tools in the mid-1980s, the models still often resided in one
central model database managed by an enterprise modelling authority. They
were in essence a historical record of the completed modelling project,
not a living, active enterprise model.
Today, as is the case with most
other computer-based applications, enterprise models are created in
distributed client-server environments across company-wide intranets. In
many cases, models of such complex agents as 'virtual corporations' (i.e.,
collaborative teams of organizations that band together and disband to meet
rapidly changing research needs or business opportunities) are created
across the global internet. The demands of distributed, collaborative KR
projects add new elements to the problem mix; some of these demands, such as
versioning or configuration management and the resolution of conflicting
information, are problems common to distributed systems in general. At a
minimum, there are requirements for various protocols to support the
physical interchange of models across distributed networks.
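One standard technique for the conflict-detection part of this problem is the vector clock. It is offered here as an illustration; none of the standards discussed in this paper prescribes it. Each site counts its own edits, and two versions of a model element conflict when neither version's clock dominates the other's. The following minimal Python sketch uses hypothetical site names.

```python
def record_edit(clock, site):
    """Return a copy of the clock with the editing site's counter incremented."""
    updated = dict(clock)
    updated[site] = updated.get(site, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum: the clock of a version that has seen both a and b."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in a.keys() | b.keys()}


def compare(a, b):
    """Classify two versions as 'equal', 'before', 'after' or 'conflict'."""
    sites = a.keys() | b.keys()
    a_le_b = all(a.get(s, 0) <= b.get(s, 0) for s in sites)
    b_le_a = all(b.get(s, 0) <= a.get(s, 0) for s in sites)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "conflict"  # concurrent edits: neither version has seen the other


base = record_edit({}, "seattle")          # shared starting version
at_seattle = record_edit(base, "seattle")  # edited again at one site
at_boston = record_edit(base, "boston")    # edited independently elsewhere
print(compare(at_seattle, at_boston))      # conflict
```

The sketch only detects the conflict. Deciding which version is correct, or how to combine the two before recording a merged version, is still left to whoever resolves it.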
Knowledge Acquisition and Presentation
A picture may be worth a thousand words,
but even in today's multi-media world natural language sentences expressed
in text documents (or mixed-media documents built around text) still
dominate. A fundamental question is whether a natural language document,
such as a policies and procedures manual, is really an enterprise model,
since the knowledge in the 'model' is not explicit (let alone
machine-interpretable). Natural language processing (NLP) has been a driver
for much of the work in artificial intelligence. The objective of formally
acquiring knowledge by parsing and interpreting text is of interest to a
large community of researchers and to industry. However, leaving the
specific (and rather significant) challenges of NLP aside for a moment,
there is a simpler but still quite important debate that has gone on for
many years in the enterprise modelling community: should modelling languages
have a primarily linear (text) form, such as a so-called stylized natural
language, or a graphical form? If a modelling method offers both forms, the
question arises as to whether the two forms are fully equivalent and
whether tools to support the method can perform automatic, complete
conversions from one form to another. If such conversions are provided, then
the issue reduces purely to a matter of preference at the time
knowledge is captured and when it is subsequently presented (for purposes
of validation, education, etc.).
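The equivalence question can be made concrete. In the Python sketch below, a 'graphical' model is approximated by a labelled directed graph: object types are nodes and relationship readings are edges. The 'linear' form is one stylized sentence per edge. The sentence template and the rule that object type names are single words are assumptions made so that parsing is unambiguous. A round-trip check then tests whether converting from one form to the other and back loses anything.

```python
def graph_to_text(edges):
    """Render each (subject, reading, object) edge as one stylized sentence."""
    return [f"{subj} {reading} {obj}." for subj, reading, obj in sorted(edges)]


def text_to_graph(sentences):
    """Parse stylized sentences back into edges.

    Assumes object type names are single words, so the first word is the
    subject, the last word is the object and the words in between are the
    relationship reading.
    """
    edges = set()
    for sentence in sentences:
        words = sentence.rstrip(".").split()
        if len(words) < 3:
            raise ValueError(f"not a stylized fact sentence: {sentence!r}")
        edges.add((words[0], " ".join(words[1:-1]), words[-1]))
    return edges


def forms_are_equivalent(edges):
    """True if converting graph -> text -> graph loses no information."""
    return text_to_graph(graph_to_text(edges)) == set(edges)


model = {("Employee", "works for", "Department"),
         ("Department", "is managed by", "Employee")}
for line in graph_to_text(model):
    print(line)
print(forms_are_equivalent(model))                                   # True
print(forms_are_equivalent({("Sales Office", "is located in", "City")}))  # False
```

The last call shows how easily a 'complete, automatic conversion' can break. Under this template, a multi-word name makes the linear form ambiguous. This is why the equivalence of the two forms has to be designed into a method rather than assumed.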
There has been a tendency for those who
favor graphical modelling methods to argue that these methods are somehow
more rigorous and that they encourage analysts to scrutinize model contents
more closely during the acquisition process. Those who favor stylized
natural languages tend to point to the fact that those who know the
real-world domain being modelled often feel more comfortable using natural
language-like modelling methods, whether they are directly performing the
modelling themselves or reviewing/validating a model produced by an
analyst. With the aid of various automated modelling tools, a higher degree
of rigor (in the sense of checking model contents for conformance to the
rules of the modelling method and, to a lesser degree, conflict-checking to
ensure semantic integrity in the model) can be ensured than was the case
when models were checked largely by humans performing visual or other
inspections. Even so, this has not ended the linear-versus-graphical
debate.
Ontology
Every modelling or KR paradigm reflects in its
fundamental conceptual constructs certain views about the things in the
world (the real or imagined world), as well as the concepts we use to
represent and describe those things. One aspect of this ontological stance
manifests itself in the semantics and syntax of the modelling methods
themselves (for example, we can conceive of the world being populated with,
among other things, physical objects that have certain physical properties,
which we can represent in an E-A-R model as entities with attributes and in
a database management system as tables with columns). Another aspect
manifests itself in the kinds of things we represent as contents in typical
enterprise models (for example, organizations, resources, objectives,
plans, services, artifacts, spatial locations, etc.). One or more
fundamental ontologies are necessary to perform any kind of conceptual
modelling or knowledge representation. Commonsense or mid-world ontologies
are useful for describing and reasoning about things found in the everyday
world. Common linguistic ontologies are also useful for describing the
everyday world, for providing starter sets of terms to use when building
more specialized domain ontologies, and for attempting to interpret natural
language-based descriptions or models of the
world.
Reasoning
Our knowledge of some domain, as we represent
it in some model, is never complete or completely accurate. To be useful,
it need only be complete and accurate enough for its intended purpose. But
if conceptual models are to serve an on-going purpose in the enterprise
(rather than being used solely as part of a system development
process and then discarded), they must be capable of being
improved and expanded over time, as well as revised to reflect changes in
the domain they model. Reasoning capabilities have a role both in the
knowledge acquisition (i.e., model building) process and in the on-going
maintenance of the model. As new contents are entered in a model, those
contents should be analyzed in light of the existing contents. Conflicts or
potential conflicts should be identified. It is presumed that for the
foreseeable future, humans will be required to actually resolve any model
conflicts. On an on-going basis, reasoning capabilities can also be used to
predict new facts about the enterprise based on existing model contents, to
analyze instance-level data and compare it to meta (schema)-level types and
rules to test the validity of the model, and to identify areas where the
model is possibly incomplete or inaccurate, or where the domain being
modelled has changed since the model was built.
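The following Python sketch shows two of the checks described above, under a deliberately simple model. Facts are (subject, relation, object) triples, and some relations are declared functional, meaning each subject may have at most one value. A new fact that contradicts an existing one is not applied; instead, it is reported so that a human can resolve it. A second function compares instance-level records to schema-level types. The relation names, the 'functional' declaration and the use of Python types as schema types are all illustrative assumptions.

```python
class Model:
    """A toy fact base that reports conflicts instead of applying them."""

    def __init__(self, functional_relations):
        self.functional = set(functional_relations)  # at most one value per subject
        self.facts = set()

    def conflicts_with(self, fact):
        """Return the existing facts that a candidate fact would contradict."""
        subject, relation, obj = fact
        if relation not in self.functional:
            return []
        return [f for f in self.facts
                if f[0] == subject and f[1] == relation and f[2] != obj]

    def add(self, fact):
        """Add a fact unless it conflicts; conflicts are returned for human review."""
        clashes = self.conflicts_with(fact)
        if not clashes:
            self.facts.add(fact)
        return clashes


def type_violations(instances, schema):
    """List (row index, attribute, problem) for values not matching the schema."""
    problems = []
    for i, row in enumerate(instances):
        for attr, expected in schema.items():
            if attr not in row:
                problems.append((i, attr, "missing"))
            elif not isinstance(row[attr], expected):
                problems.append((i, attr, "wrong type"))
    return problems


model = Model(functional_relations={"works for"})
model.add(("Ann", "works for", "Sales"))
print(model.add(("Ann", "works for", "Research")))  # [('Ann', 'works for', 'Sales')]
print(type_violations([{"name": "Ann", "salary": 50000}, {"name": "Bob", "salary": "high"}],
                      {"name": str, "salary": int}))  # [(1, 'salary', 'wrong type')]
```

Recurring type violations in real data are one possible signal that the model is incomplete, or that the domain has changed since the model was built.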
Finally, an important
objective for producing enterprise conceptual models is to serve as the
basis for future intelligent applications in the fields of data mining,
decision-support and management automation. While some capabilities, such
as data mining, may rely on neural network approaches or nonmonotonic
reasoning capabilities, even in such cases the existence of a starting
candidate model that can serve to bound or guide refinement efforts could
prove helpful. Additionally, some future application systems may use
semantic models as direct sources for their information and processing
rules--in the extreme case the conceptual model itself could be executed
directly as an application system.
Observations and Recommendations
Based on a brief stocktaking of current standards
efforts using the categories described above, some general observations and
recommendations can be made. The observations and recommendations are
intended as input to the discussions at the workshop, particularly to
identify areas where standards projects relate, or should relate, to one
another. Concern has been expressed in some communities about overlapping,
redundant and even conflicting standards projects. The author believes that
while there is some direct overlap, more often than not the projects can be
placed relative to one another within an overall framework of enterprise
modelling/KR standards. They can be shown to be heading, at different speeds
and with specialized thrusts, toward an overall goal that reflects a
combination of the key categories described above. The overall goal is
consistent with the notion of a three-schema architecture for information
management, as envisioned by ANSI/X3/SPARC in Tsichritzis and Klug,
1978 and the notion of a conceptual schema as described in
ISO/IEC TR9007, 1987 and ANSI/X3 TR-14 Part 1,
1995. Given the scope and complexity of enterprise
modelling, it is to be expected that many methodology and technology
standards related to KR (including acquisition, conceptual representation,
integration, storage, distribution and presentation) would need to be
applied in various combinations and to differing degrees to provide
meaningful solutions to requirements in this area.
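The following Python sketch is a rough illustration of the three-schema idea referred to above. It does not represent any standard's normative content. One conceptual description of a fact type is kept, and two separate mappings derive an internal form (a relational table definition) and an external form (a user-facing sentence) from it. All names, and the one-table-per-fact-type layout, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """Conceptual schema element, independent of storage and presentation."""
    subject: str
    reading: str
    obj: str


def to_internal(fact_type):
    """Internal-schema mapping (assumed): one relational table per fact type."""
    table = f"{fact_type.subject}_{fact_type.reading.replace(' ', '_')}_{fact_type.obj}".lower()
    return (f"CREATE TABLE {table} ({fact_type.subject.lower()}_id INTEGER, "
            f"{fact_type.obj.lower()}_id INTEGER)")


def to_external(fact_type, subject_value, obj_value):
    """External-schema mapping: present one fact instance as a sentence."""
    return f"{fact_type.subject} {subject_value} {fact_type.reading} {fact_type.obj} {obj_value}."


works_for = FactType("Employee", "works for", "Department")
print(to_internal(works_for))
print(to_external(works_for, "Ann", "Sales"))
```

Changing the table layout or the sentence style affects only one mapping function; the conceptual `FactType` stays the same. This separation of meaning from implementation and presentation is the point of the architecture.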
Syntactical Interchange of Heterogeneous Modelling Forms
Modelling forms that
excel in their specialized focus must also be able to be integrated with
other [heterogeneous] modelling forms to produce a unified model, or at
least to interchange semantics that have meaning in more than one modelling
context. This requires syntactical interchange as a first step. The
syntactical level of interchange, as well as some static semantics, has been
largely addressed within particular methodologies (by standards such as the
IEEE IDEF Interchange Definition Language), as well as among major groups
of methodologies (by standards such as ISO/IEC JTC1/SC7 WG11 CASE Data
Interchange Format or CDIF, ISO/IEC CD? xxxxx, 1996). The CDIF
meta-meta-model uses an E-A-R/database schema-like approach to represent
the syntactical constructs and some model semantics for several commonly
used enterprise modelling languages. Data/information modelling languages
have been the primary focus, although some work has been performed in the
area of process/dynamic modelling (for example, there is a meta-model for
Data Flow Modelling or DFM). The meta-models are voluminous (e.g., 263
pages for E-A-R and 151 for DFM). They are specified using both natural
language text and the semi-formal E-A-R/database schema specifications. The
axioms or rules are given only in natural language, however, as there is
currently no constraint or rule-specification language utilized for CDIF
standards. Given the sheer size of the specifications, it is difficult to
pick out the critical material. With limited formal specification, and
again in light of the size of the meta-models, checking for closure and
completeness is also a difficult task. The database schema specifications,
however, do offer an API-like mechanism for exchanging models between CASE
tools. A large amount of work has gone into the creation of the meta-models
and they offer a good starting basis for future work.
The initial level
of model semantic interchange is based on a mapping of common modelling
constructs using the meta-meta-model as the interchange mechanism or
bridge. Here again, the CDIF approach provides a database schema or
repository perspective on enterprise model exchange, rather than a
complete conceptual schema perspective on enterprise model
integration. The CDIF meta-meta-model does not support full, formal
description of modelling method semantics and does not support general
expressions of complex mapping rules (using, for example, a logic-based
constraint language). This is partly attributable to the meta-meta-model
constructs themselves, but perhaps more so to the E-A-R expressive
formalism used.
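The 'bridge' idea can be sketched in Python as a pair of lookup tables. Each method's constructs are mapped to a shared set of meta-meta-model constructs, and a translation goes from the source construct to the bridge construct and then to the target construct. The construct names and mappings below are hypothetical and are not taken from the CDIF meta-meta-model. The sketch also shows the limitation noted above: a construct with no counterpart through the bridge simply cannot be carried across.

```python
# Hypothetical method-to-bridge mappings (not the actual CDIF meta-meta-model).
TO_BRIDGE = {
    "IDEF1X": {
        "Entity": "MetaEntity",
        "Attribute": "MetaAttribute",
        "Relationship": "MetaRelationship",
        "Categorization": "MetaSubtype",
    },
    "Chen E-A-R": {
        "Entity Set": "MetaEntity",
        "Attribute": "MetaAttribute",
        "Relationship Set": "MetaRelationship",
    },
}


def translate(construct, source, target):
    """Map a construct name from the source method to the target method.

    The translation goes source -> bridge -> target. It returns None when the
    source construct has no bridge entry, or when the target method has no
    construct mapped to the same bridge entry.
    """
    bridge = TO_BRIDGE[source].get(construct)
    if bridge is None:
        return None
    for candidate, meta in TO_BRIDGE[target].items():
        if meta == bridge:
            return candidate
    return None


print(translate("Entity", "IDEF1X", "Chen E-A-R"))          # Entity Set
print(translate("Categorization", "IDEF1X", "Chen E-A-R"))  # None
```

A table of construct names can express only one-to-one correspondences. Mapping rules that depend on context or constraints would need a more expressive formalism than a lookup table.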
Extensions to the meta-meta-model would be needed for
representing more complex modelling method semantics, particularly in the
areas of process/dynamic semantics and constraints or rules, and for
representing and integrating the semantics of domain ontologies. These
areas are both within the scope of the ISO/IEC JTC1/SC21 WG3 Conceptual
Schema Modelling Facilities (CSMF) project. Liaison activities are on-going
between SC7 WG11 CDIF and SC21 WG3 CSMF. It is recommended that this
liaison activity be continued and if possible, taken to a deeper technical
level. CSMF has explored the use of symbolic logic-based languages as more
powerful, general expressive formalisms. This is especially useful for
explicitly representing rules or constraints. Although the draft CSMF
standard, ISO/IEC WD 14481, 1996, does not specify any
particular concrete syntax, symbolic logic was a driving requirement for
the CSMF standard. It is expected that several of the concrete conceptual
schema languages (CSLs) claiming conformance to the CSMF standard (for example,
ANSI standard Conceptual Graphs or CGs, ANSI X3 1059-D,
1995, and ANSI standard Knowledge Interchange Format or
KIF, ANSI X3 1058-D, 1995) will be based on symbolic logic.
Therefore, it is expected that conforming CSLs will be useful for
expressing the semantics necessary for formalizing and extending the CDIF
meta-models.
Distributed Systems Management Across Intra- and Internets
Distribution of enterprise models is necessary across intranets
and internets. The location and source of the models are no longer
centralized and can no longer be predetermined. Methodological and
technological standards to support basic physical distribution, in terms of
protocols for modelling tool-to-modelling tool messaging and APIs between
model-based applications running on networks, are being addressed largely
through consortia and vendor-supplied standards such as the Object Management
Group (OMG)'s Common Object Request Broker Architecture Interface
Definition Language (CORBA IDL), OMG, 1995(a), Microsoft
Corporation's Object Linking and Embedding (OLE) and other related
object-based networking approaches. Export/import of models stored in
relational database management systems (DBMSs) is addressed in ISO/IEC JTC1
SC21 WG3 Reference Model for Data Management (RMDM), ISO/IEC IS
10032, 1995, as well as various projects under the WG3 Export/Import
Rapporteur Group.
Wrapper protocols or services specific to enterprise
modelling language interchange are not currently the subject of special
standards, and it remains to be determined if there are any special needs
in this area not being met by the general standards noted above. There has
been some government-funded R&D work in the U.S. (e.g., the Knowledge Query
and Manipulation Language or KQML), but this work is not currently proposed
as part of any standard being developed by ISO or any similar accredited
standards body. Distribution management is the subject of much research
and development, particularly as regards maintaining consistency and
providing roll-back/recovery across sites sharing distributed databases.
The two major ISO data dictionary/repository standards projects, which are
described in the section on registration and reuse of domain models below,
also address aspects of this issue by specifying configuration management
and versioning capabilities, as does the Common Facilities Architecture,
OMG, 1995(b), work being conducted by the OMG consortium.
Again, no particular standards in this area address distribution
management issues for enterprise modelling languages. To the degree that
modelling in particular and knowledge representation in general are closely
related to distributed database management and data
dictionaries/repositories (in terms of the information technologies used to
implement automated modelling capabilities), it is again quite possible
that no additional or specialized capabilities are needed beyond those
addressed under existing standards projects.
Natural Language Processing
Natural language allows domain experts (i.e., 'users') to
describe an enterprise in a way that appeals to them as non-IT
professionals, as well as enabling models represented in specialized
technical formats to be presented in formats more oriented to their
application in a business context. On the whole, however, this field has
suffered from the problem of trying to take on too much at once. Many of
the efforts have concentrated on natural language interpretation, which is
an extremely difficult, multi-disciplinary area rife with problems that are
far from being solved. It may well never be automated to as large a degree
as many would like. In any case, it is premature for standardization in
that particular area.
In general, natural language processing is not the
subject of standardization, but remains largely an area of research.
Notable exceptions involve modelling languages that have a natural
language-like or stylized natural language syntax, or whose syntax
can be mapped to the grammatical elements of natural languages. Conceptual
Graphs is one such language being standardized. Currently it is the subject
only of an ANSI standard, ANSI X3 1059-D, 1995, but
internationalization is currently being explored. CGs is a symbolic
logic-based language that expresses enterprise model contents as conceptual
structures. CGs can be created from natural language expressions (this is a
human-intensive process, however) and the conceptual structures in a CG
model can be mapped to natural language elements--meaning CG models can be
presented as stylized natural language expressions. NIAM is another
modelling language that models facts about the enterprise as stylized
natural language sentences, or as graphical models that can be expressed as
sentences. While there is a large body of documentation about NIAM from a
practitioner's perspective, it is not currently the subject of
standardization by ISO or a similar accredited standards body, nor by any
known private consortium.
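The mapping between conceptual structures and stylized sentences can be sketched in a few lines of Python. Below, a single relation between two concepts is printed both in a Conceptual Graphs-style linear notation of the form [Concept: referent]->(Relation)->[Concept: referent] and as a stylized English sentence. The `reading` field and the sentence template are assumptions made for illustration. They are not the verbalization rules of the CG standard or of NIAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    type_label: str  # concept type, e.g. "Employee"
    referent: str    # individual referred to, e.g. "Ann"


@dataclass(frozen=True)
class Relation:
    label: str       # relation type, e.g. "WorksFor"
    reading: str     # hypothetical natural language reading, e.g. "works for"
    source: Concept
    target: Concept


def to_linear_form(rel):
    """Render one relation in CG-style linear notation."""
    src, tgt = rel.source, rel.target
    return f"[{src.type_label}: {src.referent}]->({rel.label})->[{tgt.type_label}: {tgt.referent}]"


def to_sentence(rel):
    """Render the same relation as a stylized natural language sentence."""
    src, tgt = rel.source, rel.target
    return (f"The {src.type_label.lower()} {src.referent} {rel.reading} "
            f"the {tgt.type_label.lower()} {tgt.referent}.")


rel = Relation("WorksFor", "works for", Concept("Employee", "Ann"), Concept("Department", "Sales"))
print(to_linear_form(rel))  # [Employee: Ann]->(WorksFor)->[Department: Sales]
print(to_sentence(rel))     # The employee Ann works for the department Sales.
```

Generating text in this direction is the easy half. Going from free natural language back to conceptual structures is the human-intensive step described above.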
There is also considerable work in academia
and some in the software vendor industry in the field of linguistic
ontologies. The contents of these ontologies are the terms most commonly
used in natural languages. Linguistic ontologies are useful as starter sets
or building blocks when producing specialized domain ontologies. They may
also be useful for natural language parsing and interpretation. Examples of
linguistic ontologies include the Pangloss Upper Model from Dr. Ed Hovy at
the University of Southern California Information Sciences Institute (USC
ISI), WordNet from Dr. George Miller at Princeton University, portions of
the CYC ontology from Dr. Doug Lenat et al. at Cycorp, and the EDR
Electronic Dictionary from the Electronic Dictionary Research Institute,
Ltd. in Japan. There are currently no national or international standards
projects in this area. ANSI ASC X3, Technical Committee T2, is currently
formulating a project proposal and performing preliminary standardization
work for a U. S. domestic project in this area (to be conducted under ANSI
ASC X3).
Registration and Reuse of Domain Models (Concept Libraries or Domain 'Ontologies')
The major objective in this area is to provide
registries of formal and informal ontologies available for access in the
public or private domain and for use in various levels of semantic model
building and integration (i.e., model mapping, alignment and unification).
This presupposes methods and tools for knowledge representation, including
model development and model management. It is anticipated that such methods
and tools may be applied on a stand-alone basis, as well as within the
implementation context of CASE tools and data dictionaries/repositories. It
also presupposes standards for the specification of contents in domain
ontologies.
CASE Tools and Data Dictionaries/Repositories
The
two major international standards efforts in the data dictionary/repository
field are ISO/IEC JTC1/SC21 WG3 Information Resource Dictionary System
(IRDS), ISO/IEC IS 10027, 1992 and ISO/IEC IS 10728,
1994 and ISO/IEC JTC1/SC22 WG22 Portable Common Tool Environment
(PCTE), ISO/IEC DIS 13719, 1994. The former reflects the
viewpoint of the structured data/DBMS community--at this point this is
basically synonymous with the relational DBMS community. There are few, if
any, commercial products conforming to this standard. The latter reflects
the viewpoint of the software development/programming languages community,
with somewhat of a general object-oriented flavor (in fact, the PCTE
efforts are being coordinated to at least some degree with OMG's work in
object-oriented technologies, including repositories). As of December 1994
there were approximately half a dozen commercial products conforming to
this standard, although many of these products were still at least
partially under development. It is not clear, however, that there is major
vendor or industry support for this standard in its current form, so much
as support for its stated future direction and its close working
relationship with the OMG consortium. In any case, the two major
international data dictionary/repository projects need to be closely
coordinated to ensure that the structured data/DBMS community and the
software community have compatible standards. Both need to ensure that the
needs of the scientific and technical data community are also taken into
account in international data dictionary/repository standards.
A major
concern in this area is that data dictionary/repository standards should
not be dictating the formats of their content models at the level of
specific implementation approaches; rather they should specify acquisition,
naming, storage, retrieval, versioning, configuration management and other
information resources management services in a manner that is neutral
with respect to (i.e., independent of) any particular implementation
approach. The fact that the current IRDS standard is so closely coupled to a
single DBMS approach (i.e., relational) limits its wider acceptance in
some potential user communities. The PCTE standard is less coupled to
particular implementation technologies. Although it does have an
object-oriented flavor, object-oriented in this sense is not concerned with
the mechanics of the object-oriented programming and database paradigms,
but rather with the basic notion of treating contents stored in a
repository as objects. This approach lends itself to the storage of
non-traditional content such as multimedia. The
possibility of the IRDS and PCTE projects being more closely coordinated or
even joining forces would be greatly enhanced if both were to be as neutral
as possible about target implementation technologies.
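The neutrality argument can be illustrated with an abstract Python interface. The repository services (here, only storing, retrieving and listing versions) are specified once, and any storage technology can sit behind them. The interface and its in-memory backend are hypothetical and are not drawn from the IRDS or PCTE specifications. A relational or object store could implement the same three methods without changing the interface.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Repository(ABC):
    """Implementation-neutral repository services (hypothetical interface)."""

    @abstractmethod
    def store(self, name: str, content: bytes) -> int:
        """Store a new version of a named object; return its version number."""

    @abstractmethod
    def retrieve(self, name: str, version: Optional[int] = None) -> bytes:
        """Return a specific version, or the latest if version is None."""

    @abstractmethod
    def versions(self, name: str) -> list[int]:
        """List the version numbers recorded for a named object."""


class InMemoryRepository(Repository):
    """One possible backend: keeps every version of each object in a list.

    Content is opaque bytes, so model text and other kinds of content
    (for example, multimedia) are handled the same way.
    """

    def __init__(self):
        self._objects = {}  # name -> list of versions, oldest first

    def store(self, name, content):
        history = self._objects.setdefault(name, [])
        history.append(content)
        return len(history)  # versions are numbered from 1

    def retrieve(self, name, version=None):
        history = self._objects[name]
        return history[-1] if version is None else history[version - 1]

    def versions(self, name):
        return list(range(1, len(self._objects.get(name, [])) + 1))


repo = InMemoryRepository()
repo.store("EmployeeModel", b"first draft")
repo.store("EmployeeModel", b"revised draft")
print(repo.versions("EmployeeModel"))     # [1, 2]
print(repo.retrieve("EmployeeModel", 1))  # b'first draft'
```

Because client code depends only on `Repository`, the backend could be replaced (for example, by a relational implementation) without changing the services that client code relies on.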
The ISO/IEC
JTC1/SC7 WG11 CDIF project, ISO/IEC CD??? xxxxx, 1996, is the
major standards project associated with CASE tools, although there are
several smaller industry efforts focusing on specialized aspects of CASE
tools. Since CASE tools would be expected to utilize data
dictionary/repository services or perhaps even to be combined with data
dictionary/repository technologies as part of a complete systems
engineering environment (SEE), the CDIF project and the IRDS and PCTE
projects should work closely together.
General Knowledge Representation and Ontology
The only major standards project
addressing the issue of knowledge representation and ontology (in general)
is the ISO/IEC JTC1/SC21 WG3 Conceptual Schema Modelling Facilities (CSMF)
project, ISO/IEC WD 14481, 1996. It does not simply deal with
model form/syntax or basic model semantics, but rather with the common
underlying constructs used to represent all entities or objects, processes
and constraints or rules. The current effort admittedly only addresses an
initial starter set of these constructs. As important perhaps as the
specific normative constructs specified in this initial CSMF standard is
the overall approach reflected in the standard. The standard itself may be
able to serve as a framework for future KR and ontology standards.
The
CSMF work is based on the ANSI/X3/SPARC three-schema architecture,
Tsichritzis and Klug, 1978. De Witt, 1992 notes
that the basic ideas of the three-schema architecture--namely the
segregation of the meaning of data in a database from the form in
which it is implemented and from the form in which it is
presented to users through applications--have been around for a
number of years. The IBM GUIDE users group published ideas of this nature
in GUIDE, 1971. This was the precursor that led to the
standard architecture published by ANSI/X3/SPARC. The ideas are currently
being 'rediscovered' by a number of communities. At the heart of the
architecture is the conceptual schema, which is an enterprise model
expressed in some formal knowledge representation scheme. Mappings are
provided to the presentation forms of the external schema and to the
implementation forms of the internal schema. A conceptual schema (CS) must
adhere to the fundamental principles described in ISO/IEC TR9007,
1987, as incorporated and expanded in ISO/IEC WD 14481,
1996. These principles include:
- 100% Principle - The CSMF
enables the production of CSs that obey the "100% Principle" of ISO TR9007,
i.e., all relevant static and dynamic rules, laws, etc. about the universe
of discourse should be described in a CS.
- Conceptualization
Principle - The CSMF enables the production of CSs that obey the
"Conceptualization Principle" of ISO TR9007, i.e., a CS should only include
conceptually relevant aspects, both static and dynamic, of the universe of
discourse. All aspects of external or internal data representation are to
be excluded. In particular this enables the production of a CS which is
independent with respect to physical implementation technologies and
platforms.
- Helsinki Principle - The CSMF enables the production of
CSs that obey the "Helsinki Principle", i.e., any meaningful exchange of
utterances depends upon the prior existence of an agreed upon set of
semantic and syntactic rules. The recipients of the utterances must use
only these rules to interpret the received utterances, if it is to mean the
same as that which was meant by the utterer.
- Ontology - The CSMF
allows distinction between the concept and the representation of the
concept.
- Nature of the World - The CSMF makes minimal assumptions
concerning the nature of the world. Only very fundamental ontological and
representational constructs are found in the basic CSMF starter set of
normative constructs. This allows the majority of ontological content to be
introduced into the CS through specific domain models. The marketplace can
choose among these domain models, 'plugging and playing' the most
appropriate ones given particular purposes or uses.
- Concrete Syntaxes - The CSMF contains constructs to enable easy mapping to
and from concrete syntaxes. The CSMF standard itself does not specify a
conceptual schema language (CSL), either abstract or concrete, but it does
specify some requirements that concrete CSLs will have to meet. The CSMF
standard also specifies the basic, minimal set of normative semantic
constructs that concrete CSLs will be expected to be able to
represent.
- Ease of Understanding - The CSMF enables the presentation
of CS contents in a way that can easily be understood by humans
knowledgeable about the UoD. This is again at the level of the kinds of
things to be found in a CS, not the particular concrete syntaxes used to
present CSs. Which concrete syntaxes lend themselves best to domain expert
(i.e., 'user') presentation is left to the marketplace to
decide.
- Extensibility - The CSMF provides mechanisms for extending
the standard set of constructs. Conforming concrete conceptual schema
languages (CSLs) need not limit their constructs to just the starter set of
normative constructs specified in the CSMF standard. While various levels
of conformance to the normative constructs may be permitted, concrete CSLs
will not be prohibited from offering additional constructs beyond the
normative CSMF constructs. However, any additional constructs must not be
in conflict with the CSMF normative constructs, and all additional
constructs must be derivable from either the normative constructs
themselves, or from the normative constructs plus any previously defined
additional constructs.
- Self Description - The CSMF complies with the
"Meta Principle", i.e., the constructs defined in the CSMF standard are
capable of self-description (with the exception of any constructs declared
to be primitive). The normative constructs are also internally closed and
consistent.
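The Extensibility principle amounts to a closure rule: an additional construct may be admitted only if everything it is built from is already defined, bottoming out in the normative starter set. The following is a minimal Python sketch of that bookkeeping under stated assumptions. The class `ConstructRegistry`, its methods, and all construct names are hypothetical illustrations, not part of WD 14481; "conflict" with an existing construct is crudely approximated as a name clash, and "derivable" is reduced to declaring which already-defined constructs a new one is built from (no semantic check of the derivation is attempted).

```python
"""Admitting extension constructs only if they are derivable (hypothetical names)."""

from dataclasses import dataclass, field


@dataclass
class ConstructRegistry:
    # Hypothetical normative starter set; the actual CSMF constructs are those of WD 14481.
    normative: frozenset[str]
    # Additional construct name -> the constructs it is declared to be derived from.
    additional: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def is_defined(self, name: str) -> bool:
        return name in self.normative or name in self.additional

    def add(self, name: str, derived_from: tuple[str, ...]) -> None:
        """Admit `name` only if it does not clash with an existing construct
        and every construct it is derived from is already defined."""
        if self.is_defined(name):
            raise ValueError(f"{name!r} conflicts with an existing construct")
        if not derived_from:
            raise ValueError(f"{name!r} must be derived from existing constructs")
        missing = [base for base in derived_from if not self.is_defined(base)]
        if missing:
            raise ValueError(f"{name!r} depends on undefined constructs: {missing}")
        self.additional[name] = derived_from


if __name__ == "__main__":
    registry = ConstructRegistry(normative=frozenset({"entity", "relationship", "constraint"}))
    registry.add("aggregation", ("entity", "relationship"))     # built from normative constructs
    registry.add("composition", ("aggregation", "constraint"))  # builds on a prior addition
    try:
        registry.add("workflow", ("process",))                  # 'process' is not defined
    except ValueError as err:
        print(err)  # 'workflow' depends on undefined constructs: ['process']
```

Because additions are checked in order, every admitted construct can be traced back to the normative set, which is the property the principle asks conforming CSLs to preserve.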
Finally, it should be noted that there are two primary
objectives for the CSMF standard:
- To enhance interoperability
between enterprise models (i.e., conceptual schemas), and
- To improve
the quality (e.g., accuracy, completeness, semantic depth, etc.) of
enterprise models, and correspondingly of information systems developed
based on the models.
Both of these objectives are important.
Fortunately, the requirements of both objectives, in terms of CSMF
capabilities, as well as the approaches to providing those capabilities,
overlap to a large degree. By providing ways to both integrate existing
enterprise modelling methods and to enrich or extend conceptual modelling
capabilities through improvements in knowledge representation, the CSMF
project is working towards the achievement of both objectives
concurrently.
Domain Model Contents
The major tasks in
this area are to: decide what contents should be standardized; organize the
contents that are deemed suitable for standardization; standardize the
manner (but not the form or mechanics) by which such contents should be
specified; and develop the content standards, i.e., produce the actual
domain models. Decisions about which contents to standardize are made by
international trade standardization groups such as the United Nations
EDIFACT. The ISO/IEC JTC1/SC30 Open Electronic Data Interchange (EDI)
project deals with the overall standardization of those key international
trade and commerce data elements chosen for standardization. Its reference
model, ISO/IEC CD 14662 Part 2, 1996, explains its scope and
purpose. The manner of standardization is specified by ISO/IEC JTC1/SC14
Data Element Standardization, whose charter covers exactly this task.
SC14 has produced a major international standard on the subject, ISO/IEC
IS 11179, 1995. The UN/ISO Basic Semantic
Repository (BSR) project also addresses the manner in which domain model
contents in the field of EDI--for example, trade data element names--should
be specified. Some describe it as a partial implementation of IS 11179
that also supplies specific contents for the field of EDI.
However, there is concern that this project crosses over into the form and
mechanics of specification, going beyond the scope of IS 11179 and into the
field of semantic modelling. The form and mechanics of semantic
specification should be left to other standards projects concerned
specifically with knowledge representation--the ISO/IEC JTC1/SC21 WG3 CSMF
project and related standardization projects for concrete enterprise
modelling languages under ISO/IEC JTC1/SC7 WG11. Industry user groups
should not be specifying what are essentially information technology
standards--standards that deal with complex knowledge representation and
information management issues.
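The separation drawn above, between the manner of specifying domain model contents and the form or mechanics of that specification, can be illustrated with a minimal Python sketch: one data element specification rendered in two interchangeable forms, each of which carries exactly the same content. The attribute set, the class and function names, and the example values are illustrative assumptions only, loosely in the spirit of the descriptive attributes IS 11179 is concerned with; they are not its normative attribute list, and neither serialization is prescribed by any standard.

```python
"""One data element specification, two presentation forms (hypothetical values)."""

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataElementSpec:
    # Illustrative attribute set only; NOT the normative IS 11179 attribute list.
    identifier: str
    name: str
    definition: str
    version: str
    registration_authority: str

    def missing_attributes(self) -> list[str]:
        """Names of attributes that were left blank."""
        return [key for key, value in asdict(self).items() if not value.strip()]


def to_json(spec: DataElementSpec) -> str:
    """Form 1: a JSON object."""
    return json.dumps(asdict(spec), sort_keys=True)


def from_json(text: str) -> DataElementSpec:
    return DataElementSpec(**json.loads(text))


def to_key_value(spec: DataElementSpec) -> str:
    """Form 2: one key=value pair per line (values must not contain newlines)."""
    return "\n".join(f"{key}={value}" for key, value in asdict(spec).items())


def from_key_value(text: str) -> DataElementSpec:
    pairs = {}
    for line in text.splitlines():
        key, _, value = line.partition("=")  # split on the first '=' only
        pairs[key] = value
    return DataElementSpec(**pairs)


if __name__ == "__main__":
    spec = DataElementSpec(
        identifier="DE-0001",  # hypothetical identifier
        name="Invoice currency code",
        definition="Code identifying the currency in which an invoice is issued.",
        version="1",
        registration_authority="EXAMPLE-RA",  # hypothetical authority
    )
    # The same content survives either form, so the form itself is interchangeable.
    assert from_json(to_json(spec)) == spec
    assert from_key_value(to_key_value(spec)) == spec
    print(spec.missing_attributes())  # []
```

The point of the sketch is that a rule such as "every element must have a non-blank definition" can be stated and checked without fixing any particular syntax, which is the division of labour between data element standardization and semantic modelling argued for above.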
The ISO TC184/SC4 Standard for the Exchange
of Product model data (STEP), ISO IS 10303, various, is the
largest standards project targeted at producing a domain model or ontology
for a major application area--in this case engineering and manufacturing
product definition. As a forerunner in the enterprise modelling and
KR/ontology fields, the STEP community has had to develop some of its own
approaches and techniques in cases where there were none available to
utilize at the time. The EXPRESS modelling language, ISO IS 10303
Part 11, 1993, is an example. However, now that a number of
information technology standards projects are working on various aspects of
model integration and conceptual modelling, the STEP community should
support those generic efforts to ensure that STEP-specific needs and
requirements can be adequately addressed directly by the more generic
standards, or through specializations of the generic standards. This, in
fact, appears to be the current direction of STEP, as a close liaison has
been established with ISO/IEC JTC1/SC7 WG11 (e.g., the SEDDI project), and
liaison activities have also been undertaken in conjunction with ISO/IEC
JTC1/SC21 WG3 CSMF RG.
Inferencing and Intelligent Decision-Support
While inferencing techniques may be commonly found in
specialized technical circles, practical application of the techniques in
the form of commercial off-the-shelf tools (e.g., inference engines) is
lacking, except to support narrowly focused expert systems software. What
little commercial technology is available in the intelligent
decision-support field uses straightforward vector-based pattern-matching.
It makes little or no use of inferencing rules specified at a
meta-schematic level. Data mining technology uses basically the same
pattern-matching approaches, as well as fuzzy-logic approaches and neural
networks.
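To make the contrast concrete, here is a minimal Python sketch of the kind of vector-based pattern-matching described above: a record is labelled with whichever stored pattern vector it lies closest to, measured by cosine similarity. The feature meanings, pattern labels, and values are hypothetical. Nothing in the computation refers to rules or to what the features mean, which is precisely the limitation noted above.

```python
"""Nearest-pattern matching by cosine similarity (hypothetical data)."""

import math
from collections.abc import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(record: Sequence[float], patterns: Mapping[str, Sequence[float]]) -> str:
    """Return the label of the stored pattern most similar to `record`."""
    return max(patterns, key=lambda label: cosine(record, patterns[label]))


if __name__ == "__main__":
    # Hypothetical features: [orders placed, late payments, returns]
    patterns = {"reliable_customer": [10, 0, 1], "risky_customer": [2, 5, 3]}
    print(best_match([9, 1, 1], patterns))  # reliable_customer
```

The label is simply whatever pattern is geometrically nearest; the matcher cannot explain the result or draw consequences from it.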
Advances in the field of inferencing and intelligent
decision-support depend largely upon semantic advances in modelling
languages (particularly for rule specification) and in domain ontologies
(for the meta-level contents against which to perform inferencing and
against which instance-level patterns identified in the information base
can be compared). Advances also depend upon implementation environments in
the form of CASE tools and data dictionaries/repositories. Current
standards in the field of CASE tools and data dictionaries/repositories
specify nothing in the area of inferencing and provide little or no support
for incorporating such capabilities into IT tools in those
fields.
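By contrast, inferencing against a domain ontology applies rules stated at the meta-schematic level (over types) to instance-level facts. The following minimal Python sketch assumes facts are represented as hypothetical (subject, predicate, object) triples and uses two illustrative rules: transitivity of `subclass_of`, and inheritance of class membership along it. It forward-chains to a fixpoint and then compares the inferred instances against a required-property constraint. It is a toy forward-chaining loop, not a description of any particular inference engine or standard.

```python
"""Toy forward chaining over a tiny ontology (all names hypothetical)."""

Triple = tuple[str, str, str]


def infer(facts: set[Triple]) -> set[Triple]:
    """Apply two meta-level rules until no new facts appear:
    1. subclass_of is transitive;
    2. X instance_of A and A subclass_of B  =>  X instance_of B."""
    known = set(facts)
    while True:
        subclass = {(s, o) for s, p, o in known if p == "subclass_of"}
        derived: set[Triple] = set()
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    derived.add((a, "subclass_of", d))
        for s, p, o in known:
            if p == "instance_of":
                for a, b in subclass:
                    if a == o:
                        derived.add((s, "instance_of", b))
        derived -= known
        if not derived:  # fixpoint reached
            return known
        known |= derived


def missing_property(facts: set[Triple], cls: str, prop: str) -> set[str]:
    """Instances of `cls` that have no value for `prop`."""
    members = {s for s, p, o in facts if p == "instance_of" and o == cls}
    with_prop = {s for s, p, _ in facts if p == prop}
    return members - with_prop


if __name__ == "__main__":
    base = {
        ("milling_machine", "subclass_of", "machine_tool"),
        ("machine_tool", "subclass_of", "capital_equipment"),
        ("mill_07", "instance_of", "milling_machine"),
        ("lathe_02", "instance_of", "machine_tool"),
        ("lathe_02", "asset_tag", "AT-1182"),
    }
    # Without inference, no instance is known to be capital equipment.
    print(missing_property(base, "capital_equipment", "asset_tag"))  # set()
    # After inference, mill_07 is flagged as lacking the required asset tag.
    print(missing_property(infer(base), "capital_equipment", "asset_tag"))  # {'mill_07'}
```

The constraint check only finds `mill_07` because the ontology's meta-level rules were applied first, which illustrates why domain ontologies and rule-capable modelling languages are prerequisites for this kind of decision support.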
References:
ANSI X3/TR-14 Part 1, 1995. IRDS Conceptual
Schema, Part 1 Conceptual Schema for IRDS. American National Standards
Institute (ANSI) Accredited Standards Committee X3 Information Processing
Systems.
ANSI X3/TR-14 Part 2, 1995. IRDS Conceptual Schema, Part 2,
Modeling Language Analysis. American National Standards Institute (ANSI)
Accredited Standards Committee X3 Information Processing Systems.
ANSI
X3 1058-D, 1995. SD-3 (Approved) Proposal to Develop a New X3 Standard on
Knowledge Interchange Format (KIF). American National Standards Institute
(ANSI) Accredited Standards Committee X3 Information Processing
Systems.
ANSI X3 1059-D, 1995. SD-3 (Approved) Proposal to Develop a New
X3 Standard on Conceptual Graphs (CGs). American National Standards
Institute (ANSI) Accredited Standards Committee X3 Information Processing
Systems.
De Witt, S., 1992. "Three-Schema Enterprise Modeling", in
Proceedings of the IDEF Users' Group Conference.
GUIDE, 1971. GUIDE
Secretary Distribution (GSD) 23: "Requirements for a Data Base Management
System". GUIDE International Corporation.
ISO IS 10303, various.
Standard for the Exchange of Product model data (STEP) [a multi-part
standard]. International Organization for Standardization, Technical
Committee 184 Industrial Automation Systems, Subcommittee 4 Industrial Data
and Global Manufacturing Languages (ISO TC184/SC4).
ISO IS 10303 Part
11, 1993. Industrial Automation Systems - Product Data Representation and
Exchange, Part 11: Description Methods: The EXPRESS Language Reference
Manual. International Organization for Standardization, Technical Committee
184 Industrial Automation Systems, Subcommittee 4 Industrial Data and
Global Manufacturing Languages (ISO TC184/SC4).
ISO/IEC CD 14662 Part 2,
1996. Information Technology - Open-edi Reference Model, Part 2.
International Organization for Standardization (ISO), Subcommittee 30
Open-edi.
ISO/IEC DIS 13719, 1994. Information Technology - Portable
Common Tool Environment (PCTE), Parts 1-3. International Organization for
Standardization (ISO), Subcommittee 22 Programming Languages, Working
Group 22 Portable Common Tool Environment (PCTE).
ISO/IEC DIS xxxxx,
1996. Information Technology - CASE Data Interchange Format (CDIF).
International Organization for Standardization (ISO), Subcommittee 7
Software Engineering, Working Group 11 Data Definition and
Representation.
ISO/IEC IS 10027, 1992. Information Technology -
Information Resource Dictionary System (IRDS) - Framework. International
Organization for Standardization (ISO), Subcommittee 21 Open Systems
Interconnection, Data Management and Open Distributed Processing, Working
Group 3 Database, Information Resource Dictionary System Rapporteur Group
(IRDS RG).
ISO/IEC IS 10032, 1995. Information Technology - Reference
Model of Data Management. International Organization for Standardization
(ISO), Subcommittee 21 Open Systems Interconnection, Data Management
and Open Distributed Processing, Working Group 3 Database, Reference Model
of Data Management Rapporteur Group (RMDM RG).
ISO/IEC IS 10728, 1994.
Information Technology - Information Resource Dictionary System (IRDS) -
Services Interface. International Organization for Standardization (ISO),
Subcommittee 21 Open Systems Interconnection, Data Management and
Open Distributed Processing, Working Group 3 Database, Information Resource
Dictionary System Rapporteur Group (IRDS RG).
ISO/IEC IS 11179, 1995.
Information Technology - Specification and Standardization of Data
Elements, Part 1-5. International Organization for Standardization (ISO),
Subcommittee 14 Data Element Standardization.
ISO/IEC TR9007,
1987. Information Processing Systems - Concepts and Terminology for the
Conceptual Schema and the Information Base. International Organization for
Standardization (ISO).
ISO/IEC WD 14481, 1996. Information Technology -
Conceptual Schema Modelling Facilities (CSMF). International Organization
for Standardization (ISO), Subcommittee 21 Open Systems
Interconnection, Data Management and Open Distributed Processing, Working
Group 3 Database, Conceptual Schema Modelling Facilities Rapporteur Group
(CSMF RG).
OMG, 1995 (a). CORBA 2.0/Interoperability: Universal Networked
Objects. Object Management Group, Inc.
OMG, 1995 (b). Common Facilities
Architecture. Object Management Group, Inc.
Sarris, 1992. Integration
Toolkit and Methods (ITKM) Corporate Data Integration Tools (CDIT), Review
of the State-of-the-Art with Respect to Integration Toolkits and Methods
(ITKM). WL-TR-92-8048/DTIC AD-A255 547. Air Force Manufacturing
Directorate, Wright
Laboratory, Wright-Patterson Air Force Base.
The
Object Agency, 1995. A Comparison of Object-Oriented Methodologies. The
Object Agency, Inc.
Tsichritzis, D. and Klug, A. (Eds.), 1978. "The
ANSI/X3/SPARC DBMS Framework", Information Systems, Vol. 3,
pp. 173-191.
Wakeman, L. and Jowett, J., 1993. PCTE: The Standard for Open
Repositories. Prentice Hall.
X3H7, 1995. (Draft) Technical Report,
Object Model Features Matrix. American National Standards Institute (ANSI)
Accredited Standards Committee (ASC) X3 Information Processing Systems,
Technical Committee H7 Object Information Management.
About the Author
Anthony K. Sarris is vice president of enterprise analysis for
Ontek Corporation, a software research, development and consulting firm
located in Laguna Hills, California, U.S.A. In his role at Ontek, Mr.
Sarris acts as a project manager and technical specialist, analyzing
information requirements, defining information architectures, and
developing enterprise models for commercial and government
organizations--primarily in the fields of engineering, manufacturing and
information technology. Mr. Sarris has been involved with national and
international standards development projects in the areas of modelling and
data dictionary/repository since 1991. He co-edited ANSI X3/TR-14:1995, the
Information Resource Dictionary System (IRDS) Conceptual Schema Technical
Report. He is currently serving as chairman of ANSI ASC X3 Technical
Committee T2, Information Interchange and Interpretation, and participates
as a U.S. expert in the ISO/IEC JTC1/SC21 WG3 Conceptual Schema Modelling
Facilities (CSMF) project.
Mr. Sarris is a Phi Beta Kappa graduate of
Purdue University and also studied at the University of Hamburg, Germany.
Prior to joining Ontek Corporation, he held technical and management
positions at a large aerospace and defense contractor and a Big Six
technology-consulting firm. He is a frequent speaker at professional
conferences including: IEEE Meta Data, Technology of Object-Oriented
Languages and Systems (TOOLS), Continuous/Computer-Aided Acquisition and
Logistical Support (CALS), Manufacturing Technology Advisory Group
(MTAG)/Agile Manufacturing, the IDEF Users Group, and the National Council
on Systems Engineering/American Society of Engineering Management
(NCOSE/ASEM).