Send message to: tony@ontek.com (Tony Sarris), or nell@nist.gov (Jim Nell), Workshop secretary.
Towards Knowledge Representation:
The State of Data and Processing Modelling Standards


An Expert Presentation to the ISO Joint Workshop on Data and Process Modelling, September 9-12, 1996, Bellevue, WA, U.S.A.

Anthony K. Sarris
Ontek Corporation
ANSI ASC X3 Technical Committee T2
tony@ontek.com


Abstract

The subject of this workshop--primarily approaches, techniques and tools for describing, integrating and utilizing enterprise models--is complex exactly because it represents the intersection of several different standards and technologies for information management. From ways to conceptually model enterprise semantics in the form of objects, processes, relations and rules, to languages and fundamental constructs for integrating heterogeneous models, to CASE and data dictionary/repository technologies for effective model management and reuse, each challenge emphasizes a different aspect of the same underlying problem: knowledge representation (KR). Artificial intelligence is not a panacea for knowledge representation problems in the sense that some claimed in the 1980s. But advancements in modelling require a concerted effort to focus resources on flexible, expressive KR languages and plug-and-play sets of domain conceptual models--'ontologies' in the current terminology. Adaptable, integrated toolkits are required to support such efforts.



Introduction

This paper uses several informal categories as a framework for discussing a number of ISO and related standards projects addressing aspects of knowledge representation (KR). KR encompasses, but is not limited to, data and process modelling, or more generally, enterprise modelling. First, the categories themselves are identified and described. They are subsequently used to discuss existing standards projects and to make some observations about the scope, approach and results of those projects. Some recommendations are made with respect to existing standards, as well as possible areas for future standards work related to KR. The author has been actively participating in the ISO/IEC JTC1/SC21 WG3 Conceptual Schema Modelling Facilities (CSMF) project since its initial planning stages in 1991, and has been involved to a lesser extent with a number of the other standards projects that are also addressing aspects of KR.

Categories for Stocktaking of Current Standards

The five categories identified and described in this section are used informally as a framework for discussing a number of ISO and other standards related to KR. The author is not proposing that these categories provide a formal way in which to organize KR issues; rather, they are used simply as a convenient way to present the key points of this paper. Although the categories are used informally, the reader needs to understand what each category refers to in order to understand the descriptions of the various standards projects, which are given in the context of the categories.

Heterogeneity of Modelling Forms

Heterogeneity of modelling or KR methods includes both the nature of what can be modelled using the particular method (i.e., the semantic aspect) and the expressive form of the method (i.e., the syntactical aspect). In the sense that the former presupposes a certain set of fundamental constructs--based for example on whether the method is intended for data modelling or process modelling--heterogeneity among enterprise modelling methods (actually KR methods in general) is inherently introduced. The underlying reason for needing certain fundamental conceptual constructs to describe, for example, enterprise processes is a question of the ontological stance reflected in the method and is described in the section on ontology below. Suffice it to say at this point that some level of basic semantic heterogeneity must be accounted for. There are many methods that address data/information only and many that address processes only. It is often the case in enterprise modelling that there is a need to relate these two fundamental kinds of models.

Even within certain broad semantic categories like data/information modelling, heterogeneity of modelling forms is rampant. This heterogeneity arises from the overall modelling paradigm and from the specific modelling language used. For example, static semantics (i.e., the content of data and information models) can be modelled using entity-attribute-relationship (E-A-R) approaches, binary-and-elementary-n-ary-relationship (BENR) approaches and the object class aspects of object-oriented modelling approaches. Additionally, there are many 'flavors' of E-A-R, such as Chen's E-A-R, Ross' Extended E-A-R, Navathe's Extended E-A-R and IDEF1X. That a particular modelling method may have both a linear (text) form and a graphical form of its basic syntax is still another issue; however, this is discussed as part of the section on Knowledge Acquisition and Presentation. ISO/IEC TR9007, 1987 discusses heterogeneous modelling methods in more detail, as do Sarris, 1992, which provides an overview of some 30 commonly used modelling methods, and ANSI X3/TR-14 Part 2, 1995, which analyses 9 modelling methods. Other studies such as The Object Agency, Inc., 1995 and X3H7, 1996 compare dozens of methods all within the object-oriented paradigm alone.

Distribution

When the first large-scale enterprise modelling projects were performed in the late 1970s and early 1980s [this is deliberately not including natural language system descriptions or other less structured 'models', which have of course been produced for many years as part of almost every systems analysis and development project of any size], the models were often in the form of paper--everything from butcher-paper sheets taped to conference room walls to model print-outs that resembled a set of encyclopedias. With the arrival of computer-based modelling tools in the mid 1980s, the models still often resided in one central model database managed by an enterprise modelling authority. They were in essence an historical record of the completed modelling project, not a living, active enterprise model.

Today, as is the case with most other computer-based applications, enterprise models are created in distributed client-server environments across company-wide intranets. In many cases, models of such complex agents as 'virtual corporations' (i.e., collaborative teams of organizations that band together and disband to meet rapidly changing research needs or business opportunities) are created across the global internet. The demands of distributed, collaborative KR projects add new elements to the problem mix; some demands, such as versioning or configuration management and resolution of conflicting information are problems general to the area of distribution. At a minimum there are requirements for various protocols to support the physical interchange of models across distributed networks.

Knowledge Acquisition and Presentation

A picture may be worth a thousand words, but even in today's multi-media world natural language sentences expressed in text documents (or mixed-media documents built around text) still dominate. A fundamental question is whether a natural language document, such as a policies and procedures manual, is really an enterprise model, since the knowledge in the 'model' is not explicit (let alone machine-interpretable). Natural language processing (NLP) has been a driver for much of the work in artificial intelligence. The objective of formally acquiring knowledge by parsing and interpreting text is of interest to a large community of researchers and to industry. However, leaving the specific (and rather significant) challenges of NLP aside for a moment, there is a simpler but still quite important debate that has gone on for many years in the enterprise modelling community: should modelling languages have a (primarily) linear/text form (e.g., a so-called stylized natural language) or a graphical form? If a modelling method offers both forms, the question arises as to whether the two forms are fully equivalent and whether tools to support the method can perform automatic, complete conversions from one form to another. If the latter is provided for, then this issue can be reduced purely to a matter of preference at the time knowledge is captured and when it is subsequently presented (for purposes of validation, education, etc.).

There has been a tendency for those who favor graphical modelling methods to argue that these methods are somehow more rigorous and that they encourage analysts to scrutinize model contents more closely during the acquisition process. Those who favor stylized natural languages tend to point to the fact that those who know the real-world domain being modelled often feel more comfortable using natural language-like modelling methods, whether they are directly performing the modelling themselves or reviewing/validating a model produced by an analyst. With the aid of various automated modelling tools a higher degree of rigor (in the sense of checking model contents for conformance to the rules of the modelling method, and to a lesser degree conflict-checking to ensure semantic integrity in the model) can be ensured than was the case when models were checked largely by humans performing visual or other inspections. However, this fact still has not ended the linear-graphical debate.

Ontology

Every modelling or KR paradigm reflects in its fundamental conceptual constructs certain views about the things in the world (the real or imagined world), as well as the concepts we use to represent and describe those things. One aspect of this ontological stance manifests itself in the semantics and syntax of the modelling methods themselves (for example, we can conceive of the world being populated with, among other things, physical objects that have certain physical properties, which we can represent in an E-A-R model as entities with attributes and in a database management system as tables with columns). Another aspect manifests itself in the kinds of things we represent as contents in typical enterprise models (for example, organizations, resources, objectives, plans, services, artifacts, spatial locations, etc.). One or more fundamental ontologies are necessary to perform any kind of conceptual modelling or knowledge representation. Commonsense or mid-world ontologies are useful for describing and reasoning about things found in the everyday world. Common linguistic ontologies are also useful for describing the everyday world, for providing starter sets of terms to use when building more specialized domain ontologies, and for attempting to interpret natural language-based descriptions or models of the world.

Reasoning

Our knowledge of some domain, as we represent it in some model, is never complete nor completely accurate. To be useful, it need only be complete and accurate enough for its intended purpose. But if conceptual models are to serve an on-going purpose in the enterprise (rather than to be used solely as part of a system development process--then simply discarded after that), they must be able to be improved and expanded over time, as well as revised to reflect changes in the domain they model. Reasoning capabilities have a role in both the knowledge acquisition (read, model building) process and in the on-going maintenance of the model. As new contents are entered in a model, those contents should be analyzed in light of the existing contents. Conflicts or potential conflicts should be identified. It is presumed that for the foreseeable future, humans will be required to actually resolve any model conflicts. On an on-going basis, reasoning capabilities can also be used to predict new facts about the enterprise based on existing model contents, to analyze instance-level data and compare it to meta (schema)-level types and rules to test the validity of the model, and to identify areas where the model is possibly incomplete or inaccurate, or where the domain being modelled has changed since the model was built.
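The kind of checking described above can be illustrated with a minimal sketch in Python. It is not drawn from any of the standards discussed in this paper; the names EntityType, Model, check and assert_fact are hypothetical, and the 'schema' is reduced to attribute names and expected value types. New contents are compared against schema-level types and against existing instance-level contents, and conflicts are reported rather than resolved, leaving resolution to a human.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """Hypothetical schema-level type: attribute name -> expected Python type."""
    name: str
    attributes: dict


@dataclass
class Model:
    """Hypothetical model holding schema-level types and instance-level facts."""
    types: dict = field(default_factory=dict)   # type name -> EntityType
    facts: dict = field(default_factory=dict)   # (type name, id) -> attribute values

    def check(self, type_name, values):
        """Compare instance-level values with the schema-level type."""
        etype = self.types.get(type_name)
        if etype is None:
            return [f"unknown type: {type_name}"]
        conflicts = []
        for attr, value in values.items():
            expected = etype.attributes.get(attr)
            if expected is None:
                conflicts.append(f"{type_name} has no attribute '{attr}'")
            elif not isinstance(value, expected):
                conflicts.append(
                    f"{type_name}.{attr} expects {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return conflicts

    def assert_fact(self, type_name, ident, values):
        """Enter new contents only if they conflict with nothing already known."""
        conflicts = self.check(type_name, values)
        existing = self.facts.get((type_name, ident), {})
        for attr, value in values.items():
            if attr in existing and existing[attr] != value:
                conflicts.append(
                    f"{type_name} {ident}: {attr} was {existing[attr]!r}, now {value!r}"
                )
        if not conflicts:
            self.facts.setdefault((type_name, ident), {}).update(values)
        return conflicts  # left for a human to resolve
```

For example, after registering an Employee type with a name and a salary, asserting a second, different salary for the same employee returns a one-item conflict list and leaves the stored value unchanged.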

Finally, an important objective for producing enterprise conceptual models is to serve as the basis for future intelligent applications in the fields of data mining, decision-support and management automation. While some capabilities, such as data mining, may rely on neural network approaches or nonmonotonic reasoning capabilities, even in such cases the existence of a starting candidate model that can serve to bound or guide refinement efforts could prove helpful. Additionally, some future application systems may use semantic models as direct sources for their information and processing rules--in the extreme case the conceptual model itself could be executed directly as an application system.

Observations and Recommendations

Based on a brief stocktaking of current standards efforts using the categories described above, some general observations and recommendations can be made. The observations and recommendations are intended as input to the discussions at the workshop, particularly to identify areas where standards projects relate, or should relate, to one another. Concern has been expressed in some communities about overlapping, redundant and even conflicting standards projects. The author believes that while there is some direct overlap, more often than not the projects can be placed relative to one another within an overall framework of enterprise modelling/KR standards. They can be shown to be heading at different speeds and with specialized thrusts toward an overall goal that reflects a combination of the key categories described above. The overall goal is consistent with the notion of a three-schema architecture for information management, as envisioned by ANSI/X3/SPARC in Tsichritzis and Klug, 1978 and the notion of a conceptual schema as described in ISO/IEC TR9007, 1987 and ANSI/X3 TR-14 Part 1, 1995. Given the scope and complexity of enterprise modelling, it is to be expected that many methodology and technology standards related to KR (including acquisition, conceptual representation, integration, storage, distribution and presentation) would need to be applied in various combinations and to differing degrees to provide meaningful solutions to requirements in this area.

Syntactical Interchange of Heterogeneous Modelling Forms

Modelling forms that excel in their specialized focus must also be able to be integrated with other [heterogeneous] modelling forms to produce a unified model, or at least to interchange semantics that have meaning in more than one modelling context. This requires syntactical interchange as a first step. The syntactical level of interchange, as well as some static semantics, has been largely addressed within particular methodologies (by standards such as the IEEE IDEF Interchange Definition Language), as well as among major groups of methodologies (by standards such as ISO/IEC JTC1/SC7 WG11 CASE Data Interchange Format or CDIF, ISO/IEC CD? xxxxx, 1996). The CDIF meta-meta-model uses an E-A-R/database schema-like approach to represent the syntactical constructs and some model semantics for several commonly used enterprise modelling languages. Data/information modelling languages have been the primary focus, although some work has been performed in the area of process/dynamic modelling (for example, there is a meta-model for Data Flow Modelling or DFM). The meta-models are voluminous (e.g., 263 pages for E-A-R and 151 for DFM). They are specified using both natural language text and the semi-formal E-A-R/database schema specifications. The axioms or rules are given only in natural language, however, as there is currently no constraint or rule-specification language utilized for CDIF standards. Given the sheer size of the specifications, it is difficult to pick out the critical material. With limited formal specification, and again in light of the size of the meta-models, checking for closure and completeness is also a difficult task. The database schema specifications, however, do offer an API-like mechanism for exchanging models between CASE tools. A large amount of work has gone into the creation of the meta-models and they offer a good starting basis for future work.

The initial level of model semantic interchange is based on a mapping of common modelling constructs using the meta-meta-model as the interchange mechanism or bridge. Here again, the CDIF approach provides a database schema or repository perspective on enterprise model exchange, rather than a complete conceptual schema perspective on enterprise model integration. The CDIF meta-meta-model does not support full, formal description of modelling method semantics and does not support general expressions of complex mapping rules (using, for example, a logic-based constraint language). This is partly attributable to the meta-meta-model constructs themselves, but perhaps more so to the E-A-R expressive formalism used.
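The idea of using a neutral set of constructs as a bridge between modelling methods can be sketched roughly as follows. This Python fragment is an illustration only: the construct names (ObjectType, Property, Association) and the two method vocabularies are assumptions made for the example and are not taken from the CDIF meta-meta-model. Note that the sketch handles only one-to-one construct mappings, which is exactly the limitation discussed above; a construct with no neutral counterpart (here, a hypothetical 'method' construct carrying dynamic semantics) simply cannot be interchanged.

```python
# Hypothetical mapping from each method's constructs to neutral constructs.
# None marks a construct that the neutral layer cannot express.
TO_NEUTRAL = {
    "EAR": {
        "entity": "ObjectType",
        "attribute": "Property",
        "relationship": "Association",
    },
    "OO": {
        "class": "ObjectType",
        "field": "Property",
        "association": "Association",
        "method": None,
    },
}


def translate(construct, source, target):
    """Map a construct from one method to another via the neutral layer.

    Returns the target method's construct name, or None when the construct
    has no neutral counterpart or the target method has no equivalent.
    """
    neutral = TO_NEUTRAL[source].get(construct)
    if neutral is None:
        return None
    for target_construct, target_neutral in TO_NEUTRAL[target].items():
        if target_neutral == neutral:
            return target_construct
    return None
```

Under these assumptions, translate("entity", "EAR", "OO") yields "class", while translate("method", "OO", "EAR") yields None. Expressing anything richer than such a lookup, for example a conditional mapping rule, would call for the kind of logic-based constraint language mentioned above.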

Extensions to the meta-meta-model would be needed for representing more complex modelling method semantics, particularly in the areas of process/dynamic semantics and constraints or rules, and for representing and integrating the semantics of domain ontologies. These areas are both within the scope of the ISO/IEC JTC1/SC21 WG3 Conceptual Schema Modelling Facilities (CSMF) project. Liaison activities are on-going between SC7 WG11 CDIF and SC21 WG3 CSMF. It is recommended that this liaison activity be continued and if possible, taken to a deeper technical level. CSMF has explored the use of symbolic logic-based languages as more powerful, general expressive formalisms. This is especially useful for explicitly representing rules or constraints. Although the draft CSMF standard, ISO/IEC WD 14481, 1996, does not specify any particular concrete syntax, symbolic logic was a driving requirement for the CSMF standard. It is expected that several of the concrete conceptual schema languages claiming conformance to the CSMF standard (for example, ANSI standard Conceptual Graphs or CGs, ANSI X3 1059-D, 1995, and ANSI standard Knowledge Interchange Format or KIF, ANSI X3 1058-D, 1995) will be based on symbolic logic. Therefore, it is expected that conforming CSLs will be useful for expressing the semantics necessary for formalizing and extending the CDIF meta-models.

Distributed Systems Management Across Intra- and Internets

Distribution of enterprise models is necessary across intra- and internets. The location and source of the models is no longer centralized and can no longer be predetermined. Methodological and technological standards to support basic physical distribution, in terms of protocols for modelling tool-to-modelling tool messaging and APIs between model-based applications running on networks, are being addressed largely through consortia and vendor-supplied standards such as the Object Management Group (OMG)'s Common Object Request Broker Architecture Interface Definition Language (CORBA IDL), OMG, 1995(a), Microsoft Corporation's Object Linking and Embedding (OLE) and other related object-based networking approaches. Export/import of models stored in relational database management systems (DBMSs) is addressed in ISO/IEC JTC1 SC21 WG3 Reference Model for Data Management (RMDM), ISO/IEC IS 10032, 1995, as well as various projects under the WG3 Export/Import Rapporteur Group.

Wrapper protocols or services specific to enterprise modelling language interchange are not currently the subject of special standards, and it remains to be determined if there are any special needs in this area not being met by the general standards noted above. There has been some government-funded R&D work in the U.S. (e.g., the Knowledge Query and Manipulation Language or KQML), but this work is not currently proposed as part of any standard being developed by ISO or any similar accredited standards bodies. Distribution management is the subject of much research and development, particularly as regards maintaining consistency and providing roll-back/recovery across sites sharing distributed databases. The two major ISO data dictionary/repository standards projects, which are described in the section on registration and reuse of domain models below, also address aspects of this issue by specifying configuration management and versioning capabilities, as does the Common Facilities Architecture, OMG, 1995(b), work being conducted by the OMG consortium. Again, no particular standards in this area are addressing distribution management issues for enterprise modelling languages. To the degree that modelling in particular and knowledge representation in general are closely related to distributed database management and data dictionaries/repositories (in terms of the information technologies used to implement automated modelling capabilities), it is again quite possible that no additional or specialized capabilities are needed beyond those addressed under existing standards projects.

Natural Language Processing

Natural language allows domain experts (i.e., 'users') to describe an enterprise in a way that appeals to them as non-IT professionals, as well as enabling models represented in specialized technical formats to be presented in formats more oriented to their application in a business context. On the whole, however, this field has suffered from the problem of trying to take on too much at once. Many of the efforts have concentrated on natural language interpretation, which is an extremely difficult, multi-disciplinary area rife with problems that are far from being solved. It may well never be automated to as large a degree as many would like. In any case, it is premature for standardization in that particular area.

In general, natural language processing is not the subject of standardization, but remains largely an area of research. Notable exceptions involve modelling languages which have a natural language-like or stylized natural language syntax, or the syntax of which can be mapped to the grammatical elements of natural languages. Conceptual Graphs is one such language being standardized. At present it is the subject only of an ANSI standard, ANSI X3 1059-D, 1995, but internationalization is being explored. CGs is a symbolic logic-based language that expresses enterprise model contents as conceptual structures. CGs can be created from natural language expressions (this is a human-intensive process, however) and the conceptual structures in a CG model can be mapped to natural language elements--meaning CG models can be presented as stylized natural language expressions. NIAM is another modelling language that models facts about the enterprise as stylized natural language sentences, or as graphical models that can be expressed as sentences. While there is a large body of documentation about NIAM from a practitioner's perspective, it is not currently the subject of standardization by ISO or a similar accredited standards body, nor by any known private consortium.
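The notion of presenting structured model contents as stylized natural language sentences, and of recovering the structure from such sentences, can be sketched in a few lines of Python. The sentence pattern, the Fact structure and the function names below are hypothetical and are chosen only for illustration; they do not reproduce the actual syntax of Conceptual Graphs or NIAM. The point of the sketch is that when a stylized form is sufficiently constrained, conversion in both directions can be automatic and lossless.

```python
import re
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    """Hypothetical elementary binary fact relating two typed objects."""
    subject_type: str
    subject: str
    predicate: str
    object_type: str
    obj: str


# Hypothetical stylized sentence form: The <Type> '<name>' <predicate> the <Type> '<name>'.
PATTERN = re.compile(r"^The (\w+) '([^']+)' (.+) the (\w+) '([^']+)'\.$")


def verbalize(fact: Fact) -> str:
    """Present a fact as a stylized natural language sentence."""
    return (
        f"The {fact.subject_type} '{fact.subject}' {fact.predicate} "
        f"the {fact.object_type} '{fact.obj}'."
    )


def parse(sentence: str) -> Optional[Fact]:
    """Recover the fact from a stylized sentence, or None if it does not conform."""
    match = PATTERN.match(sentence)
    return Fact(*match.groups()) if match else None
```

A fact relating an Employee 'Smith' to a Department 'Sales' by the predicate 'works for' is verbalized as a single sentence, and parsing that sentence returns the original fact. An unconstrained sentence such as "Smith is in Sales" is rejected, which is where the much harder problem of natural language interpretation begins.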

There is also considerable work in academia and some in the software vendor industry in the field of linguistic ontologies. The contents of these ontologies are the terms most commonly used in natural languages. Linguistic ontologies are useful as starter sets or building blocks when producing specialized domain ontologies. They may also be useful for natural language parsing and interpretation. Examples of linguistic ontologies include the Pangloss Upper Model from Dr. Ed Hovy at the University of Southern California Information Sciences Institute (USC ISI), WordNet from Dr. George Miller at Princeton University, portions of the CYC ontology from Dr. Doug Lenat et al. at Cycorp, and the EDR Electronic Dictionary from the Electronic Dictionary Research Institute, Ltd. in Japan. There are currently no national or international standards projects in this area. ANSI ASC X3, Technical Committee T2, is currently formulating a project proposal and performing preliminary standardization work for a U.S. domestic project in this area (to be conducted under ANSI ASC X3).

Registration and Reuse of Domain Models (Concept Libraries or Domain 'Ontologies')

The major objective in this area is to provide registries of formal and informal ontologies available for access in the public or private domain and for use in various levels of semantic model building and integration (i.e., model mapping, alignment and unification). This presupposes methods and tools for knowledge representation, including model development and model management. It is anticipated that such methods and tools may be applied on a stand-alone basis, as well as within the implementation context of CASE tools and data dictionaries/repositories. It also presupposes standards for the specification of contents in domain ontologies.

CASE Tools and Data Dictionaries/Repositories

The two major international standards efforts in the data dictionary/repository field are ISO/IEC JTC1/SC21 WG3 Information Resource Dictionary System (IRDS), ISO/IEC IS 10027, 1992 and ISO/IEC IS 10728, 1994 and ISO/IEC JTC1/SC22 WG22 Portable Common Tool Environment (PCTE), ISO/IEC DIS 13719, 1994. The former reflects the viewpoint of the structured data/DBMS community--at this point this is basically synonymous with the relational DBMS community. There are few, if any, commercial products conforming to this standard. The latter reflects the viewpoint of the software development/programming languages community, with somewhat of a general object-oriented flavor (in fact, the PCTE efforts are being coordinated to at least some degree with OMG's work in object-oriented technologies, including repositories). As of December 1994 there were approximately half a dozen commercial products conforming to this standard, although many of these products were still at least partially under development. It is not clear, however, that there is major vendor or industry support for this standard in its current form, so much as support for its stated future direction and its close working relationship with the OMG consortium. In any case, the two major international data dictionary/repository projects need to be closely coordinated to ensure that the structured data/DBMS community and the software community have compatible standards. Both need to ensure that the needs of the scientific and technical data community are also taken into account in international data dictionary/repository standards.

A major concern in this area is that data dictionary/repository standards should not be dictating the formats of their content models at the level of specific implementation approaches; rather they should specify acquisition, naming, storage, retrieval, versioning, configuration management and other information resources management services in a manner neutral to (i.e., independent of) any particular implementation approach. The fact that the current IRDS standard is so closely coupled to a single DBMS approach (i.e., relational) is an aspect that limits its wider acceptance in some potential user communities. The PCTE standard is less coupled to particular implementation technologies. Although it does have an object-oriented flavor, object-oriented in this sense is not concerned with the mechanics of the object-oriented programming and database paradigms, but rather with the basic notion of treating contents stored in a repository as objects. This approach lends itself to storage of non-traditional contents such as those associated with multi-media. The possibility of the IRDS and PCTE projects being more closely coordinated or even joining forces would be greatly enhanced if both were to be as neutral as possible about target implementation technologies.

The ISO/IEC JTC1/SC7 WG11 CDIF project, ISO/IEC CD??? xxxxx, 1996, is the major standards project associated with CASE tools, although there are several smaller industry efforts focusing on specialized aspects of CASE tools. Since CASE tools would be expected to utilize data dictionary/repository services or perhaps even to be combined with data dictionary/repository technologies as part of a complete systems engineering environment (SEE), the CDIF project and the IRDS and PCTE projects should work closely together.

General Knowledge Representation and Ontology

The only major standards project addressing the issue of knowledge representation and ontology (in general) is the ISO/IEC JTC1/SC21 WG3 Conceptual Schema Modelling Facilities (CSMF) project, ISO/IEC WD 14481, 1996. It does not simply deal with model form/syntax or basic model semantics, but rather with the common underlying constructs used to represent all entities or objects, processes and constraints or rules. The current effort admittedly only addresses an initial starter set of these constructs. As important perhaps as the specific normative constructs specified in this initial CSMF standard is the overall approach reflected in the standard. The standard itself may be able to serve as a framework for future KR and ontology standards.

The CSMF work is based on the ANSI/X3/SPARC three-schema architecture, Tsichritzis and Klug, 1978. De Witt, 1992 notes that the basic ideas of the three-schema architecture--namely the segregation of the meaning of data in a database from the form in which it is implemented and from the form in which it is presented to users through applications--have been around for a number of years. The IBM GUIDE users group published ideas of this nature in GUIDE, 1971. This was the precursor that led to the standard architecture published by ANSI/X3/SPARC. The ideas are currently being 'rediscovered' by a number of communities. At the heart of the architecture is the conceptual schema, which is an enterprise model expressed in some formal knowledge representation scheme. Mappings are provided to the presentation forms of the external schema and to the implementation forms of the internal schema. A conceptual schema (CS) must adhere to the fundamental principles described in ISO/IEC TR9007, 1987, as incorporated and expanded in ISO/IEC WD 14481, 1996. These principles include:

100% Principle - The CSMF enables the production of CSs that obey the "100% Principle" of ISO TR9007, i.e., all relevant static and dynamic rules, laws, etc. about the universe of discourse should be described in a CS.

Conceptualization Principle - The CSMF enables the production of CSs that obey the "Conceptualization Principle" of ISO TR9007, i.e., a CS should only include conceptually relevant aspects, both static and dynamic, of the universe of discourse. All aspects of external or internal data representation are to be excluded. In particular this enables the production of a CS which is independent with respect to physical implementation technologies and platforms.

Helsinki Principle - The CSMF enables the production of CSs that obey the "Helsinki Principle", i.e., any meaningful exchange of utterances depends upon the prior existence of an agreed upon set of semantic and syntactic rules. The recipients of the utterances must use only these rules to interpret the received utterances, if they are to mean the same as that which was meant by the utterer.

Ontology - The CSMF allows distinction between the concept and the representation of the concept.

Nature of the World - The CSMF makes minimal assumptions concerning the nature of the world. Only very fundamental ontological and representational constructs are found in the basic CSMF starter set of normative constructs. This allows the majority of ontological content to be introduced into the CS through specific domain models. The marketplace can choose among these domain models, 'plugging and playing' the most appropriate ones given particular purposes or uses.

Concrete Syntaxes - The CSMF contains constructs to enable easy mapping to and from concrete syntaxes. The CSMF standard itself does not specify a conceptual schema language (either abstract or concrete), but some requirements that concrete conceptual schema languages will have to meet are specified. The CSMF standard does specify the basic, minimal set of normative semantic constructs that concrete CSLs will be expected to be able to represent.

Ease of Understanding - The CSMF enables the presentation of CS contents in a way that can easily be understood by humans knowledgeable about the UoD. This is again at the level of the kinds of things to be found in a CS, not the particular concrete syntaxes used to present CSs. Which concrete syntaxes lend themselves best to domain expert (i.e., 'user') presentation is left to the marketplace to decide.

Extensibility - The CSMF provides mechanisms for extending the standard set of constructs. Conforming concrete conceptual schema languages (CSLs) need not limit their constructs to just the starter set of normative constructs specified in the CSMF standard. While various levels of conformance to the normative constructs may be permitted, concrete CSLs will not be prohibited from offering additional constructs beyond the normative CSMF constructs. However, any additional constructs must not be in conflict with the CSMF normative constructs, and all additional constructs must be derivable from either the normative constructs themselves, or from the normative constructs plus any previously-defined additional constructs.

Self Description - The CSMF complies with the "Meta Principle", i.e., the constructs defined in the CSMF standard are capable of self-description (with the exception of any constructs declared to be primitive). The normative constructs are also internally-closed and consistent.
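The Extensibility requirement above can be illustrated with a minimal sketch. It assumes that each additional construct simply declares the constructs it is derived from, and that a "conflict" means nothing more than reusing the name of an existing construct. The construct names and the function check_extensions are hypothetical illustrations; they are not taken from the CSMF working draft.

```python
# Hypothetical starter set; the actual CSMF normative constructs are not reproduced here.
NORMATIVE = {"entity", "relation", "rule"}


def check_extensions(normative, extensions):
    """Return the names of additional constructs that break the extensibility rule.

    `extensions` is an ordered list of (name, derived_from) pairs, where
    `derived_from` is the set of constructs the new construct is defined from.
    A construct is accepted only if it does not reuse an existing name
    (a simplified stand-in for "conflict") and every construct it is derived
    from is either normative or a previously accepted additional construct.
    """
    known = set(normative)
    violations = []
    for name, derived_from in extensions:
        if name in known:
            violations.append(name)      # clashes with an existing construct
        elif not derived_from or not derived_from <= known:
            violations.append(name)      # not derivable from what is already defined
        else:
            known.add(name)              # becomes available to later extensions
    return violations


extensions = [
    ("process", {"entity", "rule"}),       # derived from normative constructs only
    ("workflow", {"process", "relation"}),  # derived from a previously defined extension
    ("agent", {"belief"}),                 # "belief" was never defined
    ("rule", {"entity"}),                  # reuses a normative name
]
print(check_extensions(NORMATIVE, extensions))
```

Because accepted extensions are added to the set of known constructs, the order of definition matters, which mirrors the "previously-defined additional constructs" wording of the principle.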

Finally, it should be noted that there are two primary objectives for the CSMF standard:

To enhance interoperability between enterprise models (i.e., conceptual schemas), and

To improve the quality (e.g., accuracy, completeness, semantic depth, etc.) of enterprise models, and correspondingly of information systems developed based on the models.

Both of these objectives are important. Fortunately, the requirements of both objectives, in terms of CSMF capabilities, as well as the approaches to providing those capabilities, overlap to a large degree. By providing ways to both integrate existing enterprise modelling methods and to enrich or extend conceptual modelling capabilities through improvements in knowledge representation, the CSMF project is working towards the achievement of both objectives concurrently.

Domain Model Contents

The major tasks in this area are to: decide what contents should be standardized; organize the contents that are deemed suitable for standardization; standardize the manner (but not the form or mechanics) by which such contents should be specified; and develop the content standards, i.e., produce the actual domain models. Decisions about which contents to standardize are made by international trade standardization groups such as the United Nations EDIFACT. The ISO/IEC JTC1/SC30 Open Electronic Data Interchange (EDI) project deals with the overall standardization of those key international trade and commerce data elements chosen for standardization. Its reference model, ISO/IEC CD 14662 Part 2, 1996, explains its scope and purpose. The manner of standardization is specified by ISO/IEC JTC1/SC14 Data Element Standardization, which has this task as its SC charter. A major international standard, ISO/IEC IS 11179, 1995, has been produced by SC14 addressing this subject. The UN/ISO Basic Semantic Repository (BSR) project also addresses the manner in which domain model contents in the field of EDI--for example, trade data element names--should be specified. It is described by some as a partial implementation of IS 11179, also providing specific contents relevant to the field of EDI. However, there is concern that this project crosses over into the form and mechanics of specification, going beyond the scope of IS 11179 and into the field of semantic modelling. The form and mechanics of semantic specification should be left to other standards projects concerned specifically with knowledge representation--the ISO/IEC JTC1/SC21 WG3 CSMF project and related standardization projects for concrete enterprise modelling languages under ISO/IEC JTC1/SC7 WG11. Industry user groups should not be specifying what are essentially information technology standards--standards that deal with complex knowledge representation and information management issues.

ISO TC184/SC4 STandard for the Exchange of Product model data (STEP), ISO IS 10303, various, is the largest standards project targeted at producing a domain model or ontology for a major application area--in this case engineering and manufacturing product definition. As a forerunner in the enterprise modelling and KR/ontology fields, the STEP community has had to develop some of its own approaches and techniques in cases where there were none available to utilize at the time. The EXPRESS modelling language, ISO IS 10303 Part 11, 1993, is an example. However, now that a number of information technology standards projects are working on various aspects of model integration and conceptual modelling, the STEP community should support those generic efforts to ensure that STEP-specific needs and requirements can be adequately addressed directly by the more generic standards, or through specializations of the generic standards. This, in fact, appears to be the current direction of STEP, as a close liaison has been established with ISO/IEC JTC1/SC7 WG11 (e.g., the SEDDI project), and liaison activities have also been undertaken in conjunction with ISO/IEC JTC1/SC21 WG3 CSMF RG.

Inferencing and Intelligent Decision-Support

While inferencing techniques may be commonly found in specialized technical circles, practical application of the techniques in the form of commercial-off-the-shelf tools (e.g., inference engines) is lacking, except to support narrowly-focused expert systems software. What minimal commercial technology is available in the intelligent decision-support field uses straightforward vector-based pattern-matching. It makes little or no use of inferencing rules specified at a meta-schematic level. Data mining technology uses basically the same pattern-matching approaches, as well as fuzzy-logic approaches and neural networks.

Advances in the field of inferencing and intelligent decision-support depend largely upon semantic advances in modelling languages (particularly for rule specification) and in domain ontologies (for the meta-level contents against which to perform inferencing and against which instance-level patterns identified in the information base can be compared). Advances also depend upon implementation environments in the form of CASE tools and data dictionaries/repositories. Current standards in the field of CASE tools and data dictionaries/repositories specify nothing in the area of inferencing and provide little or no support for incorporating such capabilities into IT tools in those fields.
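The distinction drawn above between rules specified at the meta level and instance-level contents of the information base can be made concrete with a minimal forward-chaining sketch. It is an illustration only, not a description of any commercial inference engine or of any standard discussed in this paper. The triple layout, the "?" variable convention, and the part_of example are all assumptions.

```python
def match(pattern, fact, bindings):
    """Extend `bindings` so that `pattern` equals `fact`; return None on failure."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):              # terms starting with "?" are variables
            if new.setdefault(p, f) != f:  # already bound to a different value
                return None
        elif p != f:                       # constants must match exactly
            return None
    return new


def substitute(pattern, bindings):
    """Replace the variables in `pattern` with their bound values."""
    return tuple(bindings.get(term, term) for term in pattern)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            candidates = [{}]
            for premise in premises:
                candidates = [
                    b2
                    for b in candidates
                    for fact in derived
                    if (b2 := match(premise, fact, b)) is not None
                ]
            for b in candidates:
                new_fact = substitute(conclusion, b)
                if new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


# Meta-level rule (hypothetical): part_of is transitive.
rules = [
    ((("part_of", "?a", "?b"), ("part_of", "?b", "?c")), ("part_of", "?a", "?c")),
]
# Instance-level facts in the information base (hypothetical).
facts = {("part_of", "bolt", "wing"), ("part_of", "wing", "aircraft")}

result = forward_chain(facts, rules)
print(("part_of", "bolt", "aircraft") in result)
```

The rule is stated once, independently of any particular instances, and the engine derives the instance-level consequence that the bolt is part of the aircraft. A richer rule language or domain ontology would change the contents of the rules, not the basic mechanism.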

References:

ANSI X3/TR-14 Part 1, 1995. IRDS Conceptual Schema, Part 1 Conceptual Schema for IRDS. American National Standards Institute (ANSI) Accredited Standards Committee X3 Information Processing Systems.

ANSI X3/TR-14 Part 2, 1995. IRDS Conceptual Schema, Part 2, Modeling Language Analysis. American National Standards Institute (ANSI) Accredited Standards Committee X3 Information Processing Systems.

ANSI X3 1058-D, 1995. SD-3 (Approved) Proposal to Develop a New X3 Standard on Knowledge Interchange Format (KIF). American National Standards Institute (ANSI) Accredited Standards Committee X3 Information Processing Systems.

ANSI X3 1059-D, 1995. SD-3 (Approved) Proposal to Develop a New X3 Standard on Conceptual Graphs (CGs). American National Standards Institute (ANSI) Accredited Standards Committee X3 Information Processing Systems.

De Witt, S., 1992. "Three-Schema Enterprise Modeling", in Proceedings of the IDEF Users' Group Conference.

GUIDE, 1971. GUIDE Secretary Distribution (GSD) 23: "Requirements for a Data Base Management System". GUIDE International Corporation.

ISO IS 10303, various. Standard for the Exchange of Product model data (STEP) [a multi-part standard]. International Organization for Standardization, Technical Committee 184 Industrial Automation Systems, Subcommittee 4 Industrial Data and Global Manufacturing Languages (ISO TC184/SC4).

ISO IS 10303 Part 11, 1993. Industrial Automation Systems - Product Data Representation and Exchange, Part 11: Description Methods: The EXPRESS Language Reference Manual. International Organization for Standardization, Technical Committee 184 Industrial Automation Systems, Subcommittee 4 Industrial Data and Global Manufacturing Languages (ISO TC184/SC4).

ISO/IEC CD 14662 Part 2, 1996. Information Technology - Open-edi Reference Model, Part 2. International Organization for Standardization (ISO), Subcommittee 30 Open-edi.

ISO/IEC DIS 13719, 1994. Information Technology - Portable Common Tool Environment (PCTE), Parts 1-3. International Organization for Standardization (ISO), Subcommittee 22 Programming Languages, Working Group 22 Portable Common Tool Environment (PCTE).

ISO/IEC DIS xxxxx, 1996. Information Technology - CASE Data Interchange Format (CDIF). International Organization for Standardization (ISO), Subcommittee 7 Software Engineering, Working Group 11 Data Definition and Representation.

ISO/IEC IS 10027, 1992. Information Technology - Information Resource Dictionary System (IRDS) - Framework. International Organization for Standardization (ISO), Subcommittee 21 Open Systems Interconnection, Data Management and Open Distributed Processing, Working Group 3 Database, Information Resource Dictionary System Rapporteur Group (IRDS RG).

ISO/IEC IS 10032, 1995. Information Technology - Reference Model of Data Management. International Organization for Standardization (ISO), Subcommittee 21 Open Systems Interconnection, Data Management and Open Distributed Processing, Working Group 3 Database, Reference Model of Data Management Rapporteur Group (RMDM RG).

ISO/IEC IS 10728, 1994. Information Technology - Information Resource Dictionary System (IRDS) - Services Interface. International Organization for Standardization (ISO), Subcommittee 21 Open Systems Interconnection, Data Management and Open Distributed Processing, Working Group 3 Database, Information Resource Dictionary System Rapporteur Group (IRDS RG).

ISO/IEC IS 11179, 1995. Information Technology - Specification and Standardization of Data Elements, Parts 1-5. International Organization for Standardization (ISO), Subcommittee 14 Data Element Standardization.

ISO/IEC TR9007, 1987. Information Processing Systems - Concepts and Terminology for the Conceptual Schema and the Information Base. International Organization for Standardization (ISO).

ISO/IEC WD 14481, 1996. Information Technology - Conceptual Schema Modelling Facilities (CSMF). International Organization for Standardization (ISO), Subcommittee 21 Open Systems Interconnection, Data Management and Open Distributed Processing, Working Group 3 Database, Conceptual Schema Modelling Facilities Rapporteur Group (CSMF RG).

OMG, 1995 (a). CORBA 2.0/Interoperability: Universal Networked Objects. Object Management Group, Inc.

OMG, 1995 (b). Common Facilities Architecture. Object Management Group, Inc.

Sarris, 1992. Integration Toolkit and Methods (ITKM) Corporate Data Integration Tools (CDIT), Review of the State-of-the-Art with Respect to Integration Toolkits and Methods (ITKM). WL-TR-92-8048/DTIC AD-A255 547. Air Force Manufacturing Directorate, Wright Laboratory, Wright-Patterson Air Force Base.

The Object Agency, 1995. A Comparison of Object-Oriented Methodologies. The Object Agency, Inc.

Tsichritzis, D. and Klug, A. (Eds.), 1978. "The ANSI/X3/SPARC DBMS Framework", Information Systems, Vol. 3, 173-191.

Wakeman, L. and Jowett, J., 1993. PCTE: The Standard for Open Repositories. Prentice Hall.

X3H7, 1995. (Draft) Technical Report, Object Model Features Matrix. American National Standards Institute (ANSI) Accredited Standards Committee (ASC) X3 Information Processing Systems, Technical Committee H7 Object Information Management.

About the Author

Anthony K. Sarris is vice president of enterprise analysis for Ontek Corporation, a software research, development and consulting firm located in Laguna Hills, California, U.S.A. In his role at Ontek, Mr. Sarris acts as a project manager and technical specialist, analyzing information requirements, defining information architectures, and developing enterprise models for commercial and government organizations--primarily in the fields of engineering, manufacturing and information technology. Mr. Sarris has been involved with national and international standards development projects in the areas of modelling and data dictionary/repository since 1991. He co-edited ANSI X3/TR-14:1995, the Information Resource Dictionary System (IRDS) Conceptual Schema Technical Report. He is currently serving as chairman of ANSI ASC X3 Technical Committee T2, Information Interchange and Interpretation, and participates as a U.S. expert in the ISO/IEC JTC1/SC21 WG3 Conceptual Schema Modelling Facilities (CSMF) project.

Mr. Sarris is a Phi Beta Kappa graduate of Purdue University and also studied at the University of Hamburg, Germany. Prior to joining Ontek Corporation, he held technical and management positions at a large aerospace and defense contractor and a Big Six technology-consulting firm. He is a frequent speaker at professional conferences including: IEEE Meta Data, Technology of Object-Oriented Languages and Systems (TOOLS), Continuous/Computer-Aided Acquisition and Logistical Support (CALS), Manufacturing Technology Advisory Group (MTAG)/Agile Manufacturing, the IDEF Users Group, and the National Council on Systems Engineering/American Society of Engineering Management (NCOSE/ASEM).

