SIMA (Systems Integration for Manufacturing Applications), NIST - National Institute of Standards and Technology
 
Technical Research Projects
Physical and Chemical Property Data Interchange Standards
Principal Investigator: Peter Linstrom
(301) 975-5422
peter.linstrom@nist.gov

Objective:
To continue to develop and standardize tools that aid the interchange of chemical reference data over the Internet. During fiscal year 2006, this project will focus on applications of the newly standardized IUPAC International Chemical Identifier (InChI). InChI is a standard for identifying chemical species, a task that has traditionally been a major problem in chemical informatics. This project will work in conjunction with IUPAC project 2004-039-1-800, "IUPAC International Chemical Identifier (InChI): promotion and extension", to promote and extend the application of the identifier.

Background:
The initial aim of this work was to make NIST chemical reference data available over the Internet, with an emphasis on ease of access for individual users. To accomplish this, a system was developed that provided a convenient user interface, making it easy for individual scientists, engineers, and educators to get access to the NIST chemical reference data. This effort has been quite successful. The NIST Chemistry WebBook is used by scientists, engineers, and educators worldwide. Usage continues to grow over time. During the non-summer months, the site now averages over 1.6 million page views per month and 22,000 distinct hosts visiting per week.

In the past four years, however, the focus of this project has moved from publishing data in human-readable form to enabling data interchange among automated systems. The reasons for this shift are twofold: to help NIST address its customers' needs through data publishing techniques suitable for automated systems, and to help other (commercial) organizations exchange data in a similar manner. The major obstacle to both efforts is the lack of standards for chemical data interchange. This is the area where recent work has focused and the area targeted by this proposal.

 Recent work on this project has included efforts in conjunction with IUPAC to develop data dictionaries for the chemical sciences. This work is now largely complete and the data dictionaries should soon be available on the IUPAC web page.

As noted above, this work will help to implement and extend the IUPAC International Chemical Identifier (InChI). InChI is a representation of a chemical species based on the connectivity and geometric configuration of the atoms in the molecule. Use of the identifier simplifies identification of chemical species by automated systems, since it requires no third-party registries or complex nomenclature systems. Registries are problematic because of their limited coverage and the intellectual property issues associated with their use. Chemical nomenclature can be difficult to use because a single chemical species may have multiple names. In addition, human determination of names for complex chemical species can be error-prone. InChI takes the information present in a molecular structure and converts it to a canonical form. Such conversion requires a computer program, but it results in an identifier that does not require a registrar and avoids the potential for error found in traditional naming systems. More information on InChI can be found on the IUPAC web site (http://www.iupac.org/inchi/).
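The idea of a canonical form can be illustrated with a deliberately small sketch. The function below is not the InChI algorithm. It is a brute-force toy, with a hypothetical name and input format, that tries every renumbering of the atoms in a tiny heavy-atom graph and keeps the smallest description, so the result does not depend on how the structure was originally numbered.

```python
from itertools import permutations


def canonical_key(elements, bonds):
    """Return a numbering-independent description of a small molecular graph.

    elements: list of element symbols, one per atom (hypothetical input format)
    bonds: list of (i, j) index pairs into `elements`
    Toy only: factorial cost, no bond orders, charges, or stereochemistry.
    """
    n = len(elements)
    best = None
    for perm in permutations(range(n)):
        # perm[old_index] gives the atom's new index under this renumbering.
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = elements[old]
        edges = sorted(tuple(sorted((perm[a], perm[b]))) for a, b in bonds)
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best


# Ethanol (C-C-O) entered with two different atom numberings.
ethanol_a = canonical_key(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_key(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether (C-O-C) has the same atoms but different connectivity.
ether = canonical_key(["C", "O", "C"], [(0, 1), (1, 2)])

print(ethanol_a == ethanol_b)  # same species, same key
print(ethanol_a == ether)      # different species, different key
```

Both ethanol inputs reduce to one key, while dimethyl ether, which has the same atoms connected differently, does not. A practical canonicalization program has to reach the same kind of guarantee without enumerating every numbering.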

Since its standardization in April 2005, InChI has been adopted by two of the largest public chemical data sites on the Internet, NIH’s PubChem (http://pubchem.ncbi.nlm.nih.gov/) and NIST’s Chemistry WebBook (http://webbook.nist.gov/chemistry/). It has also been incorporated in structure drawing software developed by Advanced Chemistry Development, Inc. The United States Patent and Trademark Office has indicated that it is considering the use of InChI for the identification of chemical species in patent submissions (Federal Register, volume 70, number 118, June 21, 2005, pages 35573 – 35577).

 InChI plays an important role in the development of a modern infrastructure for the interchange of chemical property data. Interchange of such data requires the identification of two important items:

  • The system to which the property data applies. This may be a well defined molecular species, a poorly defined molecular species, a mixture of species, or a reaction specification (equivalent to two mixtures).

  • The data which applies to the species.

 This project seeks to help resolve problems in item one above, traditionally a major problem in chemical informatics. Prior work on this project (XML data dictionaries) and work on two other SIMA-supported projects, Units Markup Language (UnitsML) and Analytical Instrument Markup Language (AnIML), address problems in item two.

This project will promote the use of the identifier by developing tools for its users. Much of this work has been completed and can be seen in the InChI display and search systems found on the NIST Chemistry WebBook. A prototype web site that provides diagnostic information on the identifier has also been developed. Such a service is useful because the InChI string is not readily human-readable, but can be decoded into useful information by a computer. Remaining work on this effort includes completing the code that provides a graphical representation of the connectivity defined by an identifier, and integrating it with the existing NIST web site.
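As an illustration of how a program can decode an identifier, the sketch below splits an InChI string into its version, formula, and prefixed layers. It is not the diagnostic service described above: the function name, the returned dictionary, and the wording of the layer descriptions are assumptions made for this example, and the table of layer prefixes should be treated as illustrative rather than exhaustive. The ethanol string is used only as a familiar test input.

```python
# Single-letter layer prefixes; descriptions are informal labels chosen for
# this sketch, and the table is not guaranteed to be complete.
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogen atoms",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereochemistry",
    "t": "tetrahedral stereochemistry",
    "m": "stereo parity inversion",
    "s": "stereo type",
    "i": "isotopes",
    "f": "fixed hydrogens",
    "r": "reconnected metal bonds",
}


def split_inchi(inchi):
    """Split an InChI string into version, formula, and (prefix, name, body) layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI string: {inchi!r}")
    segments = inchi[len(prefix):].split("/")
    version = segments[0]
    rest = segments[1:]
    formula = ""
    # The formula layer has no prefix letter; prefixed layers start lowercase.
    if rest and rest[0] and not rest[0][0].islower():
        formula = rest[0]
        rest = rest[1:]
    # Layers are kept as a list because a prefix can occur more than once.
    layers = [
        (seg[0], LAYER_NAMES.get(seg[0], "unrecognized"), seg[1:])
        for seg in rest
        if seg
    ]
    return {"version": version, "formula": formula, "layers": layers}


info = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(info["version"], info["formula"])
for layer_prefix, name, body in info["layers"]:
    print(f"/{layer_prefix} ({name}): {body}")
```

Splitting on "/" only separates the layers. Interpreting a layer, for example turning the connectivity body into a drawable graph, needs further parsing that this sketch does not attempt.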

This project will also work to expand the scope of application of InChI to a wider range of chemical systems. This will be done in two ways: continuing ongoing work to combine InChI strings to define mixtures and reactions, and working with IUPAC to extend the identifier to other types of chemical species. Since InChI identifies a single species, it can form the basis for identifying systems that consist of multiple species, such as mixtures and chemical reactions. Identifiers for these types of systems will provide benefits similar to those InChI provides for individual chemical species. Another way to expand the scope of the identifier is to increase the number of chemical species it can describe. This project will work with IUPAC project 2004-039-1-800 to add support for polymers, Markush structures, and phase information. The addition of phase information to InChI will aid efforts to use InChI to identify chemical mixtures and reactions.
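One way to picture how single-species identifiers could be combined is sketched below. The format this project uses for mixtures and reactions is not described here, so the `Mixture` and `Reaction` classes and their `key` methods are hypothetical. The sketch only shows that an unordered set of InChI strings can name a mixture, and that a pair of such sets can name a reaction, without any additional registry. Stoichiometry and phase are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mixture:
    """An unordered set of species, each named by its InChI string."""
    components: frozenset

    @classmethod
    def of(cls, *inchis):
        return cls(frozenset(inchis))

    def key(self):
        # Sorting gives one stable ordering regardless of input order.
        return tuple(sorted(self.components))


@dataclass(frozen=True)
class Reaction:
    """A reaction treated as two mixtures: reactants and products."""
    reactants: Mixture
    products: Mixture

    def key(self):
        return (self.reactants.key(), self.products.key())


HYDROGEN = "InChI=1S/H2/h1H"
OXYGEN = "InChI=1S/O2/c1-2"
WATER = "InChI=1S/H2O/h1H2"

combustion = Reaction(Mixture.of(HYDROGEN, OXYGEN), Mixture.of(WATER))
same_combustion = Reaction(Mixture.of(OXYGEN, HYDROGEN), Mixture.of(WATER))

print(combustion == same_combustion)  # order of components does not matter
print(combustion.key())
```

Because each component is already a canonical string, sorting the components is enough to make the combined key independent of the order in which the species were listed.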

This project is important to NIST and SIMA not only because it will aid NIST's efforts to distribute standard reference data, but also because it will provide the basis for standardizing many types of chemical data interchange. Identification of chemical species is important not only for chemical data interchange but also for operations and commerce in the chemical process industries. By working through an open standards process in IUPAC, this project seeks to provide a wide benefit to scientists and engineers in the chemistry-based sciences and industries.
 

 

Page created November 2005

  Last updated: Feb 02, 2006
 
