Data Standards for Structural Bioinformatics
Principal Investigator: T.N. Bhat (301) 975-5448
talapady.bhat@nist.gov
Objective:
To develop standards for structural bioinformatics for pre-clinical data,
including X-ray and NMR structural data for biological macromolecules, with
particular emphasis on internet-based databases of importance to biotechnology
and the Semantic Web.
Background:
The last decade has seen an amazing
explosion in the field of bioinformatics fueled by the large-scale genomics
sequencing efforts funded by NIH, NSF, DOE and private industry. Recently, new
initiatives in Structural Genomics and Proteomics are underway that will be much
more data intensive, with both large volumes and more complex data
representations. Today the harvesting and management of large sets of biological
structural data, and the mining of the information contained therein, is an
activity that is transforming biological science, biotechnology, and the
pharmaceutical industry. In most cases, the amount of data is enormous:
thousands of macromolecular structures, millions of protein sequences, tens of
thousands of structural and sequence neighbors.
In this era of bioinformatics, there is a growing need for specialized,
critically evaluated, and reliable data delivered using easy-to-use Web tools.
NIST has been the leader in such data activities. NIST has a long history of
producing, evaluating, and disseminating chemical data and is increasingly
applying this expertise to biosciences. Researchers who are either developing
drug treatments for AIDS or studying the virus that causes the disease have a
new resource: the HIV Structural Reference Database, an online database of
AIDS-related protein structures developed in part using SIMA funds and unveiled
for public use by NIST in 2004. Since its release it has drawn considerable
attention within NIST and has quickly become one of the most popular NIST
databases.
This work has resulted in two major awards: (1) the NIST Judson C. French
Award (2005); and (2) the Science Spectrum Trailblazer Award from Science
Spectrum Magazine (2006). In 2007 it resulted in another major award, Emerald
Honors from Science Spectrum Magazine.
Developed in collaboration with the National Cancer Institute, the HIV
Structural Reference Database (HIVSDB) receives, annotates, archives, and
distributes structural data for proteins involved in making HIV, the virus that
causes AIDS, as well as for molecules that inhibit the activities of these
proteins. Until now, much of this information was not widely available because
it was unpublished. The database contains data both from the published
literature and from direct contributions by industrial and other laboratories.
The database (SRD No. 102, copyrighted by the Department of Commerce, as
mandated by Congress) will be especially useful in developing strategies for
inhibiting the activity of HIV protease, which is essential for maturation
of HIV. In addition, the database is expected to help scientists understand and
circumvent the problem of mutations that make HIV resistant to certain drugs. It
is one of NIST’s most widely used databases.
NIST scientists annotate the structural data with information from various
sources and index or classify the entries so that users can reliably find
particular structures. NIST has helped to develop a novel technique for indexing
HIV protease inhibitors. This, in turn, has enabled scientists to rapidly and
reliably retrieve data on all enzyme-inhibitor complexes of interest, such as
those of a mutant strain that is resistant to a particular drug. The HIV
database is a model for developing and testing new technologies to annotate and
standardize HIV inhibitor names, and for evaluating structural data for
macromolecules. At present this Web page has the largest collection of 3-D
structures of AIDS targets integrated with the 2-D structures of their
inhibitors. In 2008 several updates were posted to this Web page; one of the
major updates was the inclusion of about 1000 additional protease inhibitors.
During 2008, a significant amount of time was also spent on security-related
updates for the HIVSDB and the Enzyme Thermodynamics database.
Enzymes, the biological machinery of chemical catalysis, are key components of
technological and industrial growth using biological data. For this reason,
during 2007, using in part SIMA funds, NIST established a revised resource for
enzyme thermodynamics data (SRD No. 74).
In 2008 a new database with emphasis on biofuels was released to the public. In
2009 additional information will be posted to this resource.