NIST/EPA/NIH Mass Spectral Enhancements - 1998 version (NIST98)


By O. David Sparkman

Evaluated and Expanded for Quality

Figure 1. The NIST Mass Spectral Search Program with all seven of its Windows displayed.

Added Features for Quality

Prior to 1998, it had been six years since NIST last released a version of the NIST/EPA/NIH Mass Spectral Library. During that period, NIST completed a ten-year project to evaluate the entire Library. As this process progressed, NIST was able to generate a number of spectra and acquire several important collections of quality spectra. This has allowed the Library to increase in size to 129,136 spectra of 107,886 compounds. This is a 75% increase in coverage. All but 57 of the compounds have an associated structure.

In an effort to develop a Library optimized for the identification of unknown compounds through their mass spectra, NIST undertook a program in which experienced mass spectrometrists evaluated complete-as-possible spectra for the presence or absence of peaks based on the structure, empirical formula, and molecular weight of each compound. In the event that anomalies were found, decisions as to what should be done with a spectrum (delete it from the Library, remeasure it (always, where possible), or remove contaminant peaks) were reached and agreed to by two of these mass spectrometrists. This process has led to thousands of selections, deletions, and modifications to produce this optimal Library. The development of the Library has been described in numerous presentations at the American Society for Mass Spectrometry meetings over the past few years.1,2,3

Figure 2. MS Interpreter - a tool within the NIST MS Search Program that allows for a number of different functions including the ability to calculate the distance between peaks and determine if mass spectral peaks are logical based on an associated structure.

This new evaluated Mass Spectral Database is the NIST98 Mass Spectral Library. It is distributed along with the NIST Mass Spectral Search Program for Windows. This allows the use of either the search routines from within the mass spectral software of many manufacturers or the NIST MS Search Program itself. The NIST MS Search Program has been described in numerous publications.4,5,6

In addition to the NIST98 Database, users can also build their own libraries and have structures associated with the spectra. The Wiley 6 Registry of Mass Spectral Data is now available in the NIST format.

The NIST98 MS Search Program lets the user set many different Desktop configurations for displaying the results of the many different ways to search the NIST98 Library. Not only can unknown spectra be searched against this evaluated Database, but the Database can also be searched by incremental names or synonyms of compounds, by Chemical Abstracts Service (CAS) registry number, by empirical formula, by molecular weight, and by the identification number given in the Library. In addition, the Database can be searched by peak, based on input of the m/z value, the relative or absolute intensity, and the type of peak (normal, neutral loss, intensity rank in the spectrum, or whether or not it represents the maximum m/z in the spectrum). Molecular weight, unknown spectra, and peak searches can be constrained by which elements are present and by the number, or range of numbers, of atoms of each element allowed in a retrieved spectrum. Unknown spectra and peak searches can also be constrained to an allowable molecular weight range. Finally, searches can be constrained to retrieve only spectra of compounds that are also listed in other specified databases, such as those maintained by the EPA or NIH.
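The kinds of constraints described above can be pictured with a small sketch. It is not the NIST MS Search Program's implementation or file format; the `LibraryEntry` class, the `matches` function, and the database tags "list_a" and "list_b" are hypothetical names chosen for illustration, and the three entries are a toy stand-in for a real library.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryEntry:
    """Hypothetical in-memory record; not the NIST library format."""
    name: str
    cas: str
    formula: dict          # element symbol -> atom count
    mol_weight: int        # nominal molecular weight
    listed_in: set = field(default_factory=set)  # placeholder database tags


def matches(entry, mw_range=None, element_limits=None, required_db=None):
    """Return True if the entry satisfies every constraint that was given."""
    if mw_range is not None:
        low, high = mw_range
        if not (low <= entry.mol_weight <= high):
            return False
    # element_limits maps an element to an allowed (min, max) atom count.
    for element, (lo, hi) in (element_limits or {}).items():
        if not (lo <= entry.formula.get(element, 0) <= hi):
            return False
    if required_db is not None and required_db not in entry.listed_in:
        return False
    return True


# Toy library; the database tags are illustrative placeholders only.
library = [
    LibraryEntry("Benzene", "71-43-2", {"C": 6, "H": 6}, 78, {"list_a"}),
    LibraryEntry("Toluene", "108-88-3", {"C": 7, "H": 8}, 92, {"list_a", "list_b"}),
    LibraryEntry("Chlorobenzene", "108-90-7", {"C": 6, "H": 5, "Cl": 1}, 112, {"list_b"}),
]

# Molecular weight 80 to 120, exactly one chlorine atom.
hits = [e.name for e in library
        if matches(e, mw_range=(80, 120), element_limits={"Cl": (1, 1)})]
print(hits)
```

Each constraint is optional and they combine with a logical AND, which mirrors the way the text describes narrowing a search by elements, molecular weight range, and membership in other databases.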

One of the new features in the NIST MS Search Program, V 1.6, is the ability to include user-generated structures in the form of MOL files in User libraries. This feature has been a part of the HP ChemStation, but it has now been improved so that implicit hydrogens associated with functional groups are displayed.

New Features in the NIST MS Search Program

Two new features of the NIST MS Search Program are AMDIS (Automated Mass Spectral Deconvolution and Identification System) and MS Interpreter, a unique routine that aids in spectral evaluation and interpretation.

AMDIS will read and display GC- and LC-MSD data files from most popular instrument data systems. The files are evaluated on the basis of spectral uniqueness. Unique spectra (deconvoluted spectra, with contaminating peaks eliminated) are compared against target libraries or are sent to the NIST MS Search Program for identification. AMDIS is provided with individual target libraries (all derived from the NIST98 Library) for use with environmental, drugs of abuse, toxicological, and flavor/fragrance applications. The libraries can be expanded, and User libraries can be built from chromatographic/mass spectral data or additional spectra from the NIST98 Library or other libraries in the NIST MS Search Program format. Additional information on AMDIS will appear in a future applications note in this newsletter.

Just one of the many features of MS Interpreter is that it provides an enhancement of the popular ISOFORM utility included with previous versions of the NIST MS Search Program. This is used to calculate and display (graphically and numerically) isotopic patterns based on entered formulas and to produce formulas for neutral fragments and ions based on molecular formula, elemental constraints, and/or m/z values of ions and neutrals.
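To make the idea of calculating an isotopic pattern from a formula concrete, here is a minimal sketch. It is not ISOFORM or MS Interpreter code; it simply convolves per-atom isotope distributions at nominal (integer) mass. The function names and the `min_rel` cutoff are assumptions made for this example, and the abundance table uses rounded natural abundances for a handful of elements only.

```python
from collections import defaultdict

# Approximate natural isotopic abundances (nominal mass -> fraction).
ISOTOPES = {
    "C": {12: 0.9893, 13: 0.0107},
    "H": {1: 0.999885, 2: 0.000115},
    "N": {14: 0.99636, 15: 0.00364},
    "O": {16: 0.99757, 17: 0.00038, 18: 0.00205},
    "Cl": {35: 0.7576, 37: 0.2424},
    "Br": {79: 0.5069, 81: 0.4931},
}


def convolve(dist_a, dist_b):
    """Combine two mass distributions: masses add, probabilities multiply."""
    out = defaultdict(float)
    for mass_a, prob_a in dist_a.items():
        for mass_b, prob_b in dist_b.items():
            out[mass_a + mass_b] += prob_a * prob_b
    return dict(out)


def isotope_pattern(formula, min_rel=0.001):
    """Return {nominal mass: intensity relative to the largest peak = 100}.

    formula maps element symbols to atom counts, e.g. {"C": 1, "H": 2, "Cl": 2}.
    Peaks below min_rel (as a fraction of the largest peak) are dropped.
    """
    dist = {0: 1.0}
    for element, count in formula.items():
        for _ in range(count):
            dist = convolve(dist, ISOTOPES[element])
    base = max(dist.values())
    return {mass: round(100.0 * prob / base, 1)
            for mass, prob in sorted(dist.items())
            if prob / base >= min_rel}


# Dichloromethane, CH2Cl2: the two chlorine atoms dominate the pattern.
pattern = isotope_pattern({"C": 1, "H": 2, "Cl": 2})
for mass, rel in pattern.items():
    print(mass, rel)
```

Working at nominal mass keeps the sketch short; a tool that reports exact masses would need exact isotope masses and a way to merge peaks that fall within the instrument's resolution.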

In addition to all the functions of ISOFORM, MS Interpreter allows for a graphical comparison of observed and theoretical isotopic patterns, the ability to use a graphic tool to determine and display the m/z difference between a designated precursor peak and another peak, and, based on a simple single-bond cleavage presumption, the display of the portion of a molecular structure represented by individual peaks in the mass spectrum. This feature is a result of Robert Mistrik's Cluster Analysis research reported at the 1997 Palm Springs ASMS meeting. This powerful utility adds to the ability to use the NIST MS Search Program and the NIST98 Library in the identification of compounds whose spectra are not in the Database.
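The m/z difference between a designated precursor peak and the other peaks is simple arithmetic, sketched below. This is only an illustration of the calculation, not how MS Interpreter is implemented; the `COMMON_LOSSES` table is a deliberately tiny, hypothetical lookup of familiar neutral losses, and the peak list is illustrative.

```python
# Small, illustrative table of nominal-mass neutral losses.
COMMON_LOSSES = {
    15: "CH3",
    17: "OH",
    18: "H2O",
    28: "CO or C2H4",
    45: "COOH",
}


def peak_differences(precursor_mz, peak_mzs):
    """List (peak m/z, difference from precursor, suggested loss) per peak.

    Only peaks below the precursor are reported, from highest m/z down.
    """
    results = []
    for mz in sorted(peak_mzs, reverse=True):
        if mz >= precursor_mz:
            continue
        diff = precursor_mz - mz
        results.append((mz, diff, COMMON_LOSSES.get(diff, "unassigned")))
    return results


# Illustrative nominal m/z values with m/z 122 designated as the precursor.
for mz, diff, label in peak_differences(122, [122, 105, 77, 51]):
    print(f"m/z {mz}: precursor - {diff} ({label})")
```

A difference that is missing from the lookup is reported as "unassigned" rather than guessed, since a nominal mass difference alone rarely identifies a loss uniquely.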

The NIST MS Search Program, V 1.6, is still provided with the three search algorithms (the Identity Search for spectra of compounds whose spectrum is probably in the Library, and the Similarity and Neutral Loss Searches for spectra of compounds whose spectrum is probably not in the Library) that have made it such a widely used utility in mass spectrometry laboratories. This latter capability, combined with Substructure Identification, is being used extensively in the evaluation of APCI and ESI LC/MS spectra obtained by in-source collisionally activated dissociation (CAD) or MS/MS.
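As a rough picture of what a spectrum comparison involves, the sketch below ranks library spectra by a plain cosine (normalized dot product) of their intensities. This is a generic textbook measure and is not the NIST Identity, Similarity, or Neutral Loss algorithm; those are described in the publications cited in this article. The function names and the toy spectra are assumptions for illustration.

```python
import math


def cosine_match(spectrum_a, spectrum_b):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts."""
    shared = set(spectrum_a) & set(spectrum_b)
    dot = sum(spectrum_a[mz] * spectrum_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(unknown, library):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = [(name, cosine_match(unknown, spec)) for name, spec in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy spectra with made-up intensities.
library = {
    "entry_1": {77: 100, 51: 40, 105: 80},
    "entry_2": {43: 100, 58: 30},
}
unknown = {77: 95, 51: 45, 105: 85}

for name, score in rank_library(unknown, library):
    print(name, round(score, 3))
```

The score runs from 0 (no peaks in common) to 1 (identical relative intensities), so it is independent of how each spectrum was scaled.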

What IS Quality?

At issue, under many circumstances, is what is meant by the word "quality." This word has often been used as a size-comparative measure when it comes to mass spectral libraries. In the past, the basis for judgment has been the number of spectra; however, this has changed with the NIST98 Library. The NIST98 Library's total number of peaks, average number of peaks, and median number of peaks (10,033,398/93/78) are far greater than those of the only other mass spectral library with a comparably large number of spectra, the Wiley Registry of Mass Spectral Data (8,087,622/35/10).6
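The three figures quoted above (total, average, and median number of peaks) are straightforward to compute for any collection of spectra. The sketch below assumes a hypothetical library held as a dictionary of peak lists; the entry names and m/z values are made up, and nothing here reproduces the published numbers.

```python
import statistics


def peak_statistics(library):
    """Return (total peaks, mean peaks per spectrum, median peaks per spectrum).

    library maps an entry name to a list of peak m/z values.
    """
    counts = [len(peaks) for peaks in library.values()]
    return sum(counts), statistics.mean(counts), statistics.median(counts)


# Toy library with 3, 5, 4, and 8 peaks per spectrum.
toy_library = {
    "entry_1": [41, 43, 57],
    "entry_2": [27, 39, 51, 77, 78],
    "entry_3": [15, 29, 43, 58],
    "entry_4": [27, 39, 50, 51, 63, 65, 91, 92],
}

total, mean_peaks, median_peaks = peak_statistics(toy_library)
print(total, mean_peaks, median_peaks)
```

Reporting the median alongside the average matters because a few very rich spectra can raise the average while most entries remain sparse.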

Figure 3. Two of the many views of AMDIS. Both target and nontarget analytes can be pulled from very complex reconstructed total ion current chromatograms. This is one of the most thoroughly tested programs ever developed for use with LC/MS and GC/MS data.

As can be easily imagined, with the number of spectra in the tens of thousands, it is quite possible for a given spectrum to be duplicated. This has been a problem with all previous mass spectral libraries. The only way to assure that this is not a factor is to have some unique identifier associated with each unique spectrum. This is best accomplished by the use of a CAS registry number or a structure. The NIST98 Library has a larger percentage of spectra with a unique identifier than any other mass spectral library that has been distributed. Of all the compounds in the NIST98 Library, 99.95% have an associated structure. These unique structures were compared using one of the many in-house-developed software programs utilized by NIST to assure the highest quality. Remember, there is a difference between replicate spectra (multiple spectra of the same compound from different sources) and duplicate spectra (the same spectrum present in the library under different index numbers).
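The distinction between replicate and duplicate spectra can be sketched as follows. This is not the in-house NIST software mentioned above; it is a minimal illustration that groups entries by a unique identifier (here a CAS registry number) and treats entries with identical peak lists as duplicates. The index numbers and peak intensities are made up.

```python
from collections import defaultdict


def classify_pairs(entries):
    """Split same-compound pairs into duplicates and replicates.

    entries is a list of (index number, CAS number, peaks), where peaks is a
    tuple of (m/z, intensity) pairs. Pairs sharing a CAS number are
    duplicates if their peak lists are identical and replicates otherwise.
    """
    by_cas = defaultdict(list)
    for index, cas, peaks in entries:
        by_cas[cas].append((index, tuple(sorted(peaks))))

    duplicates, replicates = [], []
    for group in by_cas.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                (idx_a, peaks_a), (idx_b, peaks_b) = group[i], group[j]
                if peaks_a == peaks_b:
                    duplicates.append((idx_a, idx_b))
                else:
                    replicates.append((idx_a, idx_b))
    return duplicates, replicates


# Hypothetical index numbers and illustrative intensities.
entries = [
    (101, "71-43-2", ((78, 100), (77, 25), (51, 18))),
    (102, "71-43-2", ((78, 100), (77, 25), (51, 18))),   # same spectrum again
    (103, "71-43-2", ((78, 100), (77, 22), (52, 19))),   # different source
    (104, "108-88-3", ((91, 100), (92, 70))),
]

duplicates, replicates = classify_pairs(entries)
print("duplicates:", duplicates)
print("replicates:", replicates)
```

Exact equality of peak lists is the strictest possible test; in practice a tolerance on intensities might be needed, which is left out here to keep the sketch short.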


In combination, the NIST98 Library and the NIST MS Search Program represent one of the most powerful tools for the mass spectrometrist. As with any power tool, there are a lot of features that require training to fully implement.


1. P. Ausloos, C. Clifton, S. Lias, A. Mikaya, S. Stein, D. Sparkman, D. Tchekovskoi, V. Zaikin, D. Zhu "The Critical Evaluation of a Comprehensive Mass Spectral Library," J. Am. Soc. Mass Spectrom. 1999, 10, 288-299.
2. V. Zaikin, P. Ausloos, C. Clifton, S. Lias, A. Mikaya, D. Sparkman, S. Stein "The Evaluation of a Comprehensive MS Reference Library," Proceedings of the 43rd ASMS Conference on Mass Spectrometry and Allied Topics, Atlanta, GA, 1995.
3. S. Stein "Estimating Probabilities of Correct Identification from Results of Mass Spectral Library Searches," J. Am. Soc. Mass Spectrom. 1994, 5, 316-323.
4. S. Stein, D. Scott "Optimization and Testing of Mass Spectral Library Search Algorithms for Compound Identification," J. Am. Soc. Mass Spectrom. 1994, 5, 859-866.
5. S. Stein "Chemical Substructure Identification by Mass Spectral Library Searching," J. Am. Soc. Mass Spectrom. 1995, 6, 644-655.

6. These figures are based on a reprint of a paper distributed by F. W. McLafferty et al. in conjunction with a Poster Presentation at the 45th ASMS meeting in Palm Springs, CA, 1997.