Life Sciences Databases for Bioinformatics

  • AGRICOLA (Agricultural sciences)
  • Biological Abstracts (BIOSIS) (Biological sciences)
  • Biological Sciences Set (Biological sciences)
  • MEDLINE (Medical science)

Technology and Computer Science Databases for Bioinformatics

  • Chemical Abstracts (SciFinder) (Chemical Sciences)
  • Compendex (Engineering)
  • ACM Portal to Computing Literature
  • Computer Database

Genomics and Commercial Databanks

  • Nucleic Acids Research Database Categories List
    From the annual database issue of this important journal, updated each January. A very useful collection of links to over 300 resources. Each entry includes a summary with a description, recent developments, the authors responsible and a contact link.
  • Biology Links
    This page includes links to a number of model organism databases, banks and tables, and to a number of genetic databases. Maintained by the Dept. of Molecular & Cellular Biology, Harvard University.
  • Databases for Molecular Biology
    Includes nucleotide databases, protein databases, chromosome maps, enzyme databases, etc.
    From the Computational Molecular Biology at NIH Web site.

Nucleotide Sequence Databases

  • GenBank: National Center for Biotechnology Information
    “GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research 1999 Jan 1;27(1):12-7). It is part of the International Nucleotide Sequence Database Collaboration, which is comprised of the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.”
  • EMBL Nucleotide Sequence Database
    The EMBL Nucleotide Sequence Database is the European equivalent of the U.S. GenBank (listed above) and “constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications.”
  • DDBJ: DNA Data Bank of Japan
    The third in the trio of these major sequence databases, based in Japan's National Institute of Genetics. “DDBJ is the sole DNA data bank in Japan, which is officially certified to collect DNA sequences from researchers and to issue the internationally recognized accession number to data submitters. We collect data mainly from Japanese researchers, but of course accept data and issue the accession number to researchers in any other countries. Since we exchange the collected data with EMBL/EBI and GenBank/NCBI on a daily basis, the three data banks share virtually the same data at any given time.”
  • Human Genome Sequencing Center at Baylor College of Medicine
    Human Genome Sequencing, Dictyostelium sequencing, Drosophila sequencing, Mouse chromosome Y, cDNA sequencing.
  • IMGT: The International ImMunoGeneTics Database
    This database specializes in immunoglobulins (Ig), T-cell receptors (TcR) and major histocompatibility complex (MHC) molecules of vertebrate species. A tool, IMGT/DNAPLOT, allows Ig, TcR and MHC sequence analysis.

Protein Sequence Databases

  • UniProt: United Protein Databases
    A single database that combines the information from the major international protein databases maintained by the European Bioinformatics Institute (EBI), Cambridge, UK; the Protein Information Resource (PIR) at Georgetown University Medical Center (GUMC) and the National Biomedical Research Foundation (NBRF), Washington, D.C.; and the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland. “The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information.”
  • PIR Protein Sequence Database
    The database is described by its sponsor as “functionally annotated protein sequences, which grew out of the Atlas of Protein Sequence and Structure (1965-1978) edited by Margaret Dayhoff and has been incorporated into an integrated knowledge base system of value-added databases and analytical tools.” From the Protein Information Resource, the major U.S. source of protein informatics.
  • Swiss-Prot
    The major European protein sequence database, with accompanying annotations, from the Swiss Institute of Bioinformatics. “Swiss-Prot is a curated protein sequence database which strives to provide a high level of annotations (such as the description of the function of a protein, its domains structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other databases.” Also at this site is TrEMBL, which contains all translated nucleic acid protein coding sequences in EMBL that have not yet been annotated and incorporated into Swiss-Prot.

Protein Classification Databases

  • iPROCLASS
    From the maintainers of the PIR Protein Sequence Database, it integrates the classification data from that resource.
  • PROSITE
    A database of protein families and domains. It is produced by and closely linked to the ExPASy site, which hosts the Swiss-Prot sequence database.
  • MEROPS - the Protease Database
    “The database provides a catalogue and structure-based classification of peptidases (i.e. all proteolytic enzymes). This is a large group of proteins (nearly 2% of all gene products) that is of particular importance in medicine and biotechnology.”
  • The Center for Molecular Modeling (CMM)
    Many molecular modelling programs are available through the link “Research Tools on the Web,” including Molecules To Go, a World Wide Web (WWW) forms interface that facilitates access (browsing, searching and retrieval) to the molecular structure data contained within the Brookhaven Protein Data Bank (PDB).
  • Protein Data Bank (PDB)
    “The single international repository for the processing and distribution of 3-D macromolecular structure data primarily determined experimentally by X-ray crystallography and NMR.”