Biological Databases
Life Sciences Databases for Bioinformatics
Technology and Computer Science Databases for BI
Genomics and Commercial Databanks
From the annual database issue of this important journal, updated each January. A very useful, comprehensive collection of links to over 300 resources. Includes a summary of each entry, with a description, recent developments, the authors responsible and a contact link.
This page includes links to a number of model organism databases, banks and tables, and to a number of genetic databases. Maintained by the Dept. of Molecular & Cellular Biology, Harvard University.
Includes nucleotide databases, protein databases, chromosome maps, enzyme databases, etc.
From the Computational Molecular Biology at NIH Web site.
Nucleotide Sequence Databases
“GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research 1999 Jan 1;27(1):12-7). It is part of the International Nucleotide Sequence Database Collaboration, which is comprised of the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.”
The EMBL Nucleotide Sequence Database is the European equivalent of the U.S.'s GenBank, described above, and “constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications.”
The third in the trio of these major sequence databases, based in Japan's National Institute of Genetics. “DDBJ is the sole DNA data bank in Japan, which is officially certified to collect DNA sequences from researchers and to issue the internationally recognized accession number to data submitters. We collect data mainly from Japanese researchers, but of course accept data and issue the accession number to researchers in any other countries. Since we exchange the collected data with EMBL/EBI and GenBank/NCBI on a daily basis, the three data banks share virtually the same data at any given time.”
Human Genome Sequencing, Dictyostelium sequencing, Drosophila sequencing, Mouse chromosome Y, cDNA sequencing.
This database specializes in immunoglobulins (Ig), T-cell receptors (TcR) and major histocompatibility complex (MHC) molecules of vertebrate species. A tool, IMGT/DNAPLOT, allows Ig, TcR and MHC sequence analysis.
Protein Sequence Databases
A single database that combines the information of the major international protein databases maintained by the European Bioinformatics Institute (EBI), Cambridge, UK; the Protein Information Resource (PIR) at Georgetown University Medical Center (GUMC) and the National Biomedical Research Foundation (NBRF), Washington, D.C.; and the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland. “The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information.”
The database is described by its sponsor as “functionally annotated protein sequences, which grew out of the Atlas of Protein Sequence and Structure (1965-1978) edited by Margaret Dayhoff and has been incorporated into an integrated knowledge base system of value-added databases and analytical tools.” From the Protein Information Resource, the major U.S. source of protein informatics.
The major European protein sequence database, with accompanying annotations, from the Swiss Institute of Bioinformatics. “Swiss-Prot is a curated protein sequence database which strives to provide a high level of annotations (such as the description of the function of a protein, its domains structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other databases.” Also at this site is TrEMBL, which contains translations of all protein-coding nucleotide sequences in EMBL that have not yet been annotated and incorporated into Swiss-Prot.
Protein Classification Databases
From the maintainers of the PIR Protein Sequence Database, it integrates the classification data from that resource.
A database of protein families and domains. It is produced by, and closely linked to, the ExPASy site, which hosts the Swiss-Prot sequence database.
“The database provides a catalogue and structure-based classification of peptidases (i.e. all proteolytic enzymes). This is a large group of proteins (nearly 2% of all gene products) that is of particular importance in medicine and biotechnology.”
Many molecular modelling programs are available through the link “Research Tools on the Web,” including Molecules To Go, a World Wide Web (WWW) Forms interface which facilitates access (browsing, searching and retrieval) to the molecular structure data contained within the Brookhaven Protein Data Bank (PDB).
“The single international repository for the processing and distribution of 3-D macromolecular structure data primarily determined experimentally by X-ray crystallography and NMR.”