SCRIPTING LANGUAGES
Perl
Bioperl is arguably the most successful of the O|B|F projects. Bioperl
is an extensive library of core Perl modules that support the processing,
manipulation, and management of biological information in the form of sequences.
Perl was chosen in part because it had already gained popularity in the
bioinformatics community for its support of text processing and
pattern-matching tasks. The Bioperl project attempts to emulate the
object-oriented programming paradigm through the use of Perl modules and by
adhering to three design principles. The first principle is to separate the interface from
the implementation. The second principle is to provide a base framework for the
respective operation by generalizing common routines into a single module. The third
and final principle is to use the Factory and Strategy patterns as defined by Erich
Gamma and his coauthors. For more information, go to http://www.bioperl.org/.
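The three principles can be illustrated with a minimal sketch. It is written in Python rather than Perl, and every name in it (SequenceParser, FastaParser, RawParser, make_parser, count_residues) is hypothetical; none of it is Bioperl's actual API. The abstract class plays the role of the interface, the factory function selects an implementation by format name, and the caller passes the chosen parser in as a strategy.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class SequenceParser(ABC):
    """Interface: callers depend on this class, not on a concrete parser."""

    @abstractmethod
    def parse(self, text: str) -> List[Tuple[str, str]]:
        """Return a list of (identifier, sequence) pairs."""


class FastaParser(SequenceParser):
    """Implementation for FASTA-style text ('>' header lines)."""

    def parse(self, text: str) -> List[Tuple[str, str]]:
        records = []
        name, chunks = None, []
        for line in text.splitlines():
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            elif line and name is not None:
                chunks.append(line)
        if name is not None:
            records.append((name, "".join(chunks)))
        return records


class RawParser(SequenceParser):
    """Implementation for raw text: one unnamed sequence per non-empty line."""

    def parse(self, text: str) -> List[Tuple[str, str]]:
        lines = [l.strip() for l in text.splitlines() if l.strip()]
        return [(f"seq{i}", seq) for i, seq in enumerate(lines, start=1)]


# Hypothetical registry used by the factory below.
_PARSERS = {"fasta": FastaParser, "raw": RawParser}


def make_parser(fmt: str) -> SequenceParser:
    """Factory: pick the implementation from a format name."""
    try:
        return _PARSERS[fmt.lower()]()
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None


def count_residues(text: str, parser: SequenceParser) -> Dict[str, int]:
    """Strategy: the parsing behaviour is supplied by the caller."""
    return {name: len(seq) for name, seq in parser.parse(text)}
```

Because count_residues sees only the SequenceParser interface, supporting a new format in this sketch means adding one class and one registry entry; no calling code changes.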
Python
In general, high-level scripting languages are popular choices among
researchers in the field of bioinformatics. In addition to its abilities as a scripting
language, Python offers advanced numerical capabilities through
the Scientific Tools for Python (SciPy) project.
The Biopython project was created in 1999 and is modeled after the successful
Bioperl project. Most of the work of the Biopython project has focused on
creating parsers for biological data and designing a useful interface to represent
sequences. One of the unique features of the Biopython project is the use of a
standard event-oriented parser design. For more information, go to
http://biopython.org/.
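In an event-oriented design, a scanner walks the input and reports what it finds as events, while a separate consumer decides what to do with them. The following is a minimal sketch of that idea for FASTA-formatted input. The names RecordCollector and scan_fasta and the three event methods are assumptions made for illustration, not Biopython's API.

```python
class RecordCollector:
    """Consumer: builds (title, sequence) records from scanner events."""

    def __init__(self):
        self.records = []
        self._title = None
        self._chunks = []

    def start_record(self, title):
        self._title = title
        self._chunks = []

    def sequence_line(self, line):
        self._chunks.append(line)

    def end_record(self):
        self.records.append((self._title, "".join(self._chunks)))


def scan_fasta(lines, consumer):
    """Scanner: walks FASTA-formatted lines and fires events on the consumer."""
    in_record = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no event
        if line.startswith(">"):
            if in_record:
                consumer.end_record()
            consumer.start_record(line[1:])
            in_record = True
        elif in_record:
            consumer.sequence_line(line)
    if in_record:
        consumer.end_record()
```

A caller would create a RecordCollector, pass it to scan_fasta together with an open file or a list of lines, and then read the collector's records attribute. A different consumer, for example one that only counts records, could reuse the same scanner unchanged.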
PHP
Formerly known as GenePHP, the BioPHP project seeks to extend the PHP
language so that it can be used to develop bioinformatics applications. The main
purpose of the BioPHP project is to encourage the use of PHP as a “glue” language to
bind web-based bioinformatics applications and databases. BioPHP can currently
read biological data in the GenBank, Swiss-Prot, FASTA, and Clustal ALN formats
and perform simple sequence analysis tasks. For more
information, go to http://genephp.sourceforge.net/.
Ruby
BioRuby is part of the O|B|F and is primarily supported by the Human Genome
Center at the University of Tokyo and the Bioinformatics Center at Kyoto University.
One of the advantages of using Ruby for bioinformatics is that it has native support
for object-oriented programming with a simple but powerful syntax.
In addition to the BioRuby project, several educators have advocated
introducing undergraduate students to the Ruby programming language. For
example, Daniel Lim introduced Ruby as a tool for bioinformatics in his
programming languages class. Not only was it a success in class, but several
students also continued the work as an independent project.
GENERAL PURPOSE LANGUAGES
C
The Laboratory of DNA Information Analysis at the University of Tokyo has
produced an open source C library of the most commonly used clustering
algorithms. Clustering routines are used to analyze gene expression data. The
library, as written, is callable from any program written in C or C++.
Extensions have also been developed that allow the routines to be called from
Perl and Python programs.
C++
Gianluca Della Vedova leads the Algorithms Library for Bioinformatics
(ALiBio) project at the University of Milan-Bicocca. The stated goal of the project is
to provide a library of fundamental algorithms, implemented in C++, that will be used to
develop applications in the bioinformatics field. While most bioinformatics
programming projects have focused on making tools that are easy to use, the ALiBio
project values efficiency as its top priority. The goal is to provide tools to help
produce highly optimized applications. In addition to the stated goal of efficiency, all
libraries and algorithms that are included in ALiBio must have an extensive suite of
regression tests and must be clearly documented.
The ALiBio project began in March 2002. For more information, go to http://bioinformatics.org/ALiBio/.
Java
BioJava is an open source Java library and part of the O|B|F. The BioJava
project is primarily concerned with how to represent sequences. The major feature of
the BioJava project is that it defines two distinct representation schemes for
sequences in the Java programming language. The first scheme is the basic
string-of-tokens representation and is used when annotation is unimportant in the
analysis of the data. The second representation is the annotated sequence framework and is used
when a fully annotated view of the sequence is required. More information about
BioJava may be found at http://biojava.org/.
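The difference between the two schemes can be sketched as follows. The sketch is in Python rather than Java, and the class names (TokenSequence, Feature, AnnotatedSequence), the DNA alphabet, and the 1-based inclusive coordinates are all assumptions made for illustration; they are not BioJava's classes or interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DNA_ALPHABET = frozenset("ACGT")  # assumed alphabet for this sketch


class TokenSequence:
    """String-of-tokens view: just symbols checked against an alphabet."""

    def __init__(self, symbols: str, alphabet=DNA_ALPHABET):
        symbols = symbols.upper()
        bad = set(symbols) - alphabet
        if bad:
            raise ValueError(f"symbols not in alphabet: {sorted(bad)}")
        self.symbols = symbols

    def __len__(self) -> int:
        return len(self.symbols)

    def subsequence(self, start: int, end: int) -> str:
        """Return symbols start..end, using 1-based inclusive coordinates."""
        return self.symbols[start - 1:end]


@dataclass
class Feature:
    """An annotation attached to a region of a sequence."""
    kind: str
    start: int
    end: int
    notes: Dict[str, str] = field(default_factory=dict)


class AnnotatedSequence(TokenSequence):
    """Annotated view: the same symbols plus a name and located features."""

    def __init__(self, symbols: str, name: str, alphabet=DNA_ALPHABET):
        super().__init__(symbols, alphabet)
        self.name = name
        self.features: List[Feature] = []

    def add_feature(self, feature: Feature) -> None:
        if not 1 <= feature.start <= feature.end <= len(self):
            raise ValueError("feature lies outside the sequence")
        self.features.append(feature)

    def features_at(self, position: int) -> List[Feature]:
        """Return every feature that covers the given 1-based position."""
        return [f for f in self.features if f.start <= position <= f.end]
```

Code that only needs the symbols can accept a TokenSequence and ignore annotation entirely, while code that needs feature locations asks for an AnnotatedSequence.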
Squeak
One of the more disappointing aspects of examining how programming languages
are being used in the biological sciences is finding a project that seems
promising but ultimately fails to deliver. Squeak is an open source
implementation of the object-oriented programming language Smalltalk. With its
built-in graphics and truly object-oriented paradigm, Squeak would seem an ideal
language for the biological sciences. In fact, the bioSqueak homepage seems to make
such promises. However, there is no evidence that any work has actually been
completed on the bioSqueak project.
FUNCTIONAL AND LOGIC LANGUAGES
Haskell
Haskell is a general purpose, purely functional programming language. Haskell
has been used by Robert Giegerich's research group to implement dynamic
programming algorithms, including those involved in RNA folding grammars.
The group claims that their work adds a significant amount of flexibility and
versatility in the development of new dynamic programming algorithms.
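For readers unfamiliar with dynamic programming in RNA folding, the following sketch shows a textbook Nussinov-style recurrence that maximizes the number of base pairs in a single strand. It is written in Python rather than Haskell and is not the Giegerich group's method or code; the function name max_base_pairs, the set of allowed pairs, and the minimum loop length of 3 are assumptions chosen for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in the sequence.

    min_loop is the minimum number of unpaired bases that must lie
    between the two partners of a pair.
    """
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence rna[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

Each table entry depends only on entries for shorter subsequences, which is why the table is filled in order of increasing span.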
Lisp
Formerly known as BioLingua, BioBike is an interactive, web-based
programming environment that enables biologists to analyze biological systems by
combining knowledge and data through direct end-user programming. The goal
of BioBike is to enable biologists to program directly by providing an environment
that is more natural to trained biologists. The main BioBike language is BioLisp.
BioLisp is simply Common Lisp with added biological functionality.
Prolog
Written in SWI-Prolog, Biomedical Logic Programming (Blip) is a collection of
logic programming modules intended primarily for bioinformatics and biomedical
applications. Blip allows users to program in a declarative way and facilitates
both query-oriented and application-oriented programming. For more information, go
to http://bioprolog.org/.
XML – A MARKUP LANGUAGE
Although XML is not technically a programming language, no discussion of
biologically oriented programming would be complete without a brief look at it. When
dealing with massive amounts of data in different formats, there must be a way to
exchange this information from one application to another. XML is an extensible,
universal format for structured documents and data on the web. Two of
the most notable attempts to use XML as a framework in biology have been the
Bioinformatics Sequence Markup Language (BSML) and the BIOpolymer Markup
Language (BioML).
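The exchange role of XML can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names used here (sequences, sequence, id, molecule) are invented for illustration and do not follow the BSML or BioML schemas; the two function names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def sequences_to_xml(records):
    """Serialize (id, molecule, residues) tuples to an XML string."""
    root = ET.Element("sequences")
    for name, molecule, residues in records:
        node = ET.SubElement(root, "sequence", {"id": name, "molecule": molecule})
        node.text = residues
    return ET.tostring(root, encoding="unicode")


def sequences_from_xml(document):
    """Parse the XML produced above back into (id, molecule, residues) tuples."""
    root = ET.fromstring(document)
    return [
        (node.get("id"), node.get("molecule"), node.text or "")
        for node in root.iter("sequence")
    ]
```

One application can call sequences_to_xml to write its data, and another, possibly written in a different language, can read the same document with any XML parser. That independence from the producing program is the property that makes XML attractive as an interchange format.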