Biopython 1.58 released

Source distributions and Windows installers for Biopython 1.58 are available from the downloads page on the Biopython website and from the Python Package Index (PyPI). A new interface and parsers for the PAML (Phylogenetic Analysis by Maximum Likelihood) package of programs, supporting codeml, baseml and yn00, as well as a Python re-implementation of chi2, were added as the Bio.Phylo.PAML module. Bio.SeqIO now includes read and write support for SeqXML, a simple XML format offering basic annotation support. [Read More]

OBF and Google Summer of Code 2011

Great news: Google announced today that the Open Bioinformatics Foundation (OBF) has been accepted as a mentoring organization for this summer’s Google Summer of Code! GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see the GSoC 2011 FAQ. [Read More]

Biopython dropping Python 2.4 Support?

This is a reminder that the forthcoming Biopython 1.56 release is planned to be our last release to support Python 2.4. Looking back, we supported Python 2.3 for about six years: it was released in July 2003, and Biopython 1.50, released in April 2009, was the last to support it. Similarly, Python 2.4 was released six years ago (November 2004). Dropping Python 2.4 support will allow us to assume that standard library modules like the ElementTree XML parser and SQLite 3 support will be available. [Read More]

BioPerl has moved to GitHub

BioPerl has migrated to git and GitHub! We have also set up a mirror set of several key repositories at the great public git hosting site repo.or.cz. If you are a current BioPerl developer (had a previous account for direct access to our prior Subversion repository), please sign up for a GitHub account and let us know your user ID. Also, add the extra email (where ‘DEVNAME’ is your original Subversion account ID). [Read More]

Illumina FASTQ files - Read Segment Quality Control Indicator

In another quirk to the FASTQ story, recent Illumina FASTQ files don’t actually use the full range of PHRED scores, and a score of 2 has a special meaning: the Read Segment Quality Control Indicator (RSQCI, encoded as ‘B’). Hats off to Dr Torsten Seemann for raising awareness of this issue in his post on the seqanswers.com forum, referring to a presentation by Tobias Mann of Illumina which says: [Read More]
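As a rough illustration of what the ‘B’ encoding means in practice, here is a minimal sketch in plain Python. It assumes the Illumina 1.3+ convention of storing each PHRED score as the ASCII character with code score + 64, and it assumes the indicator shows up as a run of ‘B’ characters at the end of a read. The helper name trim_rsqci_tail is made up for this example and is not part of Biopython.

```python
ILLUMINA_OFFSET = 64  # assumption: Illumina 1.3+ stores PHRED scores as chr(score + 64)
RSQCI_SCORE = 2       # the score with the special meaning
RSQCI_CHAR = chr(RSQCI_SCORE + ILLUMINA_OFFSET)  # 2 + 64 = 66, which is 'B'


def trim_rsqci_tail(sequence, quality):
    """Drop the trailing run of 'B' qualities from a read (hypothetical helper)."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    # Number of positions left once the trailing 'B' run is removed
    keep = len(quality.rstrip(RSQCI_CHAR))
    return sequence[:keep], quality[:keep]


seq, qual = trim_rsqci_tail("ACGTACGTAC", "hhhgffdBBB")
print(seq, qual)  # ACGTACG hhhgffd
```

Whether trimming is the right response to the indicator depends on the downstream analysis; the sketch only shows how the flagged segment can be recognised.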

Partial sequence files with Biopython

This is another blog post to highlight one of the neat tricks you’ll be able to do with Biopython 1.54 (which you can help test with the Biopython 1.54 beta release). It is often useful to be able to extract a few records from a larger sequence file - for example, some sequences of interest from a full UniProt or GenBank dump. One obvious way to try to do this is to parse the file into an object representation (i. [Read More]
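To make the task concrete, here is a small standard-library sketch that copies selected records out of a FASTA file as raw text, without building an object for each record. It is only one possible approach and not necessarily the one the full post describes. The function name extract_fasta_records is hypothetical, and the record ID is assumed to be the first word after ‘>’.

```python
import io


def extract_fasta_records(handle, wanted_ids):
    """Yield the raw text of FASTA records whose ID is in wanted_ids."""
    wanted = set(wanted_ids)
    block = []
    keep = False
    for line in handle:
        if line.startswith(">"):
            # A new record starts here, so emit the previous one if it was wanted
            if keep:
                yield "".join(block)
            # Assumption: the ID is the first whitespace-separated word after '>'
            fields = line[1:].split(None, 1)
            keep = bool(fields) and fields[0] in wanted
            block = [line] if keep else []
        elif keep:
            block.append(line)
    # Emit the final record if it was wanted
    if keep:
        yield "".join(block)


example = io.StringIO(
    ">alpha first\nACGT\nAC\n>beta second\nGGCC\n>gamma third\nTTAA\n"
)
for record in extract_fasta_records(example, ["beta", "gamma"]):
    print(record, end="")
```

Because the records are passed through as text, the output keeps the original line wrapping and description lines unchanged.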

Making Biopython SeqIO and AlignIO easier

One of the small changes coming in Biopython 1.54 (which you can try out already using the Biopython 1.54 beta) is to Bio.SeqIO and Bio.AlignIO. Previously the input and output functions had required file handles, but they will now also accept filenames. This is a case of “practicality beats purity” (to quote the Zen of Python), and is particularly handy when writing very short scripts or working at the Python prompt. [Read More]
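The general pattern of accepting either a filename or an open handle can be sketched with the standard library alone. This is an illustration of the idea, not Biopython's implementation; open_or_passthrough and count_fasta_records are names invented for this example.

```python
import io
import os
from contextlib import contextmanager


@contextmanager
def open_or_passthrough(source, mode="r"):
    """Yield a handle: open a filename, or pass an existing handle through."""
    if isinstance(source, (str, os.PathLike)):
        # We opened the file, so we are responsible for closing it
        with open(source, mode) as handle:
            yield handle
    else:
        # Assume a file-like object; the caller stays responsible for closing it
        yield source


def count_fasta_records(source):
    """Count '>' header lines in FASTA input given as a filename or a handle."""
    with open_or_passthrough(source) as handle:
        return sum(1 for line in handle if line.startswith(">"))


print(count_fasta_records(io.StringIO(">a\nACGT\n>b\nGG\n")))  # 2
```

With this shape, a caller at the Python prompt can pass a path string directly, while code that already holds an open handle (or an in-memory buffer, as above) can keep using it.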

Sanger FASTQ format and the Solexa/Illumina variants

I’m delighted to announce an open access publication in Nucleic Acids Research describing the FASTQ file format based on the conventions agreed by the OBF projects: The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Peter J. A. Cock (Biopython), Christopher J. Fields (BioPerl), Naohisa Goto (BioRuby), Michael L. Heuer (BioJava) and Peter M. Rice (EMBOSS). Nucleic Acids Research, doi:10. [Read More]

Working with FASTQ files in Biopython when speed matters

Biopython 1.51 onward includes support for Sanger, Solexa and Illumina 1.3+ FASTQ files in Bio.SeqIO, which allows a lot of neat tricks very concisely. For example, the tutorial (PDF) has examples of finding and removing primer or adaptor sequences. However, because the Bio.SeqIO interface revolves around SeqRecord objects, there is often a speed penalty. For example, for FASTQ files, the quality string gets turned into a list of integers on parsing, and then re-encoded back to ASCII on writing. [Read More]
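The round trip described here can be illustrated in a few lines of plain Python. This sketch assumes the Sanger convention of storing each PHRED score as the ASCII character with code score + 33; decode_quality and encode_quality are illustrative names, not Biopython functions.

```python
SANGER_OFFSET = 33  # Sanger FASTQ stores each PHRED score as chr(score + 33)


def decode_quality(quality_string, offset=SANGER_OFFSET):
    """Turn a FASTQ quality string into a list of integer PHRED scores."""
    return [ord(char) - offset for char in quality_string]


def encode_quality(scores, offset=SANGER_OFFSET):
    """Turn a list of integer PHRED scores back into a FASTQ quality string."""
    return "".join(chr(score + offset) for score in scores)


raw = "II5+!"
scores = decode_quality(raw)
print(scores)                  # [40, 40, 20, 10, 0]
print(encode_quality(scores))  # II5+!
```

When a script only filters or copies reads, both conversions are wasted work on every record, which is the kind of overhead the post is pointing at.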