BOSC 2000

An Introduction to Genome Annotation Markup Elements (GAME) XML

Suzanna Lewis

Berkeley Drosophila Genome Project
539 Life Sciences Addition
University of California
Berkeley, CA 94720

suzi@fruitfly.berkeley.edu
Tel: 510-643-0514
Fax: 510-643-9947
http://fruitfly.berkeley.edu

This work defines the essential XML elements needed to describe biological sequences (both nucleic acid and protein), the results of analyses carried out on those sequences (both computational and experimental), and the conclusions drawn from those analyses. The resulting GAME XML is being used for large-scale annotation efforts at FlyBase and for EnsEMBL. GAME XML defines these elements at a very high level of abstraction to allow the greatest flexibility and adaptability across the full range of analyses. The requirements include the ability to describe all varieties of computational analysis, such as alignments (e.g. BLAST, sim4, clustalW), predictions (e.g. Genie, genscan, genefinder), and motifs (e.g. pfam, BLOCKS, promoter regions), as well as experimental results (e.g. mutational events, locations of chromosomal breakpoints) and any other feature that can be reduced to an interval (or set of intervals) along the length of a molecular sequence. GAME makes an absolute distinction between results, which serve as primary data, and the conclusions drawn from those results, which are termed 'annotations'. This precise definition of an annotation demands that every annotation be justified and supported by results from the analysis of the sequence. We will describe the overall structure and explain the logic behind each design decision that we made. GAME differs from GFF both in the problem it solves and in the richness of its semantics. In practical applications GAME has proven flexible enough to support complex annotations of entire genomes.
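
The distinction between results and annotations can be illustrated with a minimal sketch in Python. This is not the GAME schema: the class names (Span, Result, Annotation), their fields, and the sequence identifier and coordinates are all hypothetical, chosen only to show how features reduce to intervals on a sequence and how an annotation might be required to cite the results that support it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Span:
    """An interval on a named sequence (hypothetical representation)."""
    seq_id: str
    start: int
    end: int


@dataclass
class Result:
    """Primary data: the output of one computational or experimental analysis."""
    program: str        # name of the analysis, e.g. "sim4" or "Genie"
    spans: List[Span]   # a single interval or a set of intervals


@dataclass
class Annotation:
    """A conclusion drawn from results; it must cite at least one result."""
    name: str
    spans: List[Span]
    evidence: List[Result] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the rule that every annotation is supported by results.
        if not self.evidence:
            raise ValueError(f"annotation {self.name!r} has no supporting results")


# Illustrative values only; the coordinates and identifiers are made up.
alignment = Result("sim4", [Span("seq1", 100, 250), Span("seq1", 400, 520)])
prediction = Result("Genie", [Span("seq1", 95, 530)])
gene = Annotation("gene-1", [Span("seq1", 95, 530)],
                  evidence=[alignment, prediction])
```

In this sketch an Annotation built without evidence raises a ValueError, which is one simple way to keep unsupported conclusions out of the data.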