-

Difference between revisions of "Google Summer of Code"

From Open Bioinformatics Foundation
(Open-Bio projects involved)
(GSoC 2016: Link to new site)
 
(248 intermediate revisions by 29 users not shown)
Line 1: Line 1:
The O|B|F is applying for the first time for the [http://socghop.appspot.com/ Google Summer of Code] (GSoC) program as an umbrella organization for all O|B|F-affiliated projects.
+
[[Image:GSoC15-logo-small.jpg|right|frame|link=http://code.google.com/soc]]  
  
On this page we are collecting ideas, possible projects, prerequisites, possible solution approaches, mentors, other people or channels to contact for more information or to bounce ideas off of, etc.
+
Google Summer of Code (GSoC) is a student internship program for open-source projects. The program offers eligible student developers stipends to write code for open source projects over a period of 3 summer months ("flip bits, not burgers"). See the '''[http://code.google.com/soc Google Summer of Code Main Site]''' for general information about the Google Summer of Code program, how to apply, frequently asked general questions, and more.
  
== About Google Summer of Code ==
 
  
[[Image:GSoC2009Logo.png|352px|right]]
+
== GSoC 2016 ==
Google Summer of Code (GSoC) is maybe best described as a remote student internship program for globally distributed, collaboratively developed open-source projects. The program offers eligible student developers stipends to write code for open source projects over a period of 3 summer months ("flip bits, not burgers"). Aside from the stipend, perhaps the most important qualitative difference of this program is that students are paired with mentors, who are typically experienced developers from the project to which the student would be contributing, and who can guide the student to interact productively with the community, prevent getting stuck in obstacles, and avoid chasing down the wrong direction. The program is global - students and mentors may be located anywhere where they have internet connection ([http://socghop.appspot.com/document/show/program/google/gsoc2009/faqs#not_eligible except for countries affected by US trade restrictions]), and no travel is required. Thus, other than the stipend and the mentorship, the internship mirrors normal contributors to such distributed development projects, which is a useful learning experience in itself, as the skills needed to be effective at this are typically not taught in computer science curricula, yet are highly desired in an increasingly global IT industry.
 
  
From the viewpoint of participating open-source projects, the program not only offers to pay students for contributing, but more importantly offers an opportunity to recruit new developers in a way that allows far more people to leap over the barrier from interested user to code contributor.
+
The Google Summer of Code 2016 is ON! OBF is once again applying as a GSoC mentoring organization this year. Interested mentors and students should subscribe to the OBF/GSoC [http://lists.open-bio.org/mailman/listinfo/gsoc mailing list]. Please announce yourself, so we know who you are!
 +
The details of each of our project ideas are listed below, including potential mentors.
  
The program was started in 2005 and has since been run annually out of Google's Open Source Program Office (OSPO). Google sponsors the program not only by providing administrative staff support and an overall framework, but also with a very significant amount of money (paid-out stipends amounted to more than $5M alone in 2009).  [http://socghop.appspot.com/document/show/program/google/gsoc2009/faqs#about_gsoc Quoting from the GSoC documentation], "the program has historically brought together over 2,400 students with over 230 open source projects. The majority of past student participants were enrolled in university or college Computer Science and Computer Engineering programs, but overall GSoCers come from a wide variety of educational backgrounds, from computational biology to mining engineering. Many past student participants had never participated in an open-source project before GSoC; others used the GSoC stipend as an opportunity to concentrate fully on their existing open source coding activities over the summer. Many GSoC 'graduates' have later become program mentors."
+
See http://obf.github.io/GSoC/ for more information about the GSoC program and additional ways to get in touch with us.
  
See also below for [[#Reference_Facts_.26_Links:_Google_Summer_of_Code_2009|reference facts such as eligibility and timelines]].
+
<!--Our GSoC ideas from each project are collected here: '''[[Google Summer of Code 2015 Ideas |OBF Project Ideas for GSoC 2015]]''' -->
 +
=== Facts & Links ===
  
== News ==
+
; Time Line :
 +
:* [https://developers.google.com/open-source/gsoc/timeline GSoC time line]
  
* 13 Mar 2009: [http://docs.google.com/Doc?id=dhs98hzv_7zn8bxqjm Application to participate as a mentoring organization] submitted. ''--[[User:Lapp|Lapp]]''
+
; GSoC 2016 FAQ :
*  08 Mar 2009: The project ideas page (the page you are looking at) is ready for adding project ideas. ''--[[User:Lapp|Lapp]]''
+
:* For questions of eligibility, see the [https://developers.google.com/open-source/gsoc/faq GSoC 2016 FAQ].
  
== Contact ==
+
; Info from Google :
 +
:* There is also a [http://groups.google.com/group/google-summer-of-code-discuss Google group for posting GSoC questions] (and receiving answers; note that you will need to sign up for the group) that relate to the program itself (and are not specific to our organization).
 +
:* Students receive a stipend from Google if accepted. See the  [https://developers.google.com/open-source/gsoc/faq GSoC 2016 FAQ] for full documentation.
 +
:* Development is done entirely remotely and on-line; there is no requirement or expectation for either students or mentors to travel.
  
Our organization administrators are [[User:Lapp|Hilmar Lapp]] ([mailto:hlapp%40gmx%2enet hlapp&#64;gmx&#46;net]) and [[User:Mauricio|Mauricio Herrera Cuadra]] ([mailto:mauricio%40open-bio%2eorg mauricio&#64;open-bio&#46;org]).
+
=== Why apply? ===
  
If you are a student interested in applying for a Google Summer of Code project with our organization, please send any questions you have, projects you would like to propose, etc to the developer mailing list of the pertinent O|B|F project.  
+
One of the most important features of the program is that students are paired with mentors, who are typically experienced developers from the project to which the student is contributing.  The mentor guides the student to work productively within the community, and helps the student avoid obstacles and pitfalls.  The program is global - students and mentors may be located anywhere where they have an internet connection (except for countries affected by US trade restrictions), and no travel is required. Thus, aside from the stipend and mentorship aspects, the student's experience in the internship closely mirrors normal work on distributed development projects.  Effective work habits for distributed development are typically not taught as part of computer science curricula, yet are highly desired in the increasingly global and distributed software, IT, and biotechnology industries.
  
How do you know which project is pertinent and the address of its developer mailing list? The [[#Open-Bio_projects_involved|projects under the O|B|F umbrella are listed below]], with home page and developer mailing lists. Each project idea lists the O|B|F project it is a part of; look it up in the list below and you have the information you need. If you want to propose your own project idea and the project to which you would contribute isn't obvious, send email to [mailto:gsoc%40lists%2eopen-bio%2eorg gsoc&#64;lists&#46;open-bio&#46;org].
+
From the viewpoint of each open-source project, the program not only offers to pay students for contributing, but more importantly, offers an opportunity to recruit new developers who will hopefully go on to become regular, sustaining contributors.
  
Some of us also hang out regularly on IRC, see the list of O|B|F projects below for information on which projects have a channel and the name of the channel. ''(If you do not have an IRC client installed, you might find the [http://en.wikipedia.org/wiki/List_of_IRC_clients comparison on Wikipedia], the [http://directory.google.com/Top/Computers/Software/Internet/Clients/Chat/IRC/ Google directory], or the [http://www.ircreviews.org/clients/ IRC Reviews] helpful. For Macs, [http://en.wikipedia.org/wiki/X-Chat X-Chat Aqua] works pretty well. If you have never used IRC, try the [http://irchelp.org/irchelp/ircprimer.html IRC Primer] at [http://irchelp.org/ IRC Help], which also has links to lots of other material.)''
+
==Project Ideas==
  
For applying, please make sure you read our [[#What_should_prospective_students_know.3F|documentation on information that students should know and guidelines we expect you to follow]] ''before'' you apply. We don't have a format template for application that you need to adhere to, but we do ask that you include specific kinds of information. What those are is documented under "[[#When you apply|When you apply]]."
 
  
== Ideas ==
+
Our GSoC ideas from each project are collected here: http://obf.github.io/GSoC/
  
''Note: if there is more than one mentor for a project, the primary mentor is in '''bold font'''. Biographical and other information on the mentors is linked to in the [[#Mentors|Mentors section]].''
+
== OBF Projects Accepting Applicants ==
  
''Students: The below are only our project '''ideas''', albeit well thought-out ones. You are welcome to propose your own project if none of those below catches your interest, or if your idea is more exciting to you, provided it is still a contribution to one of the O|B|F member projects (see list below). Just be aware that we can't guarantee finding an appropriate mentor, but if we like your proposal we will try. Regardless of what you decide to do, make sure you read and follow the [[#What_should_prospective_students_know.3F|guidelines for students]] below.''
+
[[Image:BioPerl_logo_tiny.jpg|right|link=bp:Google Summer of Code]]
 +
; [[bp:Google Summer of Code|BioPerl]] :
 +
:* '''[[bp:Google Summer of Code | BioPerl GSoC Page]]''' - project ideas and mentors
 +
:* [[bp:Main Page|Project website]]
 +
:* [[bp:Becoming_a_developer|Information for new developers]]
 +
:* source code browser for [http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk bioperl-live] (the main BioPerl code base), and [http://code.open-bio.org/svnweb/index.cgi/bioperl/ all BioPerl sub-projects]
 +
:* [[bp:Project_priority_list|Priority list]] of things that need work, as another source for student-conceived project ideas
 +
:* [[bp:Mailing_lists|Mailing lists]]
 +
:* IRC: <code>#bioperl</code> on [http://freenode.net Freenode]
  
<!--
+
[[Image:Biopython_logo_tiny.png|right|link=biopython:Google Summer of Code]]
=== Write a NEXUS parser in C&amp; ===
+
; [[biopython:Google Summer of Code |BioPython]] :
''This is a template for how the student project ideas could be presented. Feel free to copy & paste & edit, and feel free to adjust the format.''
+
:* '''[[biopython:Google Summer of Code | BioPython GSoC Page]]''' - project ideas and mentors
; Rationale : C& is an amp'ed-up programming language that has not been invented yet but in a few years will dominate the programming world. The best way to prevent broken non-compliant NEXUS parsers written in C& from appearing is to write a good one now.
+
:* [[biopython:Main Page|Project website]]
; Approach : Re-implementations of NEXUS parsers inevitably tend to be broken or non-compliant. Hence, the best approach is to write a translator that translates a reference implementation to C&.
+
:* [[biopython:Contributing|Information for contributors]]
; Challenges : C& has not been invented yet, so a lot of assumptions will have to be made.
+
:* [[biopython:Mailing lists|Mailing lists]]
; Involved toolkits or projects : The [http://www.biocamp.org BioC&] toolkit has much of the needed framework.
+
:* [[biopython:SourceCode| Source Code]]
; Degree of difficulty and needed skills : Hard. The hardest part is probably inventing C&amp;. Writing the parser itself should be medium, unless C& was ill-designed for writing parsers. Knowledge of the BioC& toolkit will obviously help, as well as knowing the NEXUS format.
+
:* No IRC channel at present
; Mentors : Mike&amp;, founder of BioC&
 
-->
 
  
=== Write a JEE5 webservice interface to BioSQL ===
+
[[Image:Biojava_logo_tiny.jpg|right|link=http://biojava.org/wiki/Google_Summer_of_Code]]
; Rationale :  
+
; [http://biojava.org/wiki/Google_Summer_of_Code BioJava] :
BioSQL is an intelligently designed database schema for storing sequence data and associated metadata. It does, however, lack any kind of user API. A sensible way to design an API for a BioSQL-backed database would be to expose the API as webservices. This would allow the API to be language and database agnostic (unlike an API based on database procedures). It would also allow data in BioSQL to be very loosely coupled into bioinformatics workflows. Once an API is in place, one could even adopt modified SQL schemas underneath as long as the data access API still conforms to some specification.
+
:* '''[http://biojava.org/wiki/Google_Summer_of_Code BioJava GSoC Page]''' - project ideas and mentors
 +
:* [http://biojava.org/wiki/BioJava:Modules BioJava modules] as another source for student-conceived project ideas
 +
:* source code for [http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk biojava-live] (the main BioJava code base) and [http://code.open-bio.org/svnweb/index.cgi/biojava/ all BioJava sub-projects]
 +
:* [http://biojava.org/wiki/BioJava:MailingLists Mailing lists]
 +
:* No IRC channel at present
  
; Approach :  
+
[[Image:BioRuby_logo_tiny.png|right|link=http://bioruby.org]]
Since the development of Java EE5 (and EJB3), the development of Enterprise Java Beans that interoperate with databases and webservices is exceptionally easy. In addition, Java Session Beans can be readily exported as webservices with the addition of simple annotations; often no specific configuration is required. Free and open Java app servers (such as Glassfish) that provide almost all of the management middleware for object relational mapping (ORM) and webservice deployment (and a whole host of other things) are available and relatively simple to use. Finally, the free and open IDE Netbeans has excellent integration with Glassfish and Java EE5 (plus I am most experienced with this IDE, so I can provide more help with its use). For these reasons I would suggest that Java EE5 is the most sensible approach to implementing this project.
+
; [http://bioruby.org BioRuby] :
 +
:* '''[http://bioruby.open-bio.org/wiki/Google_Summer_of_Code BioRuby GSoC Page]''' - project ideas and mentors
 +
:* [http://bioruby.org Project website]
 +
:* [http://lists.open-bio.org/mailman/listinfo/bioruby developers mailing list]
 +
:* [http://github.com/bioruby/bioruby/tree/master source code]
 +
:* IRC: <code>#bioruby</code> on [http://freenode.net Freenode]
  
During a development meeting in Tokyo in 2008, a preliminary EJB mapping to BioSQL was generated. What remains to be done is the development of a simple, well documented and well tested API specification and implementation that bioinformatics developers can use to perform CRUD (Create, Read, Update, Delete) functions on the database as well as useful search and retrieval operations.
+
[[Image:BioSQL_logo.png|160px|right|link=biosql:Main Page]]
 +
; [[biosql:Main Page|BioSQL]] :
 +
:* [[biosql:Main Page|Project website]]
 +
:* Current [http://biosql.org/wiki/Enhancement_Requests enhancement requests] as another source for student-conceived project ideas
 +
:* [http://biosql.org/mailman/listinfo/biosql-l developers mailing list]
 +
:* [http://code.open-bio.org/svnweb/index.cgi/biosql/browse/biosql-schema/trunk source code]
 +
:* No IRC channel at present
  
In summary, the project will define and document an API and its expected behaviour and then implement the webservice interface. A set of unit tests will also be developed along with a proof-of-concept app that demonstrates use of the API.
+
; [http://biohaskell.org/ BioHaskell] :
 +
:* [http://biohaskell.org/ Project website]
 +
:* [http://hackage.haskell.org/packages/#cat:Bioinformatics Bioinformatics section on HackageDB]
  
; Challenges :  
+
; [http://biocaml.org Biocaml] :
* Designing and documenting the API so that it is simple and intuitive
+
:* [http://biocaml.org Project website]
* Making simple queries simple and efficient and complex queries possible.
+
:* [https://groups.google.com/d/forum/biocaml Mailing list]
* Making CRUD operations secure (only people with the right credentials should be able to delete the data).
+
:* [http://www.open-bio.org/wiki/Google_Summer_of_Code_2015_Ideas#Biocaml Project ideas]
* Loaders for common file types.
 
* [Nice to have] Making a test application that will call API methods with predefined arguments. This will let people make alternative implementations of the API while testing they are still compatible with the API. For example someone could make an entire implementation in Perl/ BioPerl and still have it validate against the API.
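
As a rough illustration of the nice-to-have item above, here is a minimal Python sketch of a conformance check that calls API methods with predefined arguments. The method names (<code>create</code>, <code>read</code>, <code>update</code>, <code>delete</code>), the <code>InMemoryStore</code> class, and the <code>check_conformance</code> function are all hypothetical names made up for this sketch; the actual API specification is still to be defined by the project, so nothing here reflects an agreed design.

```python
class InMemoryStore:
    """Toy implementation of a hypothetical sequence-record API."""

    def __init__(self):
        self._records = {}

    def create(self, accession, sequence):
        if accession in self._records:
            raise KeyError(f"{accession} already exists")
        self._records[accession] = sequence

    def read(self, accession):
        # Raises KeyError when the accession is unknown.
        return self._records[accession]

    def update(self, accession, sequence):
        if accession not in self._records:
            raise KeyError(accession)
        self._records[accession] = sequence

    def delete(self, accession):
        del self._records[accession]


def check_conformance(store):
    """Run fixed CRUD calls against any implementation.

    Returns the names of the checks that failed (empty list = all passed).
    """
    failures = []

    store.create("X00001", "ACGT")
    if store.read("X00001") != "ACGT":
        failures.append("read-after-create")

    store.update("X00001", "ACGTTT")
    if store.read("X00001") != "ACGTTT":
        failures.append("read-after-update")

    store.delete("X00001")
    try:
        store.read("X00001")
        failures.append("read-after-delete")
    except KeyError:
        pass  # expected: a deleted record must not be readable

    return failures
```

Any alternative implementation (for example a thin client that wraps a Perl/BioPerl service) could be passed to <code>check_conformance</code> in place of <code>InMemoryStore</code>, provided it exposes the same four methods and raises <code>KeyError</code> for missing records.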
 
  
; Involved toolkits or projects :
+
== Guide for prospective GSoC students ==
Java EE5 and BioSQL; parts of BioJava would be useful to steal for parsing.
 
  
; Degree of difficulty and needed skills :
+
=== Before you apply ===
Medium to Hard. While the use of Java EE5 is now quite easy (especially with IDEs like Netbeans), there are quite a lot of concepts involved in the project (webservices, ORM, EJBs, etc.). The hard part would be getting up to speed with those concepts. If you already know a lot of this then the project would only be medium difficulty. At minimum the student should be confident with Java and at least aware of some of the technologies. This is not the right project for a very new programmer.
 
  
; Mentors : [http://www.linkedin.com/in/markjschreiber Mark Schreiber] (and anyone else who wants to help)
+
* Proposals should extend one of the affiliated toolkits, not start a new project.
 +
* If you want to apply with your own idea, it's best to [[#Contact|contact]] the OBF subproject you're interested in well before the application deadline, so we can work with you to find a mentor and solidify your project idea and application.
 +
* [[#Contact|Ask us questions]] on the subproject mailing lists about the project idea you have in mind.
 +
* Write a project proposal draft, include a project plan (see below), and [[#Contact|send it to a project mailing list]] for comments before submitting it.
  
===  Mapping the NCBI toolkit to BioPerl, BioRuby, BioConductor and BioJAVA using BioLib ===
+
Again, '''students are strongly encouraged to [[#Contact|contact us]] as early as possible'''. Frequent and early communication is extremely valuable for putting together successful projects.
  
; Rationale :
+
=== When you apply ===
  
The National Center for Biotechnology Information (NCBI) has created a
+
When applying, please provide the following in your application material (in addition to the information requested by Google).
large collection of utilities developed for the production and
+
# '''Your complete contact information''', including full name, physical address, preferred email address, and telephone number, plus other pertinent contact information such as IRC handles, etc.
distribution of GenBank, Entrez, BLAST, and related services. To
+
# Why you are interested in the project you are proposing and are well-suited to undertake it.
support these utilities a large set of C and C++ libraries are
+
# A summary of your programming experience and skills.
maintained and regularly improved by NCBI. These include, for example,
+
# Programs or projects you have previously authored or contributed to, in particular those available as open-source, including, if applicable, any past Summer of Code involvement.
sequence alignment algorithms, antigenic determinant prediction,
+
# A project plan for the project you are proposing, even if your proposed project is directly based on one of the proposed project ideas for member projects.
CPG-island finder, ORF finder and string matchers. This functionality
+
#* A project plan in principle divides up the whole project into a series of manageable milestones and time-lines that, when all accomplished, logically lead to the end goal(s) of the project. Put in another way, a project plan explains what you expect you will need to be doing, and what you expect you need to have accomplished, at which time, so that at the end you reach the goals of the project.
is ultimately of great interest to all scientists working in molecular
+
#* Do not take this part lightly. A compelling plan takes a significant amount of work. Empirically, applications with no or a hastily composed project plan have not been competitive, and a more thorough project plan can easily make an applicant outcompete another with more advanced skills.
biology with application in biology and biomedical research.
+
#* A good plan will require you to thoroughly think about the project itself and how one might want to go about the work.
 +
#* We don't expect you to have all the experience, background, and knowledge to come up with the final, real work plan on your own at the time you apply. We do expect your plan to demonstrate, however, that you have made the effort and thoroughly dissected the goals into tasks and successive accomplishments that make sense.
 +
#* We strongly recommend that you bounce your proposed project and your project plan draft off of us, using either the pertinent developers mailing list or the IRC channel(s). Through the project plan exercise you will inevitably discover that you are missing a lot of the pieces - we are there to help you fill those in as best as we can.
 +
# Any obligations, vacations, or plans for the summer that may require scheduling during the GSoC work period.
 +
#* We expect your GSoC project to be your primary focus over the summer.  It should not be regarded as a part-time occupation.
 +
#* If you feel that you can manage other work obligations concurrently with your Summer of Code project, make your case and support it with evidence.
 +
#* Be honest and open.  If it turns out later that you weren't clear about other obligations, at best (i.e., if your accomplishment record at that point is spotless) our trust in you will be severely degraded. Also, if you are accepted, discuss with your GSoC mentor before taking on additional obligations.
 +
#* One of the most common reasons for students to struggle or fail is being overcommitted.  Do not set yourself up for failure!  GSoC summers should be fun and rewarding!
  
Unfortunately, few bioinformaticians work with C/C++. To address this,
+
== Student Progress Reports ==
NCBI has made a binding available for Python. This is not enough as
 
bioinformaticians work in many different programming languages, and to
 
be fully effective support should be made available at least for Perl,
 
R and Java. These three together probably represent over 90% of
 
bioinformaticians. The [http://biolib.open-bio.org/ BioLib project]
 
successfully provides the 'mapping' infrastructure to map complex
 
libraries against many computer languages using SWIG. Basically one
 
mapping suffices to support all popular languages.
 
 
; Approach :
 
  
Special interfaces need to be developed to map the NCBI toolkit
+
In addition to writing code, accepted students send weekly updates to the OBF community on their project's progress. These updates allow us to keep aware of how GSoC students are doing, give students a forum to ask any questions, and promote overall community bonding.
libraries against Perl initially. The (outdated) NCBI
 
[http://pypi.python.org/pypi/ncbi/0308 Python mapping] can be used as an initial
 
guide for mapping functionality. Once mapped against Perl, mapping
 
against Ruby and Python is trivial. However, at this point BioLib
 
support for R and JAVA needs to be developed. A proof-of-concept can
 
be part of this project. Finally, SWIG mappings can be used to
 
create automated documentation and testing of BioLib code.
 
  
; Challenges :
+
At the beginning of the summer, we ask that you set up a blog for the GSoC project (or a category/tag on your existing blog) which you will use to summarize your progress every week, as well as longer posts about your work if you'd like. (See [http://zruanweb.com/tag/gsoc.html these] [http://www.yeyanbo.com/tag/gsoc.html examples] from 2013.)
  
The main challenge is to provide nice and consistent interfaces in
+
Then, at the start of each week:
high-level languages against the NCBI C/C++ toolkit library. This
 
requires OOP design and unit testing of existing functionality.
 
Also some SWIG hacking may be involved to provide decent mappings for R and
 
JAVA, as well as SWIG auto-generated documentation and testing.
 
  
; Involved toolkits or projects :
+
# Post an update on your blog: What did you do last week? What do you plan to do this week? Do you have any unanswered questions, any unsolved problems from the last week, interesting observations or anything else you'd like to mention?
 +
# Email the URL and text of the post (or a short summary) to the host project's mailing list (your mentors will confirm which one to use) ''and'' the main OBF GSoC mailing list (gsoc@lists.open-bio.org).
  
[http://biolib.open-bio.org/ BioLib], BioPerl, SWIG (and optionally BioRuby, R/Bioconductor, BioJAVA
+
You will be writing under your own name, but with a clear association with your mentors, the OBF and its projects, so please take this seriously and be professional. Remember that your blog will be one of the first things found by anyone interested in the project you're working on, and can be a valuable resource to them &mdash; as well as a significant part of your online presence.
or BioPython)
 
  
; Degree of difficulty and needed skills :
+
== Contact ==
  
This is a challenging project as it crosses
+
Before applying, please read our [[#What_should_prospective_students_know.3F|documentation on information that students should know and guidelines we expect you to follow]]. We also require that you include certain information, listed below, under "[[#When you apply|When you apply]]."
computer languages. It requires experience in C++ and a wish for
 
deeper understanding of at least one high-level OOP language
 
like Perl (did I write OOP?), Python, JAVA, R or Ruby.
 
  
; Mentors :
+
=== Staff and org Admins ===
 +
;Organization administrator: Eric Talevich ([mailto:eric&#46;talevich&#64;gmail&#46;com eric&#46;talevich&#64;gmail&#46;com])
 +
;Backup administrator: Raoul Bonnal ([mailto:ilpuccio&#46;febo&#64;gmail&#46;com email]) (IRC: helius | channels: #obf-soc, #bioruby, #gsoc ) (Skype: ilpuccio)
 +
<!--
 +
Other organisations relevant for bioinformatics students are: [http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013 Nescent], [https://github.com/SciRuby/sciruby/wiki/Google-Summer-of-Code-2013-Ideas SciRuby], [http://gmod.org/wiki/GSoC GMOD], who took on some of our projects and mentors.
  
'''Pjotr Prins''', Chris Fields
+
;2013 Organization administrator: [[User:PjotrPrins|Pjotr Prins]] ([mailto:pjotr.gsoc2013@thebird.nl pjotr.gsoc2013@thebird.nl])
 
+
;Backup administrators: Chris Fields, [[User:Lapp|Hilmar Lapp]], Robert Buels
=== BioSQL web interface and API on Google App Engine ===
+
-->
 
 
; Rationale :
 
The [http://www.biosql.org/wiki/Main_Page BioSQL] project provides a
 
robust and well supported database schema for storing sequence data and
 
associated annotations and features. It does not have a standard web interface
 
or web facing API, both of which would provide improved access to scientific
 
data. Deployment of BioSQL currently requires knowledge
 
and administration of relational databases, which can hinder its use in
 
smaller research laboratories that do not have public servers or experienced
 
systems administrators.
 
 
 
This proposal seeks to bridge this gap by providing a rapidly deployable
 
[http://en.wikipedia.org/wiki/Cloud_computing cloud based] solution utilizing
 
the established BioSQL backend. This system will allow scientists to share
 
results in a standard format both early on during research and at the time of
 
publication. By deploying on stable architectures, long term data access is
 
ensured and not dependent on maintenance of local servers. Data archival for
 
replication and expansion of ideas is an important part of the scientific process; this
 
[http://www.portfolio.com/views/blogs/market-movers/2009/02/18/when-academic-papers-arent-replicable?tid=true recent blog review]
 
summarizes some of the problems associated with primary data access.
 
 
 
; Approach :
 
[http://code.google.com/appengine/ Google App Engine] provides a full
 
development stack for rapidly building and deploying web applications. The
 
platform provides free quotas which allow a small lab with a limited budget to
 
make their data available, and also scales for larger projects with popular
 
data sets.
 
 
 
The student project expands an initial demonstration server
 
([http://biosqlweb.appspot.com/ demo server];
 
[http://github.com/chapmanb/biosqlweb/tree/master source code];
 
[http://bcbio.wordpress.com/2009/03/15/biosql-on-google-app-engine/ blog post])
 
to a full-featured web application. The server-side
 
implementation will be programmed in Python, utilizing the Google App Engine
 
[http://code.google.com/appengine/docs/ developers toolkit]
 
supplemented with the [http://biopython.org/wiki/Main_Page Biopython]
 
libraries. The client web interface will be designed using HTML, CSS and
 
JavaScript; the interface will utilize a full-featured JavaScript
 
library, such as  [http://jquery.com/ jQuery] and [http://jqueryui.com/ jQueryUI]
 
or [http://extjs.com/ ExtJS]. Client to server communication occurs
 
using [http://en.wikipedia.org/wiki/Ajax_(programming) AJAX] techniques
 
with [http://en.wikipedia.org/wiki/JSON JSON] for data exchange.
 
 
 
In addition to the web interface, the server will also provide a programming
 
interface using a [http://en.wikipedia.org/wiki/Representational_State_Transfer REST]
 
API. This involves coordination with other proposed projects,
 
including the proposed JEE5 Java webservice, to design a common interface.
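
To make the idea of a REST interface with JSON data exchange more concrete, the following is a minimal Python sketch using only the standard library. The URL layout (<code>/bioentry/&lt;accession&gt;</code>), the <code>handle_get</code> function, and the in-memory <code>RECORDS</code> dictionary are assumptions made purely for illustration; they are not part of the existing demo server or of any agreed common interface, and a real implementation would sit behind the Google App Engine request handlers and a BioSQL-backed store.

```python
import json

# Hypothetical in-memory stand-in for a BioSQL-backed data store.
RECORDS = {
    "X00001": {
        "accession": "X00001",
        "description": "example record",
        "sequence": "ACGT",
    },
}


def handle_get(path):
    """Map a GET path such as /bioentry/X00001 to (HTTP status, JSON body)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 2 and parts[0] == "bioentry":
        record = RECORDS.get(parts[1])
        if record is not None:
            return 200, json.dumps(record, sort_keys=True)
        return 404, json.dumps({"error": "no such accession"})
    return 400, json.dumps({"error": "unsupported path"})
```

Because the response body is plain JSON, the same resource could in principle be served by the Python server described here or by a Java webservice, as long as both agree on the URL layout and the record fields.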
 
 
 
; Challenges :
 
* Familiarizing the student with Python, JavaScript and AJAX, as well as the Google App Engine environment.
 
* Initial implementation of BioSQL server interface with useful features.
 
* Coordinating input from users on the [http://lists.open-bio.org/mailman/listinfo/biosql-l BioSQL mailing list]. The student will need to solicit desired features from users and prioritize based on implementation time and importance. See [http://lists.open-bio.org/pipermail/biosql-l/2009-January/001464.html this mailing list discussion] for an example of interest and initial ideas.
 
* Designing the web interface for intuitive use.
 
* Coordinating API development with other projects.
 
 
 
; Involved toolkits or projects :
 
* [http://www.biosql.org/wiki/Main_Page BioSQL]
 
* [http://biopython.org/wiki/Main_Page Biopython]
 
* [http://code.google.com/appengine/ Google App Engine]
 
* [http://www.python.org Python]
 
* [http://jquery.com/ jQuery]; [http://jqueryui.com/ jQueryUI]
 
* [http://extjs.com/ ExtJS]
 
 
 
; Degree of difficulty and needed skills :
 
Medium to Hard. This requires a familiarity with current web frameworks and
 
utilizes a number of existing libraries to allow the student to jump right
 
into the development process. This requires the interested student to be comfortable
 
with quickly learning outside libraries. Beyond programming, the project
 
will also involve creative thinking about interface and usability design.
 
 
 
; Mentors :
 
Brad Chapman (plus...)
 
 
 
=== Biogeographical and community phylogenetics for BioPython ===
 
 
 
(Note: this project is proposed by potential GSoC student [[User:Nmatzke|Nick Matzke]].)
 
 
 
; Rationale : The field of phylogenetics has proliferated, and one new development is that large, phylogenetically explicit datasets are beginning to be used to answer questions about the relationships of ecological communities and biogeographic regions, instead of just individual clades.  The [http://www.phylodiversity.net/phylocom/ phylocom] package (Webb et al., 2008) contains fast C implementations of basic analyses such as alpha- and beta-phylodiversity (Net Relatedness Index and Nearest Taxon Index).  The R package [http://picante.r-forge.r-project.org/ picante], funded by NESCent and Google Summer of Code 2008, contains utilities for processing phylocom inputs/outputs as well as additional tools for applied phylogenetics such as phylogenetic signal, phylosor (phylogenetic Sørensen's index), and lineages-through-time plots.  These tools, developed for evolutionary community ecology, are usable in any context where a collection of lineages is undergoing cladogenesis, dispersal, and extinction in a series of containers (communities, biogeographic regions, gene families undergoing gene conversion, laterally transferring elements in unicellular genomes, etc.).
<br/>The related field of phylogenetic or historical biogeography -- the estimation of the geographic location of ancestral lineages, the history of their dispersal, and the history of connectivity and vicariance between regions -- has also advanced with a variety of algorithms (Ronquist's Dispersal-Vicariance Analysis, [http://www.ebc.uu.se/systzoo/research/diva/diva.html DIVA]; [http://code.google.com/p/lagrange/ lagrange], a maximum likelihood method implemented in Python, [http://code.google.com/p/lagrange/ available online at Google Code]; [https://www.nescent.org/wg_EvoViz/GeoPhyloBuilder GeoPhyloBuilder], a NESCent-sponsored package for producing GIS files to display biogeographic history in Google Earth; [http://panbiog.infobio.net/croizat/ croizat], a panbiogeographical method and visualization package implemented in python using matplotlib's Basemap module; and older methods derived from traditional ancestral-state reconstruction).
 
 
 
; Approach : Write BioPython modules/functions to:
 
: ("*" indicates some version of this already done independently by [[User:Nmatzke|Nick Matzke]])
 
:* Improve BioPython's [[Bio.Nexus.Trees]] [[newick]] parser, which currently cannot successfully read the newick files output by Phylocom (although these files are read successfully by a variety of other programs and modules, e.g. [http://www-ab.informatik.uni-tuebingen.de/software/dendroscope/welcome.html Dendroscope], [http://pbil.univ-lyon1.fr/software/alfacinha/ alfacinha] python module).*
 
:* Implement Cardona et al.'s [http://www.biomedcentral.com/1471-2105/9/532 Extended Newick format] for reticulating trees etc. (only exists in BioPerl currently)
 
:* Develop a series of functions for processing phylocom inputs and outputs*
 
:* Provide functions for basic community/geographic relatedness (e.g., NRI, NTI, phylosor)*
 
:* Calculating these statistics for large phylogenies requires calculating/processing a large distance matrix with a C or Java library*
 
:* Basic graphics for analyzing community/regional phylogenetic history, e.g. lineage-through-time plots*
 
:* Downloading sample location data from online databases (e.g. [http://www.gbif.org/ GBIF], although see [http://iphylo.blogspot.com/search?q=latitude here]), combine with phylogenies for input into lagrange, DIVA or other algorithms
 
:* Re-creating DIVA in Python; the only available version is 12 years old and currently will only run on certain PCs
 
:* Process output from DIVA, lagrange, etc., for display in GISs, Google Earth (KML files), and/or matplotlib's Basemap
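
The community relatedness statistics mentioned in the list above are built on two simple quantities: the mean pairwise distance (MPD) and the mean nearest taxon distance (MNTD) among the taxa present in a community. The sketch below computes both from a precomputed distance matrix in plain Python. The function names and the nested-dictionary matrix layout are assumptions made for illustration, not existing Biopython or phylocom APIs. NRI and NTI are obtained by standardizing MPD and MNTD against a null distribution (for example, from shuffled tip labels); that step is omitted here.

```python
from itertools import combinations


def make_matrix(pair_distances):
    """Build a symmetric nested-dict distance matrix from {(a, b): distance}."""
    dist = {}
    for (a, b), d in pair_distances.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def mean_pairwise_distance(dist, taxa):
    """MPD: average distance over all unordered pairs of community taxa."""
    pairs = list(combinations(taxa, 2))
    if not pairs:
        raise ValueError("need at least two taxa")
    return sum(dist[a][b] for a, b in pairs) / float(len(pairs))


def mean_nearest_taxon_distance(dist, taxa):
    """MNTD: average, over taxa, of the distance to the closest other taxon."""
    taxa = list(taxa)
    if len(taxa) < 2:
        raise ValueError("need at least two taxa")
    nearest = [min(dist[a][b] for b in taxa if b != a) for a in taxa]
    return sum(nearest) / float(len(nearest))


if __name__ == "__main__":
    # Toy distances for four tips; the values are made up for illustration.
    dist = make_matrix({
        ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
        ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
    })
    community = ["A", "B", "C"]
    print(mean_pairwise_distance(dist, community))
    print(mean_nearest_taxon_distance(dist, community))
```

Both functions scan the matrix directly, so for large phylogenies the cost is dominated by building and storing the matrix itself, which is the motivation for the compiled-library item in the list.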
 
 
 
; Challenges :
 
:* Contacting & involving/getting feedback from authors of the mentioned packages (have been in contact with many of them already)
 
:* Uncertainty, error, & missing data in geographic location databases (see [http://iphylo.blogspot.com/search?q=latitude here]), and flagging such
 
:* Deciding the appropriate number of BioPython modules, etc. will require mentor advice
 
 
 
; Involved toolkits or projects :
 
:* [http://biopython.org/wiki/Main_Page Biopython]
 
:* [http://biosql.org/ BioSQL]
 
:* [http://www.python.org Python]
 
:* others mentioned above
 
 
 
; Degree of difficulty and needed skills : Medium. Requires familiarity not just with Python/Biopython but also with some unusual data formats, datasets, and packages (geographic, phylogenetic, metadata, etc.), and with integrating them.  Must be familiar with evolution, phylogenetics, biogeography, and the statistical hazards of oversimple interpretations of these.
 
 
 
; Mentors :  [http://bcbio.wordpress.com/ Brad Chapman] (plus?  Various python/phylogenetics gurus at NESCent etc might be consulted)
 
 
 
=== phyloXML support in BioRuby ===
 
 
 
; Rationale : Evolutionary trees are central to comparative genomics studies. Trees used in this context are usually annotated with a variety of data elements, such as taxonomic information, genome-related data (gene names, functional annotations) and gene duplication events, as well as information related to the evolutionary tree itself (branch lengths, support values). phyloXML is an XML data exchange standard that can represent this data. Trees in phyloXML format can be displayed and analyzed with [http://www.phylosoft.org/archaeopteryx/ Archaeopteryx] (the successor to [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/17/4/383 ATV]), which also allows manipulation and navigation of the tree. While tools exist to convert other formats (such as the widely used Newick and Nexus formats) to phyloXML, there is currently support for phyloXML in only one of the open source Bio* projects (in [http://www.bioperl.org/wiki/Phyloxml_Project_Demo BioPerl], as a result of Google's Summer of Code 2008).
 
; Approach : Build phyloXML support in Ruby. More specifically, extend the open source BioRuby project to support phyloXML (BioRuby 1.3.0 has just been released). This will entail (i) the development of objects to represent all the elements of phyloXML (sequences, taxonomic data, annotations, etc), (ii) the development of a parser to read in phyloXML, and (iii) a phyloXML writer.
 
; Challenges : Relating the data elements specific to phyloXML to the tree classes already in BioRuby while maintaining the standards of the BioRuby project. Development of a time and memory efficient phyloXML parser (the parser has to be able to process trees with thousands of external nodes, at least).
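
To illustrate the memory-efficiency concern raised above, here is a minimal streaming sketch. It is written in Python rather than Ruby purely for illustration, and it is not the BioRuby design. It reads clades one at a time with the standard library's <code>iterparse</code> and discards each clade's contents once processed. The function name <code>iter_named_clades</code> and the sample document are assumptions, and the sketch only handles branch lengths given as child elements.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by phyloXML documents.
NS = "{http://www.phyloxml.org}"

# Small made-up document used only to exercise the sketch.
SAMPLE = b"""<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <clade><name>A</name><branch_length>0.1</branch_length></clade>
      <clade><name>B</name><branch_length>0.2</branch_length></clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def iter_named_clades(source):
    """Yield (name, branch_length) for each named clade in a phyloXML stream.

    Each clade element is cleared after it has been processed, so finished
    subtrees do not keep their full contents in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "clade":
            name = elem.findtext(NS + "name")
            length = elem.findtext(NS + "branch_length")
            if name is not None:
                yield name, (float(length) if length is not None else None)
            elem.clear()  # drop children and text of the finished clade


if __name__ == "__main__":
    for name, length in iter_named_clades(io.BytesIO(SAMPLE)):
        print(name, length)
```

A full parser would also have to build tree objects and keep annotations, so the trade-off between streaming and holding the whole tree in memory would need to be evaluated against BioRuby's existing tree classes.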
 
; Involved toolkits or projects : [http://www.bioruby.org/ BioRuby],  [http://www.phyloxml.org phyloXML]
 
; Degree of difficulty and needed skills : Medium. Requires experience in an object oriented programming language (such as C++, Java, or, ideally, Ruby). Experience in genomics or a related biological field is also critical. Knowledge of  BioRuby will obviously help, as well as familiarity with XML.
 
; Mentors : Christian Zmasek (and anyone else who wants to help)
 
  
=== BioPerl integration of the NeXML exchange standard + <code>Bio::Phylo</code> toolkit  ===
+
=== Google Plus ===
  
; Rationale : [http://www.nexml.org NeXML] is an emerging XML standard for the serialization and exchange of phylogenetic information. In Perl, the [http://phylo.sourceforge.net <code>Bio::Phylo</code>] toolkit is the preferred parser/writer interface for NeXML. While <code>Bio::Phylo</code> contains methods that will operate on BioPerl objects [such as alignments (<code>Bio::SimpleAlign</code>) or trees (<code>Bio::Tree</code>)], a set of methods to wrap <code>Bio::Phylo</code> functionality into BioPerl in a systematic and updateable way would lower barriers to broader use of this useful standard.
+
[https://plus.google.com/communities/103096212020630764091 OBF Summer of Code] on G+
  
; Approach : We would like to explore a couple of ways to form the linkage between BioPerl and <code>Bio::Phylo</code>, while still maintaining <code>Bio::Phylo</code>'s independence as a module. Since it is part of the implementation side of a rapidly evolving standard, it is more mutable than the average BioPerl module, and should be more nimble. One method would be to implement a thin BioPerl wrapper around <code>Bio::Phylo</code> that allows BioPerl objects to be passed easily in and out, and maintains a stable BioPerl-compliant API, hiding <code>Bio::Phylo</code> API changes. However, since this project is exploratory, we could also prototype a version of <code>Bio::Phylo</code> that is directly implemented as a BioPerl module. We would also develop appropriate usage tests, test data sets, target audience use cases, benchmarks and profiles to compare the approaches we come up with.
+
=== Email ===
  
; Challenges :
+
For prospective students, the first point of contact should be the mailing list of the OBF project you are interested in working with:
:* Designing a relatively stable wrapper around a relatively dynamic module;
 
:* Designing tests that cover important use case scenarios meaningful to BioPerl users;
 
:* Identifying and interfacing <code>Bio::Phylo</code> output and NeXML-serialized data with up- and downstream BioPerl operations; e.g., adding a <code>Bio::SeqIO::nexml</code> module for doing BioPerl-native NeXML IO.
 
  
; Involved toolkits or projects :  
+
;BioPerl: [mailto:bioperl-l@lists.open-bio.org bioperl-l@lists.open-bio.org]
:*[[bp:Main Page|BioPerl]], [http://phylo.sourceforge.net <code>Bio::Phylo</code>]
+
;BioPython: [mailto:biopython@lists.open-bio.org biopython@lists.open-bio.org]
 +
;BioJava: [mailto:biojava-l@lists.open-bio.org biojava-l@lists.open-bio.org]
 +
;BioRuby: [mailto:bioruby@lists.open-bio.org bioruby@lists.open-bio.org]
 +
;BioSQL: [mailto:biosql-l@lists.open-bio.org biosql-l@lists.open-bio.org]
 +
;BioLib: [mailto:biolib-dev@lists.open-bio.org biolib-dev@lists.open-bio.org]
  
; Degree of difficulty and needed skills : Easy to medium difficulty. Perl fluency required; experience with object-oriented Perl very helpful; experience with biological data (sequences, sequence alignments, phylogenetic trees) a plus; experience with BioPerl itself will flatten the learning curve. 
+
Also, it would be a good idea to CC the organization administrator ([[User:EricTalevich|Eric Talevich]], [mailto:eric&#46;talevich&#64;gmail&#46;com eric&#46;talevich&#64;gmail&#46;com]), so he can make sure that you are properly taken care of!
  
; Mentors : [[bp:User:Majensen|Mark Jensen]], ...(rvos?),...
+
If you are not quite sure which project you would like to contribute to, you can email the organization administrator for help. However, do not worry overly much about picking the right OBF project at the outset. If you are unsure, simply make your best guess, and other members of the email list will help you to find the best organization to suit your idea.
  
==Mentors==
+
=== IRC - Internet Relay Chat ===
  
* [http://bcbio.wordpress.com/ Brad Chapman] (MGH; Biopython)
+
OBF IRC channels are maintained on [http://freenode.net freenode]; connect your IRC client to <code>chat.freenode.net</code>.
* [[bp:User:Cjfields|Chris Fields]] (U. Illinois at Urbana-Champaign; BioPerl)
 
* [[bp:User:Majensen|Mark Jensen]] (Fortinbras; BioPerl)
 
* [[bp:User:Rogerhall|Roger Hall]] (U. of Arkansas; BioPerl)
 
* [[User:Mauricio|Mauricio Herrera Cuadra]] (Yahoo! Inc.; backup org admin)
 
* [[User:Lapp|Hilmar Lapp]] (NESCent; org admin)
 
* [http://thebird.nl/pjwiki/wiki.pl Pjotr Prins] (BioLib)
 
* [http://biojava.org/wiki/User:Mark Mark Schreiber] (Novartis Institute for Tropical Diseases, Singapore; BioJava)
 
* [mailto:jaudall@gmail.com Joshua Udall] (BioPerl)
 
* [mailto:jw12@sanger.ac.uk Jonathan Warren] (Sanger Institute, UK; Biojava, DAS)
 
* [mailto:willishf@ufl.edu Scooter Willis] (Scripps Florida; Biojava)
 
* [http://monochrome-effect.net/ Christian Zmasek] (Burnham Institute for Medical Research; BioRuby)
 
  
== What should prospective students know? ==
+
;Main OBF GSoC Channel: <code>#obf-soc</code>
 +
;BioPerl: <code>#bioperl</code>
 +
;BioRuby: <code>#bioruby</code>
  
=== Before you apply ===
+
Some mentors and developers can regularly be found on IRC, see the list of OBF projects below for information on which projects have a channel and the name of the channel. And/or join <code>#obf-soc</code> on [http://freenode.net Freenode.] ''(If you do not have an IRC client installed, you might find the [http://en.wikipedia.org/wiki/List_of_IRC_clients comparison on Wikipedia], the [http://directory.google.com/Top/Computers/Software/Internet/Clients/Chat/IRC/ Google directory], or the [http://www.ircreviews.org/clients/ IRC Reviews] helpful. For Macs, [http://en.wikipedia.org/wiki/X-Chat X-Chat Aqua] works pretty well. If you have never used IRC, try the [http://irchelp.org/irchelp/ircprimer.html IRC Primer] at [http://irchelp.org/ IRC Help], which also has links to lots of other material.)''
  
* If you want to apply with your own idea, determine which O|B|F project you would be contributing to, and [[#Contact|contact us]] early on so we can try to find a mentor.
 
* We will only entertain proposals that extend one of our affiliated toolkits. Project proposals that would create a new stand-alone piece of code are outside of our scope.
 
* We are most interested in students who give us evidence that they already have, or might develop, a sustained interest in becoming future contributors to one (or more) of our projects.
 
* [[#Contact|Ask us questions]] about the project idea you have in mind.
 
* Write a project proposal draft, include a project plan (see below), and [[#Contact|bounce those off of us]].
 
  
  
Have I mentioned yet that you should [[#Contact|be in touch with us]] ''before'' you apply? The value of frequent and early communication in contributing to a distributed and collaboratively developed project can hardly be overemphasized. The same is true for becoming part of a community, even if only temporarily.
+
== Mentor Resources ==
  
=== When you apply ===
+
* [http://en.flossmanuals.net/GSoCMentoring/ GSoC Mentoring Guide]
 +
* [[Google_Summer_of_Code_Application_Evaluation|OBF Application Evaluation Guidelines]]
  
When applying, please provide the following in your application material (in addition to the information requested by Google).
+
== Scientific Achievements ==
# Why you are interested in the project you are proposing, why you are uniquely suited to undertake it, and what you anticipate gaining from it.
+
This section reports the scientific achievements of our community: scientific papers and grant-funded projects that used the tools developed during the Google Summer of Code over the years.
# Why are you interested in contributing to the O|B|F project that your work would be (or become) a part of? To what extent and in which ways do you anticipate to stay involved with the project?
+
* [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btv098 Sambamba: fast processing of NGS alignment formats.] Bioinformatics (2015) doi: 10.1093/bioinformatics/btv098
# A summary of your programming experience and skills.
+
* [http://csw.github.io/bioruby-maf/ bio-maf]: The long intergenic noncoding RNA landscape of human lymphocytes highlights the regulation of T cell differentiation by linc-MAF-4. Ranzani V et al. Nat Immunol. 2015 Mar;16(3):318-25. doi: 10.1038/ni.3093. Epub 2015 Jan 26.
# Programs or projects you have previously authored or contributed to, in particular those available as open-source, including, if applicable, any past Summer of Code involvement.
+
* [http://www.biomedcentral.com/1471-2105/13/209 Bio.Phylo: A unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython.] BMC Bioinformatics 2012, 13:209  doi:10.1186/1471-2105-13-209
# A project plan for the project you are proposing, even if your proposed project is directly based on one of the ideas above.
+
* [http://www.open-bio.org/wiki/Google_Summer_of_Code_2014#Loris_Cro vcf-mongo ]: Gene2Farm and WHEALBI European Research projects
#* A project plan in principle divides up the whole project into a series of manageable milestones and timelines that, when all accomplished, logically lead to the end goal(s) of the project. Put in another way, a project plan explains what you expect you will need to be doing, and what you expect you need to have accomplished, at which time, so that at the end you reach the goals of the project.
+
* [http://news.open-bio.org/news/2015/02/obf-gsoc-2014-wrapup/ OBF-GSoC-2014-WrapUp] is rich in scientific activities and results.
#* Do not take this part lightly. A compelling plan takes a significant amount of work. Empirically, applications with no or a hastily composed project plan have not been competitive, and a more thorough project plan can easily make an applicant outcompete another with more advanced skills.
+
* [http://www.rcsb.org RCSB PDB] is the North American access point to the Worldwide Protein Data Bank and uses BioJava extensively.
#* A good plan will require you to thoroughly think about the project itself and how one might want to go about the work.
+
* Publications using BioJava:
#* We don't expect you to have all the experience, background, and knowledge to come up with the final, real work plan on your own at the time you apply. We do expect your plan to demonstrate, however, that you have made the effort and thoroughly dissected the goals into tasks and successive accomplishments that make sense.
+
** Prlić, Andreas, et al. "BioJava: an open-source framework for bioinformatics in 2012." Bioinformatics 28.20 (2012): 2693-2695.
#* We strongly recommend that you bounce your proposed project and your project plan draft off of us, using either the pertinent developers mailing list or the IRC channel(s). Through the project plan exercise you will inevitably discover that you are missing a lot of the pieces - we are there to help you fill those in as best as we can.
+
** Holland, Richard CG, et al. "BioJava: an open-source framework for bioinformatics." Bioinformatics 24.18 (2008): 2096-2097.
# Your possibly conflicting obligations or plans for the summer during the coding period.
+
** Pocock, Matthew, Thomas Down, and Tim Hubbard. "BioJava: open source components for bioinformatics." ACM Sigbio Newsletter 20.2 (2000): 10-12.
#* Although there are no hard and fast rules about how much you can do in parallel to your Summer of Code project, we do expect the project to be your primary focus of attention over the summer. If you look at your Summer of Code project as a part-time occupation, please don't apply to us.
+
*** 181 citations on Google Scholar
#* That notwithstanding, if you have the time-management skills to manage other work obligations concurrent with your Summer of Code project, feel encouraged to make your case and support it with evidence.
+
** Myers-Turnbull, Douglas, et al. "Systematic Detection of Internal Symmetry in Proteins Using CE-Symm." Journal of Molecular Biology, (2014) 426:11 pp. 2255–2268.
#* Most important of all, be upfront. If it turns out later that you weren't clear about other obligations, at best (i.e., if your accomplishment record at that point is spotless) it destroys our trust. Also, if you are accepted, don't take on additional obligations before discussing those with your mentor.
+
** Prlić, Andreas, et al. (2010) “Precalculated Protein Structure Alignments at the RCSB PDB website.” Bioinformatics 26(23), 2983-2985
#* One of the most common reasons for students to struggle or fail is being overstretched. Don't set yourself up for that - at best it would greatly diminish the amount of fun you'll have with your Summer of Code project.
+
** Bliven, Spencer, et al. (2015) "Detection of circular permutations within protein structures using CE-CP." Bioinformatics. In press.
 
+
** Aerts, Stein, et al. "Toucan: deciphering the cis‐regulatory logic of coregulated genes." Nucleic acids research 31.6 (2003): 1753-1764.
=== Other information ===
+
** Vaida, Mircea-Florin, Radu Terec, and Lenuta Alboaie. "Alternative DNA Security Using BioJava." Digital Information and Communication Technology and Its Applications. Springer Berlin Heidelberg, 2011. 455-469.
 
+
** Ross, Christian, and Qingxi J. Shen. "Computational prediction and experimental verification of HVA1-like abscisic acid responsive promoters in rice (Oryza sativa)." Plant molecular biology 62.1-2 (2006): 233-246.
* Our [http://docs.google.com/Doc?id=dhs98hzv_7zn8bxqjm 2009 application document] with Google's questions and our answers.
+
** Finak, G., et al. "BIAS: bioinformatics integrated application software." Bioinformatics 21.8 (2005): 1745-1746.
* For questions of eligibility, see the [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_eligibility_83343977761348_13148542340972003 GSoC eligibility requirements for students]. These requirements must be met on April 20, 2009.
+
** Aerts, Stein, et al. "A genetic algorithm for the detection of new cis-regulatory modules in sets of coregulated genes." Bioinformatics 20.12 (2004): 1974-1976.
* There is also a [http://groups.google.com/group/google-summer-of-code-discuss Google group for posting GSoC questions] (and receiving answers; note that you will need to sign up for the group) that relate to the program itself (and are not specific to our organization).
+
** Hanganu, Andrei, et al. "SLIDE: An interactive threading refinement tool for homology modeling." Rom J Biochem 1009.46: 123-127.
* Students receive a stipend from Google if accepted. See the [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_administrivia_842873138659_49145328697313184 Google SoC FAQ on payments] for full documentation.
+
** Kaladhar, DSVGK. "BioJava: A Programming Guide." (2012). LAP Lambert Academic Publishing , Germany. ISBN:3659167509 9783659167508
 
+
** Prins, J. C. P. "BioLib: Sharing high performance code between BioPerl, BioPython, BioRuby, R/Bioconductor and BioJAVA." 17th Annual International Conference on Intelligent Systems for Molecular Biology, Stockholm, Sweden, June 27-July 2, 2009. 2009.
== Open-Bio projects involved ==
+
** Tang, Si-Xin, Yi-Bing Li, and Hong-Bo He. "Designing a BioJava-based Software for RNA Sequence Analysis." Journal of Luoyang Institute of Technology 6 (2005): 016.
 
+
** Mangalam, Harry. "The Bio* toolkits—a brief overview." Briefings in bioinformatics 3.3 (2002): 296-302.
[[Image:BioPerl_logo_tiny.jpg|128px|right]]
+
** Ryu, Taewan. "Benchmarking of BioPerl, Perl, BioJava, Java, BioPython, and Python for primitive bioinformatics tasks and choosing a suitable language." International Journal of Contents 5.2 (2009): 6-15.
; [[bp:Main Page|BioPerl]] :
+
** McGuffee, James W. "Programming languages and the biological sciences." Journal of Computing Sciences in Colleges 22.4 (2007): 178-183.
:* [[bp:Main Page|Project website]]
 
:* Quick links:
 
:** [[bp:Becoming_a_developer|Information for new developers]]
 
:** source code browser for [http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk bioperl-live] (the main BioPerl code base), and [http://code.open-bio.org/svnweb/index.cgi/bioperl/ all BioPerl sub-projects]
 
:** [[bp:Project_priority_list|Priority list]] of things aside from the ideas above that need work
 
:** [[bp:Mailing_lists|Mailing lists]]
 
:** IRC: #bioperl on [http://freenode.net Freenode]
 
 
 
[[Image:Biojava_logo_tiny.jpg|right]]
 
; [http://biojava.org BioJava] :
 
:* [http://biojava.org Project website]
 
:* Quick links:
 
:** source code for [http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk biojava-live] (the main BioJava code base) and [http://code.open-bio.org/svnweb/index.cgi/biojava/ all BioJava sub-projects]
 
:** [http://biojava.org/wiki/BioJava:MailingLists Mailing lists]
 
:** No IRC channel at present
 
 
 
[[Image:Biopython_logo_tiny.png|right]]
 
; [[biopython:Main Page|Biopython]] :
 
:* [[biopython:Main Page|Project website]]
 
:* Quick links:
 
:** [[biopython:Contributing|Information for contributors]]
 
:** [[biopython:Mailing lists|Mailing lists]]
 
:** [http://code.open-bio.org/cgi/viewcvs.cgi/biopython/?cvsroot=biopython source code] (see also [[biopython:CVS|Biopython CVS documentation]])
 
:** No IRC channel at present
 
 
 
[[Image:BioRuby_logo_tiny.png|right]]
 
; [http://bioruby.org BioRuby] :
 
:* [http://bioruby.org Project website]
 
:* Quick links:
 
:** [http://lists.open-bio.org/mailman/listinfo/bioruby developers mailing list]
 
:** [http://github.com/bioruby/bioruby/tree/master source code]
 
:** No IRC channel at present
 
 
 
[[Image:BioSQL_logo.png|160px|right]]
 
; [[biosql:Main Page|BioSQL]] :
 
:* [[biosql:Main Page|Project website]]
 
:* Quick links:
 
:** [http://biosql.org/mailman/listinfo/biosql-l developers mailing list]
 
:** [http://code.open-bio.org/svnweb/index.cgi/biosql/browse/biosql-schema/trunk source code]
 
:** No IRC channel at present
 
  
[[Image:BioLib_logo_tiny.png|right]]
+
== Previous Years ==
; [http://biolib.open-bio.org BioLib] :
 
:* [http://biolib.open-bio.org Project website]
 
:* Quick links:
 
:** [http://lists.open-bio.org/mailman/listinfo/biolib-dev developers mailing list]
 
:** [http://github.com/pjotrp/biolib/tree/master source code]
 
:** No IRC channel at present
 
  
== Reference Facts & Links: <span class="plainlinks">[http://socghop.appspot.com/ Google Summer of Code 2009]</span> ==
+
This section contains links to content related to OBF's participation
 +
in GSoC in previous years.
 +
* [[Google_Summer_of_Code_2014|2014]] - 6 student projects
 +
* Google Summer of Code 2013 - OBF not accepted, some Bio* projects partnered with other organisations
 +
* [[Google_Summer_of_Code_2012|2012]] - 5 student projects
 +
* [[Google_Summer_of_Code_2011|2011]] - 6 student projects
 +
* [[Google_Summer_of_Code_2010|2010]] - 6 student projects
  
* Mentoring organizations apply between March 9-13, 2009. Accepted mentoring organizations will be published March 18. See [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_timeline_5354032302481437_ full set of timelines].
 
* Google expects to accept around 150 mentoring organizations, a bit less than in 2008 (when they accepted 175). If the trend over the past years is any indication, this will be out of at least 3x as many organizations that apply.
 
* Students apply between March 23-April 3, 2009. The [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_eligibility_83343977761348_13148542340972003 eligibility requirements for students] are in the GSoC FAQ.
 
* [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_development_where_91701355_4247830955169275 Development occurs on-line], there is no requirement or expectation to travel, neither for students nor for mentors.
 
  
 
[[Category:Google Summer of Code]]
 

Latest revision as of 10:39, 18 February 2016


Google Summer of Code (GSoC) is a student internship program for open-source projects. The program offers eligible student developers stipends to write code for open source projects over a period of 3 summer months ("flip bits, not burgers"). See the Google Summer of Code Main Site for general information about the Google Summer of Code program, how to apply, frequently asked general questions, and more.


GSoC 2016

The Google Summer of Code 2016 is ON! OBF is once again applying as a GSoC mentoring organization this year. Interested mentors and students should subscribe to the OBF/GSoC mailing list. Please announce yourself, so we know who you are! The details of each of our project ideas are listed below, including potential mentors.

See http://obf.github.io/GSoC/ for more information about the GSoC program and additional ways to get in touch with us.

Facts & Links

Time Line 
GSoC 2016 FAQ 
Info from Google 
  • There is also a Google group for posting GSoC questions (and receiving answers; note that you will need to sign up for the group) that relate to the program itself (and are not specific to our organization).
  • Students receive a stipend from Google if accepted. See the GSoC 2016 FAQ for full documentation.
  • Development is done entirely remotely and on-line; there is no requirement or expectation for either students or mentors to travel.

Why apply?

One of the most important features of the program is that students are paired with mentors, who are typically experienced developers from the project to which the student is contributing. The mentor guides the student to work productively within the community, and helps the student avoid obstacles and pitfalls. The program is global - students and mentors may be located anywhere where they have an internet connection (except for countries affected by US trade restrictions), and no travel is required. Thus, aside from the stipend and mentorship aspects, the student's experience in the internship closely mirrors normal work on distributed development projects. Effective work habits for distributed development are typically not taught as part of computer science curricula, yet are highly desired in the increasingly global and distributed software, IT, and biotechnology industries.

From the viewpoint of each open-source project, the program not only offers to pay students for contributing, but more importantly, offers an opportunity to recruit new developers who will hopefully go on to become regular, sustaining contributors.

Project Ideas

Our GSoC ideas from each project are collected here: http://obf.github.io/GSoC/

OBF Projects Accepting Applicants

BioPerl
BioPython
BioJava
BioRuby
BioSQL
BioHaskell 
Biocaml 

Guide for prospective GSoC students

Before you apply

  • Proposals should extend one of our affiliated toolkits, not start a new project.
  • If you want to apply with your own idea, it's best to contact the OBF subproject you're interested in well before the application deadline, so we can work with you to find a mentor and solidify your project idea and application.
  • Ask us questions on the subproject mailing lists about the project idea you have in mind.
  • Write a project proposal draft, include a project plan (see below), and send it to a project mailing list for comments before submitting it.

Again, students are strongly encouraged to contact us as early as possible. Frequent and early communication is extremely valuable for putting together successful projects.

When you apply

When applying, please provide the following in your application material (in addition to the information requested by Google).

  1. Your complete contact information, including full name, physical address, preferred email address, and telephone number, plus other pertinent contact information such as IRC handles, etc.
  2. Why you are interested in the project you are proposing and are well-suited to undertake it.
  3. A summary of your programming experience and skills.
  4. Programs or projects you have previously authored or contributed to, in particular those available as open-source, including, if applicable, any past Summer of Code involvement.
  5. A project plan for the project you are proposing, even if your proposed project is directly based on one of the proposed project ideas for member projects.
    • A project plan divides the whole project into a series of manageable milestones and timelines that, once all are accomplished, logically lead to the end goal(s) of the project. Put another way, a project plan explains what you expect to be doing, and what you expect to have accomplished, at each point in time, so that at the end you reach the goals of the project.
    • Do not take this part lightly. A compelling plan takes a significant amount of work. Empirically, applications with no project plan, or with a hastily composed one, have not been competitive, and a more thorough project plan can easily let an applicant outcompete another with more advanced skills.
    • A good plan will require you to thoroughly think about the project itself and how one might want to go about the work.
    • We don't expect you to have all the experience, background, and knowledge to come up with the final, real work plan on your own at the time you apply. We do expect your plan to demonstrate, however, that you have made the effort and thoroughly dissected the goals into tasks and successive accomplishments that make sense.
    • We strongly recommend that you bounce your proposed project and your project plan draft off of us, using either the pertinent developers mailing list or the IRC channel(s). Through the project plan exercise you will inevitably discover that you are missing a lot of the pieces - we are there to help you fill those in as best as we can.
  6. Any obligations, vacations, or plans for the summer that may require scheduling during the GSoC work period.
    • We expect your GSoC project to be your primary focus over the summer. It should not be regarded as a part-time occupation.
    • If you feel that you can manage other work obligations concurrently with your Summer of Code project, make your case and support it with evidence.
    • Be honest and open. If it turns out later that you weren't clear about other obligations, at best (i.e., if your accomplishment record at that point is spotless) our trust in you will be severely degraded. Also, if you are accepted, discuss with your GSoC mentor before taking on additional obligations.
    • One of the most common reasons for students to struggle or fail is being overcommitted. Do not set yourself up for failure! GSoC summers should be fun and rewarding!

Student Progress Reports

In addition to writing code, accepted students send weekly updates to the OBF community on their project's progress. These updates allow us to keep aware of how GSoC students are doing, give students a forum to ask any questions, and promote overall community bonding.

At the beginning of the summer, we ask that you set up a blog for the GSoC project (or a category/tag on your existing blog). You will use it to summarize your progress every week and, if you'd like, to publish longer posts about your work. (See these examples from 2013.)

Then, at the start of each week:

  1. Post an update on your blog: What did you do last week? What do you plan to do this week? Do you have any unanswered questions, any unsolved problems from the last week, interesting observations or anything else you'd like to mention?
  2. Email the URL and text of the post (or a short summary) to the host project's mailing list (your mentors will confirm which one to use) and the main OBF GSoC mailing list (gsoc@lists.open-bio.org).

You will be writing under your own name, but with a clear association with your mentors, the OBF and its projects, so please take this seriously and be professional. Remember that your blog will be one of the first things found by anyone interested in the project you're working on, and can be a valuable resource to them — as well as a significant part of your online presence.

Contact

Before applying, please read our documentation on information that students should know and guidelines we expect you to follow. We also require that you include certain information, listed under "When you apply."

Staff and org Admins

Organization administrator
Eric Talevich (eric.talevich@gmail.com)
Backup administrator
Raoul Bonnal (email) (IRC: helius | channels: #obf-soc, #bioruby, #gsoc ) (Skype: ilpuccio)

Google Plus

OBF Summer of Code on G+

Email

For prospective students, the first point of contact should be the mailing list of the OBF project you are interested in working with:

BioPerl
bioperl-l@lists.open-bio.org
BioPython
biopython@lists.open-bio.org
BioJava
biojava-l@lists.open-bio.org
BioRuby
bioruby@lists.open-bio.org
BioSQL
biosql-l@lists.open-bio.org
BioLib
biolib-dev@lists.open-bio.org

Also, it would be a good idea to CC the organization administrator (Eric Talevich, eric.talevich@gmail.com), so he can make sure that you are properly taken care of!

If you are not quite sure which project you would like to contribute to, you can email the organization administrator for help. However, do not worry overly much about picking the right OBF project at the outset. If you are unsure, simply make your best guess, and other members of the email list will help you to find the best organization to suit your idea.

IRC - Internet Relay Chat

OBF IRC channels are maintained on Freenode; connect your IRC client to chat.freenode.net.

Main OBF GSoC Channel
#obf-soc
BioPerl
#bioperl
BioRuby
#bioruby

Some mentors and developers can regularly be found on IRC; see the list of OBF projects below for which projects have a channel and what it is called. You can also join #obf-soc on Freenode. (If you do not have an IRC client installed, you might find the comparison on Wikipedia, the Google directory, or the IRC Reviews helpful. For Macs, X-Chat Aqua works pretty well. If you have never used IRC, try the IRC Primer at IRC Help, which also has links to lots of other material.)


Mentor Resources

Scientific Achievements

This section reports the scientific achievements of our community: scientific papers and grant-funded projects that used tools developed during the Google Summer of Code over the years.

  • Sambamba: fast processing of NGS alignment formats. Bioinformatics (2015) doi: 10.1093/bioinformatics/btv098
  • bio-maf: The long intergenic noncoding RNA landscape of human lymphocytes highlights the regulation of T cell differentiation by linc-MAF-4. Ranzani V et al. Nat Immunol. 2015 Mar;16(3):318-25. doi: 10.1038/ni.3093. Epub 2015 Jan 26.
  • Bio.Phylo: A unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython. BMC Bioinformatics 2012, 13:209 doi:10.1186/1471-2105-13-209
  • vcf-mongo: Gene2Farm and WHEALBI European Research projects
  • OBF-GSoC-2014-WrapUp is rich in scientific activities and results.
  • RCSB PDB is the North American access point to the Worldwide Protein Data Bank and uses BioJava extensively.
  • Publications using BioJava:
    • Prlić, Andreas, et al. "BioJava: an open-source framework for bioinformatics in 2012." Bioinformatics 28.20 (2012): 2693-2695.
    • Holland, Richard CG, et al. "BioJava: an open-source framework for bioinformatics." Bioinformatics 24.18 (2008): 2096-2097.
    • Pocock, Matthew, Thomas Down, and Tim Hubbard. "BioJava: open source components for bioinformatics." ACM Sigbio Newsletter 20.2 (2000): 10-12.
      • 181 citations on Google Scholar
    • Myers-Turnbull, Douglas, et al. "Systematic Detection of Internal Symmetry in Proteins Using CE-Symm." Journal of Molecular Biology, (2014) 426:11 pp. 2255–2268.
    • Prlić, Andreas, et al. (2010) "Precalculated Protein Structure Alignments at the RCSB PDB website." Bioinformatics 26(23), 2983-2985.
    • Bliven, Spencer, et al. (2015) "Detection of circular permutations within protein structures using CE-CP." Bioinformatics. In press.
    • Aerts, Stein, et al. "Toucan: deciphering the cis‐regulatory logic of coregulated genes." Nucleic acids research 31.6 (2003): 1753-1764.
    • Vaida, Mircea-Florin, Radu Terec, and Lenuta Alboaie. "Alternative DNA Security Using BioJava." Digital Information and Communication Technology and Its Applications. Springer Berlin Heidelberg, 2011. 455-469.
    • Ross, Christian, and Qingxi J. Shen. "Computational prediction and experimental verification of HVA1-like abscisic acid responsive promoters in rice (Oryza sativa)." Plant molecular biology 62.1-2 (2006): 233-246.
    • Finak, G., et al. "BIAS: bioinformatics integrated application software." Bioinformatics 21.8 (2005): 1745-1746.
    • Aerts, Stein, et al. "A genetic algorithm for the detection of new cis-regulatory modules in sets of coregulated genes." Bioinformatics 20.12 (2004): 1974-1976.
    • Hanganu, Andrei, et al. "SLIDE: An interactive threading refinement tool for homology modeling." Rom J Biochem 1009.46: 123-127.
    • Kaladhar, DSVGK. "BioJava: A Programming Guide." (2012). LAP Lambert Academic Publishing, Germany. ISBN: 3659167509, 9783659167508
    • Prins, J. C. P. "BioLib: Sharing high performance code between BioPerl, BioPython, BioRuby, R/Bioconductor and BioJAVA." 17th Annual International Conference on Intelligent Systems for Molecular Biology, Stockholm, Sweden, June 27-July 2, 2009.
    • Tang, Si-Xin, Yi-Bing Li, and Hong-Bo He. "Designing a BioJava-based Software for RNA Sequence Analysis." Journal of Luoyang Institute of Technology 6 (2005): 016.
    • Mangalam, Harry. "The Bio* toolkits—a brief overview." Briefings in bioinformatics 3.3 (2002): 296-302.
    • Ryu, Taewan. "Benchmarking of BioPerl, Perl, BioJava, Java, BioPython, and Python for primitive bioinformatics tasks and choosing a suitable language." International Journal of Contents 5.2 (2009): 6-15.
    • McGuffee, James W. "Programming languages and the biological sciences." Journal of Computing Sciences in Colleges 22.4 (2007): 178-183.

Previous Years

This section contains links to content related to OBF's participation in GSoC in previous years.

  • 2014 - 6 student projects
  • 2013 - OBF not accepted, some Bio* projects partnered with other organisations
  • 2012 - 5 student projects
  • 2011 - 6 student projects
  • 2010 - 6 student projects