Google Summer of Code 2015 Ideas
Revision as of 03:25, 19 February 2015
The details of each of our project ideas are listed below, including potential mentors. Interested mentors and students should subscribe to the OBF/GSoC mailing list and announce their interest.
See the main OBF Google Summer of Code page for more information about the GSoC program and additional ways to get in touch with us.
Cross-project ideas
OBF is an umbrella organization which represents many different programming languages used in bioinformatics. In addition to working with each of the "Bio*" projects (listed below), this year we are also accepting a category of "cross-project" ideas that cover multiple programming languages or projects. These collaborative ideas are broadly defined and can be thought of as "unfinished" — interested students should adapt the ideas to their own strengths and goals, and are responsible for the quality of the final proposed idea in their application.
Feel free to propose your own entirely new idea. You can also draw ideas from Genome Informatics (GMOD) and the National Evolutionary Synthesis Center (NESCent).
Provide Nextflow with a GUI based on NoFlo UI
- Rationale
- Nextflow is a data-driven toolkit for computational pipelines that simplifies writing parallel and scalable pipelines in a portable manner. Its goal is to make the data analysis required by next-generation sequencing technologies, and bioinformatics applications in general, easier for researchers.
- It follows the UNIX philosophy, in which many small tools are composed into efficient computational solutions whose individual parts can easily be replaced. It has been designed to let developers rapidly prototype applications by reusing their existing tools and scripts, and for this reason it has been developed primarily as a command-line tool. However, complex pipelines may benefit from a visualisation layer that helps the developer design and represent the workflow logic of the application.
- NoFlo-UI is the presentation layer for NoFlo, a flow-based programming (FBP) environment that makes software creation more accessible and collaborative. Its interactive interface lets you create a computational workflow by dragging, dropping and connecting task components, effectively "drawing" an application so that it resembles a subway map. Other scientists can understand, share and curate such a "map" more easily than endless files of source code.
- Approach
- The goal of this proposal, therefore, is to implement a graphical front end for the Nextflow programming environment, based on the NoFlo-UI project. This would give Nextflow a presentation layer that lets researchers "sketch" their computational pipelines instead of programming them, making it easier to share pipelines and to handle complex task interactions in their application logic.
- Ideally, this integration will be a two-way tool: changes applied in the visual editor are reflected in the Nextflow script, and vice versa.
- Languages and skills
- The student is required to have proven working ability with JavaScript, HTML5, Polymer and Node.js for the front-end development, and a good knowledge of the Groovy/Java programming languages for the Nextflow side.
- He/she may also benefit from some theoretical and practical knowledge of programming language syntax and grammar parsers.
- Code
- Nextflow source code is available at the following repository (https://github.com/nextflow-io/nextflow/)
- NoFlo-UI source code is available at this link (https://github.com/noflo/noflo-ui/)
- Mentors
- Paolo Di Tommaso
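A two-way integration depends on a lossless mapping between the graph the user draws and the script text. The following is a minimal sketch of that idea in Python. It assumes a hypothetical, simplified text format with one "A -> B" edge per line, which is neither Nextflow's DSL nor NoFlo's .fbp format. The names Workflow, to_script and from_script are illustrative only. The property any real implementation would need to keep is that converting graph to text and back yields the same graph.

```python
# Hypothetical, simplified model of a two-way mapping between a visual
# workflow graph and a textual script. The "A -> B" syntax is invented for
# illustration; it is neither Nextflow's DSL nor NoFlo's .fbp format.
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Workflow:
    processes: Set[str] = field(default_factory=set)
    edges: Set[Tuple[str, str]] = field(default_factory=set)  # (source, target)

    def connect(self, src: str, dst: str) -> None:
        # Drawing a wire in the editor registers both endpoints as processes
        self.processes.update((src, dst))
        self.edges.add((src, dst))


def to_script(wf: Workflow) -> str:
    """Graph -> text: one sorted line per edge, then unconnected processes."""
    lines = [f"{a} -> {b}" for a, b in sorted(wf.edges)]
    connected = {p for edge in wf.edges for p in edge}
    lines += sorted(wf.processes - connected)
    return "\n".join(lines)


def from_script(text: str) -> Workflow:
    """Text -> graph: the inverse of to_script."""
    wf = Workflow()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "->" in line:
            src, dst = (part.strip() for part in line.split("->", 1))
            wf.connect(src, dst)
        else:
            wf.processes.add(line)
    return wf
```

Emitting edges in sorted order keeps the generated script stable, so small edits in the visual editor produce small textual diffs. A real implementation would also have to carry channel names, process parameters and layout coordinates through the round trip.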
Nextflow low latency scheduling and in-memory data processing
- Rationale
- Nextflow is a data-driven toolkit for computational pipelines that simplifies writing parallel and scalable pipelines in a portable manner across different execution platforms. Its goal is to make the data analysis required by next-generation sequencing technologies, and bioinformatics applications in general, easier for researchers.
- Nextflow does not implement a task scheduling strategy of its own but delegates scheduling to the underlying processing infrastructure, which in most cases is a grid-engine-like technology (SGE, PBS, SLURM, etc.). However, these platforms were designed for batch job processing, i.e. a few long-running jobs scheduled sequentially on the computer-cluster facility. This model suffers from very high scheduling latencies, which make it unfit for the highly parallel, short-lived jobs that are increasingly common in bioinformatics data analysis.
- The goal of this proposal is to integrate the Sparrow scheduler and Tachyon in-memory file system with Nextflow to overcome the limitations of batch-like schedulers generally available in most common cluster computation facilities.
- The former is a high-throughput, low-latency distributed cluster scheduler, while the latter is a memory-centric distributed file system. Both are open source research projects developed at UC Berkeley.
- Integrating these technologies would allow Nextflow to manage large distributed workloads more efficiently and in a more timely manner, and to decrease the task granularity of Nextflow pipelines, thus achieving a higher degree of parallelism and better application performance.
- Approach
- Nextflow implements an extensible mechanism that allows it to support several computing platforms and file systems in a portable manner.
- The suggested approach consists of implementing a new Nextflow executor that integrates the Sparrow scheduler and its features.
- The support for Tachyon file system can be added to Nextflow by writing an adapter layer that follows the Java JSR-203 specification.
- Languages and skills
- The student is required to have proven working ability with the Java and/or Groovy programming languages and a good theoretical and practical knowledge of parallel and concurrent programming.
- Code
- Nextflow source code repository (https://github.com/nextflow-io/nextflow)
- Sparrow cluster source code (https://github.com/radlab/sparrow)
- Tachyon file system source code (https://github.com/amplab/tachyon)
- Mentors
- Paolo Di Tommaso
TITLE
- Rationale
- TEXT HERE TEXT HERE
- Approach
- TEXT HERE TEXT HERE
- Languages and skill
- TEXT HERE TEXT HERE
- Code
- TEXT HERE TEXT HERE
- Mentors
- TEXT HERE TEXT HERE
BioPerl
- Mailing lists
- IRC: #bioperl on Freenode
- Information for new developers
- Source code browser for bioperl-live (the main BioPerl code base), and all BioPerl sub-projects
- Priority list of things that need work, as another source for student-conceived project ideas
TITLE
- Rationale
- TEXT HERE TEXT HERE
- Approach
- TEXT HERE TEXT HERE
- Languages and skill
- TEXT HERE TEXT HERE
- Code
- TEXT HERE TEXT HERE
- Mentors
- TEXT HERE TEXT HERE
BioJava and JSBML
- BioJava developer mailing list
- JSBML developer mailing list
- BioJava modules as another source for student-conceived project ideas
- Source code for biojava-live (the main BioJava code base) and all BioJava sub-projects
For GSoC 2014, BioJava is partnering with the Systems Biology Markup Language (SBML) team to bring enhancements to JSBML, the standard Java implementation of SBML, and to bring SBML features to other Java-based systems biology software. See the SBML website for more ideas from the SBML team.
Students working on these projects will interact with both the BioJava and JSBML communities, which overlap. Most development will happen on the JSBML codebase, although BioJava is used as a supporting library for some components.
TITLE
- Rationale
- TEXT HERE TEXT HERE
- Approach
- TEXT HERE TEXT HERE
- Languages and skill
- TEXT HERE TEXT HERE
- Code
- TEXT HERE TEXT HERE
- Mentors
- TEXT HERE TEXT HERE
BioPython
TITLE
- Rationale
- TEXT HERE TEXT HERE
- Approach
- TEXT HERE TEXT HERE
- Languages and skill
- TEXT HERE TEXT HERE
- Code
- TEXT HERE TEXT HERE
- Mentors
- TEXT HERE TEXT HERE
BioRuby
- Developers mailing list
- Source code
- IRC: #bioruby on Freenode
TITLE
- Rationale
- TEXT HERE TEXT HERE
- Approach
- TEXT HERE TEXT HERE
- Languages and skill
- TEXT HERE TEXT HERE
- Code
- TEXT HERE TEXT HERE
- Mentors
- TEXT HERE TEXT HERE
BioHaskell
BioHaskell has its own GSoC page. We currently have 2 (+1) open problems listed there. In addition, we accept people's own ideas and have a number of unlisted open problems; these fall somewhere between a bachelor's thesis and PhD work and are harder to package up nicely.
Fast k-mer indexing
Fast k-mer indexing requires a data structure that maps a short string of k characters to a value. While this is trivial with almost any key-value map, we also require a very memory-efficient storage system. Knowledge of suffix structures is a definite plus.
- Mentors
- Ketil Malde?
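A common starting point for memory-efficient k-mer indexes is to pack each DNA k-mer into an integer using 2 bits per base, so that any k up to 32 fits in a single 64-bit word. The Python sketch below shows that encoding with a rolling update and counts k-mers in a plain dictionary. It illustrates the general technique under the assumption of an A/C/G/T alphabet; it does not describe any BioHaskell library. A memory-tight index would replace the dictionary with a more compact structure, such as a sorted array of codes or a suffix-based structure.

```python
# Minimal sketch: pack each DNA k-mer into an integer using 2 bits per base
# (A=0, C=1, G=2, T=3) and count k-mers keyed by that integer.
# Assumes an A/C/G/T alphabet; any other symbol (e.g. N) breaks the window.
from collections import Counter
from typing import Dict, Iterator

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq: str, k: int) -> Iterator[int]:
    """Yield the 2-bit code of every k-mer in seq using a rolling update."""
    mask = (1 << (2 * k)) - 1  # keep only the last k bases (2k bits)
    code = 0
    valid = 0  # number of consecutive valid bases seen so far
    for base in seq.upper():
        b = _CODE.get(base)
        if b is None:  # unknown symbol: restart the window
            code, valid = 0, 0
            continue
        code = ((code << 2) | b) & mask
        valid += 1
        if valid >= k:
            yield code


def decode(code: int, k: int) -> str:
    """Inverse of the packing; the first base sits in the highest bits."""
    return "".join("ACGT"[(code >> (2 * (k - 1 - i))) & 3] for i in range(k))


def count_kmers(seq: str, k: int) -> Dict[int, int]:
    return Counter(kmer_codes(seq, k))
```

For example, `count_kmers("ACGTACGT", 3)` finds ACG and CGT twice each and GTA and TAC once each.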
Low-level bit and stream-fusion optimizations
In Haskell, we typically don't talk much about low-level implementation details. For some algorithms, however, low-level details (especially bitwise operations and SIMD instructions) become important. I have a library dealing with bitsets, but it is not yet fully efficient. Getting SIMD instructions to play nicely with generic dynamic programming (DP) recursion schemes is probably really hard.
- Mentors
- Christian Hoener zu Siederdissen
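The kind of low-level optimisation meant here can be illustrated with a word-packed bitset. Bits are stored in 64-bit machine words, so that set operations and counting proceed a whole word at a time: a bitwise AND, then a population count. That is the pattern popcount and SIMD instructions accelerate in compiled code. The Python sketch below only illustrates the data layout and the count/rank logic. It is not the mentor's Haskell library, and the class and method names are hypothetical.

```python
# Minimal sketch of a word-packed bitset. Illustrative only: Python ints stand
# in for 64-bit machine words; names are hypothetical.
WORD = 64


def popcount(w: int) -> int:
    """Number of set bits in one word (a single instruction on modern CPUs)."""
    return bin(w).count("1")


class Bitset:
    def __init__(self, size: int):
        self.size = size
        self.words = [0] * ((size + WORD - 1) // WORD)

    def set(self, i: int) -> None:
        self.words[i // WORD] |= 1 << (i % WORD)

    def test(self, i: int) -> bool:
        return (self.words[i // WORD] >> (i % WORD)) & 1 == 1

    def count(self) -> int:
        # One popcount per word instead of one test per bit
        return sum(popcount(w) for w in self.words)

    def intersect(self, other: "Bitset") -> "Bitset":
        # Word-wise AND processes 64 bits per operation; assumes equal sizes
        out = Bitset(self.size)
        out.words = [a & b for a, b in zip(self.words, other.words)]
        return out

    def rank(self, i: int) -> int:
        """Number of set bits strictly before position i."""
        full, rem = divmod(i, WORD)
        partial = self.words[full] & ((1 << rem) - 1) if rem else 0
        return sum(popcount(w) for w in self.words[:full]) + popcount(partial)
```

Rank queries like this are a building block of succinct data structures. In compiled code, the per-word loop over `popcount` is what vectorises well.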
Biocaml
TITLE
- Rationale
- TEXT HERE TEXT HERE
- Approach
- TEXT HERE TEXT HERE
- Languages and skill
- TEXT HERE TEXT HERE
- Code
- TEXT HERE TEXT HERE
- Mentors
- TEXT HERE TEXT HERE
Team YummyData
YummyData is a system for collecting and disseminating various statistical and functional data about SPARQL endpoints in the Life Science domain.
Sharing SPARQL endpoint data
- Rationale
- TEXT HERE TEXT HERE
- Approach
- TEXT HERE TEXT HERE
- Languages and skill
- TEXT HERE TEXT HERE
- Code
- TEXT HERE TEXT HERE
- Mentors
- Andrea Splendiani, Thierry Lombardot, Yasunori Yamamoto
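One of the simplest statistics such a system could collect is the number of triples an endpoint serves. The Python sketch below builds the GET form of a SPARQL 1.1 Protocol request, with the query URL-encoded in the query parameter. It then reads the count from the standard SPARQL 1.1 JSON results format. The endpoint URL and function names are hypothetical, and the network request itself is left out. In practice, a COUNT(*) over a large endpoint may time out or be capped by the server, so a real crawler would need to handle that.

```python
# Minimal sketch: build a SPARQL 1.1 Protocol GET URL and parse a count from
# application/sparql-results+json. Endpoint and names are hypothetical.
import json
from urllib.parse import urlencode

TRIPLE_COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_query_url(endpoint: str, query: str) -> str:
    """GET form of a query request: <endpoint>?query=<url-encoded query>."""
    sep = "&" if "?" in endpoint else "?"
    return endpoint + sep + urlencode({"query": query})


def parse_count(results_json: str, var: str = "triples") -> int:
    """Read an integer binding from the first solution of a JSON result."""
    data = json.loads(results_json)
    bindings = data["results"]["bindings"]
    return int(bindings[0][var]["value"])

# Fetching is left out; with the standard library it could be done via
# urllib.request.urlopen(urllib.request.Request(
#     url, headers={"Accept": "application/sparql-results+json"}))
```

Keeping URL construction and result parsing separate from the HTTP call makes it easy to test the statistics logic against recorded responses before pointing it at live endpoints.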
Candidate a new project for OBF
If you want to be part of OBF but your project is not yet listed above, please contact us and tell us about your project and your proposal.
TITLE
- Rationale
- TEXT HERE TEXT HERE
- Approach
- TEXT HERE TEXT HERE
- Languages and skill
- TEXT HERE TEXT HERE
- Code
- TEXT HERE TEXT HERE
- Mentors
- TEXT HERE TEXT HERE