Project Proposals for Google Summer of Code 2012
This year, six students were accepted and will be working over the summer mentored by volunteers from the INCF scientific community. Read more about the projects: incf.org/gsoc/2012
________________________________________________________________________________________
Proposals and ideas for potential INCF projects within Google Summer of Code.
Google Summer of Code is a global program that offers post-secondary student developers ages 18 and older stipends, financed by Google, to write code for various open source software projects. Students can apply to take part in projects proposed by mentoring organizations – you can see our proposed projects below. Accepted student applicants are paired with a mentor or mentors from the participating projects. To learn more about the program, use the links under Further Information at the bottom of the page.
GPU-Based Neural Simulator
Modeling of the chemical and electrical phenomena observed in neuronal systems has become a well-established methodology in neuroscience. Analysis of models, in combination with experimental studies, provides a fast and inexpensive tool for testing biological hypotheses.
Some of the most popular modeling formalisms are widely used and already implemented in open source simulation packages. An example is the Hodgkin-Huxley formalism that can reproduce experimentally obtained measurements of single neuron dynamics, including details of membrane ion channels, synaptic channels, and realistic reconstruction of neuron morphology.
Parallelism of computation is an inherent property of neuronal systems. Executing neuronal simulations on standard CPUs with sequential program code discards this intrinsic parallelism, so simulated models often run slower than the biological systems they represent.
Modern Graphics Processing Units (GPUs) are powerful parallel processing platforms. In recent years GPU devices have demonstrated clear potential as a platform for computationally intensive scientific simulations.
The aim of this project is to employ the parallel processing capabilities of new-generation GPU devices, using the OpenCL software framework, to model biological neuronal systems based on the Hodgkin-Huxley formalism.
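To make the parallelism concrete, here is a minimal sketch in plain Python (not OpenCL) of a forward-Euler update of the classic single-compartment Hodgkin-Huxley equations. The parameter values are the standard textbook squid-axon values, and the function names (rates, step, simulate) are illustrative assumptions, not part of any existing simulator. Each neuron's update depends only on its own state, so the loop over neurons is the kind of work a GPU kernel could assign to one thread per neuron.

```python
import math

# Standard textbook squid-axon parameters (mV, ms, mS/cm^2, uF/cm^2).
C_M, G_NA, G_K, G_L = 1.0, 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387


def _vtrap(x, y):
    """Return x / (1 - exp(-x / y)), handling the removable singularity at x = 0."""
    if abs(x) < 1e-7:
        return y
    return x / (1.0 - math.exp(-x / y))


def rates(v):
    """Voltage-dependent opening (a) and closing (b) rates of the m, h, n gates."""
    a_m = 0.1 * _vtrap(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _vtrap(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def step(state, i_ext, dt):
    """Advance one neuron's (v, m, h, n) state by dt with forward Euler."""
    v, m, h, n = state
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
    i_ion = (G_NA * m ** 3 * h * (v - E_NA)
             + G_K * n ** 4 * (v - E_K)
             + G_L * (v - E_L))
    return (
        v + dt * (i_ext - i_ion) / C_M,
        m + dt * (a_m * (1.0 - m) - b_m * m),
        h + dt * (a_h * (1.0 - h) - b_h * h),
        n + dt * (a_n * (1.0 - n) - b_n * n),
    )


def simulate(i_ext, t_stop=50.0, dt=0.01):
    """Simulate one neuron per entry of i_ext; return each neuron's peak voltage."""
    rest = (-65.0, 0.05, 0.6, 0.32)  # approximate resting state
    states = [rest] * len(i_ext)
    peaks = [rest[0]] * len(i_ext)
    for _ in range(int(round(t_stop / dt))):
        # Independent per-neuron updates: the data-parallel part.
        states = [step(s, i, dt) for s, i in zip(states, i_ext)]
        peaks = [max(p, s[0]) for p, s in zip(peaks, states)]
    return peaks
```

Calling simulate([0.0, 10.0]) runs two uncoupled neurons, one without input and one driven by a constant current. A real project would add synaptic coupling and morphology, which is where the communication between parallel threads becomes the harder design question.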
Skills: Good knowledge of C/C++, experience with CUDA/OpenCL a plus,…
Aims: GPU-based implementation of neuronal networks using the OpenCL software framework.
Mentors: Stanislav Stankovic and Marja-Leena Linne, Tampere University of Technology, INCF Node Finland
A simulator-independent API for multi-compartmental neuronal modelling
Computational neuroscience has produced a diversity of software for simulations of networks of spiking neurons, with both negative and positive consequences. On the one hand, each simulator uses its own programming or configuration language, leading to considerable difficulty in porting models from one simulator to another. This impedes communication between investigators and makes it harder to reproduce and build on the work of others. On the other hand, simulation results can be cross-checked between different simulators, giving greater confidence in their correctness, and each simulator has different optimizations, so the most appropriate simulator can be chosen for a given modelling task. The PyNN meta-simulator (http://neuralensemble.org/PyNN) provides a common programming interface to multiple simulators, which aims to reduce or eliminate the problems of simulator diversity while retaining the benefits.
PyNN currently supports networks of point-neurons, such as the integrate-and-fire model. However, a large number of studies use networks of morphologically-detailed neurons. These studies would benefit from the ability to run simulations on multiple simulators to enable cross-checking and simplify porting models from one simulator to another.
Skills
A working knowledge of Python and XML.
Aims
Implement support for multi-compartmental modelling in PyNN, including the ability to read and write NeuroML descriptions of ion channels and cell morphologies.
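As a rough illustration of what reading a cell morphology could involve, the sketch below parses a simplified, namespace-free XML fragment written in the style of a NeuroML morphology into simulator-independent Python objects. The fragment, the Segment class and the read_morphology function are assumptions made for this example; they are not part of PyNN, and real NeuroML files carry namespaces and further attributes.

```python
import math
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Simplified fragment in the style of a NeuroML morphology (illustrative only).
EXAMPLE = """
<morphology id="simple_cell">
  <segment id="0" name="soma">
    <proximal x="0" y="0" z="0" diameter="20"/>
    <distal x="20" y="0" z="0" diameter="20"/>
  </segment>
  <segment id="1" name="dendrite">
    <parent segment="0"/>
    <distal x="120" y="0" z="0" diameter="2"/>
  </segment>
</morphology>
"""


@dataclass
class Segment:
    """Hypothetical simulator-independent description of one segment."""
    id: int
    name: str
    parent: Optional[int]
    length: float  # same length unit as the coordinates in the file


def _point(elem):
    """Extract the (x, y, z) coordinates of a proximal or distal element."""
    return tuple(float(elem.get(key)) for key in ("x", "y", "z"))


def read_morphology(xml_text):
    """Parse the fragment into a list of Segment objects, parents first."""
    root = ET.fromstring(xml_text)
    distal_of = {}
    segments = []
    for seg in root.findall("segment"):
        seg_id = int(seg.get("id"))
        parent_el = seg.find("parent")
        parent = int(parent_el.get("segment")) if parent_el is not None else None
        distal = _point(seg.find("distal"))
        prox_el = seg.find("proximal")
        # Without an explicit proximal point, start at the parent's distal end.
        proximal = _point(prox_el) if prox_el is not None else distal_of[parent]
        distal_of[seg_id] = distal
        segments.append(
            Segment(seg_id, seg.get("name"), parent, math.dist(proximal, distal))
        )
    return segments
```

A backend for a particular simulator would then translate the Segment list into that simulator's own section or compartment objects, keeping the model description itself free of simulator-specific code.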
Mentored by Andrew Davison (French INCF Node), possibly together with NeuroML developers such as Sharon Crook or Padraig Gleeson.
An automated electronic lab notebook for reproducible computational research
In computational, simulation-based science, reproduction of previous experiments, and establishing the provenance of results, ought to be easy, given that computers are deterministic, not suffering from the problems of inter-subject and trial-to-trial variability that make reproduction of biological experiments more challenging.
In general, however, it is not easy, perhaps due to the complexity of our code and our computing environments, and the difficulty of capturing every essential piece of information needed to reproduce a computational experiment using existing tools such as spreadsheets, version control systems and paper notebooks.
Sumatra (http://neuralensemble.org/sumatra) is a tool for managing and tracking projects based on numerical simulation or analysis, with the aim of supporting reproducible research. It can be thought of as an automated electronic lab notebook for simulation/analysis projects. It consists of: (1) a command-line interface, smt, for launching simulations/analyses with automatic recording of information about the context, annotating these records, linking to data files, etc.; (2) a web interface with a built-in web server, smtweb, for browsing and annotating simulation/analysis results; and (3) a Python API, on which smt and smtweb are based, that can be used in your own scripts in place of smt.
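The idea of "automatic recording of information about the context" can be sketched in a few lines of standard-library Python. This is not Sumatra's API: the function names capture_record and same_code_and_parameters, and the fields of the record, are assumptions chosen for illustration.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def capture_record(script_source, parameters, label):
    """Build a JSON-serialisable record describing the context of one run."""
    return {
        "label": label,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        # A hash of the code stands in for a version-control revision here.
        "script_sha1": hashlib.sha1(script_source.encode("utf-8")).hexdigest(),
        "parameters": dict(parameters),
    }


def same_code_and_parameters(record_a, record_b):
    """Return True if two runs used identical code and parameters."""
    return (record_a["script_sha1"] == record_b["script_sha1"]
            and record_a["parameters"] == record_b["parameters"])


def save_record(record, path):
    """Write the record to disk as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

Dependency tracking for C++, Matlab or R would extend such a record with the versions of the libraries or toolboxes that the run actually used, which is the harder, language-specific part of the problem.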
Skills
A working knowledge of Python plus one of the following: (1) experience with at least one other widely-used scientific programming language (e.g. C++, Matlab, R); (2) experience with developing cross-platform GUI-based applications; (3) experience with Python-based web frameworks (e.g. Django) and with client-side JavaScript development.
Aims
This project would involve adding functionality to Sumatra in one or more of the following three areas:
* adding support for dependency-tracking in at least one of C++, Matlab, R
* developing a cross-platform graphical user interface-based application, built on top of Sumatra, for managing scientific projects based on numerical computation
* enhancing the web interface to enable greater interactivity and the ability to easily compare results across different simulations/analyses.
Mentored by Andrew Davison (French INCF Node).
Web Interchange Format for Electrophysiological Data
NEO is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats, including Spike2, NeuroExplorer, AlphaOmega, Axon, Blackrock, Plexon, Tdt, and support for writing to a subset of these formats, including several proprietary formats as well as non-proprietary formats such as HDF5 (http://packages.python.org/neo/). The objective of this project is to develop a library that extends the list of supported formats and makes data that follow the NEO data model easy to exchange over the web. The library should be written in Python and should support both reading and writing data. Its main purpose is to provide a programming layer that converts NEO Python objects into a web interchange format (provisionally JSON) and back. The new library will help accelerate web-based communication between open scientific databases and data hubs in neurophysiology and thus support scientific collaboration, in line with the proven importance of efficient data exchange in this experimental field.
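The conversion layer described above can be sketched as a round trip between a Python object and JSON. The Signal class below is a hypothetical stand-in, not one of NEO's own classes, and the envelope fields ("type", "version", "data") are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Signal:
    """Hypothetical stand-in for an analog signal object (not NEO's class)."""
    name: str
    units: str
    sampling_rate_hz: float
    t_start_s: float
    samples: List[float]


def to_json(signal):
    """Serialise a Signal into a small, self-describing JSON document."""
    payload = {"type": "Signal", "version": 1, "data": asdict(signal)}
    return json.dumps(payload, sort_keys=True)


def from_json(text):
    """Rebuild a Signal from JSON, rejecting documents of another type."""
    payload = json.loads(text)
    if payload.get("type") != "Signal":
        raise ValueError("unexpected object type: %r" % payload.get("type"))
    return Signal(**payload["data"])
```

A design question for the real library is how to carry large sample arrays: embedding them as JSON numbers, as here, is simple but verbose, so a production format might reference binary data separately.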
Mentored by Andrey Sobolev (German INCF Node)
Client library for Electrophysiological DATA API
G-Node Portal is a platform that helps neuroscientists access, store, analyze and exchange data (https://portal.g-node.org/data/). With G-Node Portal, scientists can store and efficiently organize their experimental data, exchange data with collaborators, and search for published data or for scientists with similar interests. The platform provides a REST API to access data and metadata over HTTP (http://g-node.github.com/g-node-portal/data_api/data_api_specification.html). The objective of this project is to develop a Python library that supports the main functions of the server-side DATA API and makes them available at the Python level on the client. We aim to adopt best practices from open-source REST clients such as gdata (http://code.google.com/p/gdata-python-client/) and pyfacebook (https://github.com/sciyoshi/pyfacebook/) in order to develop an efficient, user- and developer-friendly library tuned to signal data (electrophysiology). The Python client will complement the existing Matlab client as a further way to access the DATA API. The project requires good knowledge of Python and an understanding of REST, HTTP and JSON.
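One possible shape for such a client is sketched below using only the standard library. The class name DataClient, the resource name "signals" and the query parameters are hypothetical and are not taken from the DATA API specification; the transport is injectable so that the client can be exercised without a network connection.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def http_transport(request):
    """Default transport: perform the HTTP request and return the body text."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")


class DataClient:
    """Hypothetical thin REST client returning decoded JSON documents."""

    def __init__(self, base_url, transport=http_transport):
        self.base_url = base_url.rstrip("/") + "/"
        self.transport = transport

    def build_request(self, resource, **params):
        """Create a GET request for a resource with optional query parameters."""
        url = urljoin(self.base_url, resource.lstrip("/"))
        if params:
            url += "?" + urlencode(sorted(params.items()))
        return Request(url, headers={"Accept": "application/json"})

    def get(self, resource, **params):
        """Fetch a resource and decode its JSON body."""
        return json.loads(self.transport(self.build_request(resource, **params)))
```

Keeping the HTTP layer behind a single transport function makes it straightforward to add authentication, caching or retries later without touching the code that maps JSON documents to Python objects.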
Mentored by Andrey Sobolev (German INCF Node)
NERD - NEuRonalDatabase
Modern neuroscience produces huge amounts of data, both from experiments and from large-scale simulations. While sharing such scientific data is common in other areas of the biological sciences, most neuroscientific data is kept private.
NERD is an experimental, MongoDB-based project intended to complement existing database solutions and to promote the sharing and archiving of neuroscientific data. It provides easy-to-use storage for large amounts of time series data while integrating and indexing the corresponding meta information. To facilitate quick searching on all levels of data annotation, NERD uses a bimodal approach to data storage: we separate the core data from its descriptions (structure and metadata) and store the two differently while keeping them tightly interlinked. By exploiting the main features of MongoDB, namely document-oriented storage and a large-scale distributed filesystem, and combining them with the power of HDF5, we will try to provide a neat and scalable solution to the aforementioned challenges.
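The bimodal idea, searchable descriptions kept apart from bulk samples but linked by a shared identifier, can be illustrated with in-memory stand-ins. The class below is not NERD code: a dictionary of documents stands in for the document database, a dictionary of byte strings stands in for the bulk array storage, and all names are assumptions.

```python
import array
import uuid


class BimodalStore:
    """Toy store: metadata documents and bulk samples linked by record id."""

    def __init__(self):
        self.metadata = {}  # stand-in for a document collection
        self.blobs = {}     # stand-in for bulk array storage

    def put(self, samples, **meta):
        """Store samples and their description; return the shared record id."""
        record_id = uuid.uuid4().hex
        self.blobs[record_id] = array.array("d", samples).tobytes()
        self.metadata[record_id] = dict(meta, n_samples=len(samples))
        return record_id

    def find(self, **criteria):
        """Search the metadata only; no bulk data is read."""
        return [
            record_id
            for record_id, doc in self.metadata.items()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]

    def load(self, record_id):
        """Fetch the bulk samples belonging to one record."""
        values = array.array("d")
        values.frombytes(self.blobs[record_id])
        return values.tolist()
```

Because find never touches the sample data, queries stay fast however large the recordings become; only load pays the cost of reading a time series.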
Skills
Solid skills in Python and/or C/C++ are expected. Additional knowledge of MongoDB or other document-oriented databases is welcome!
Aims
NERD has just started; it is a loosely organized and mainly experimental project. Our current aim is to provide a proof of concept, that is, some sort of working example. Together with the main drivers of the project you will, depending on your level of expertise, be able not only to work on your own sub-project (from design through to implementation) but also to leave a long-lasting footprint on an evolving project.
Potential projects are diverse. Some examples are:
- design and exemplary implementation of a multi-purpose REST API
- GWT-based UI for NERD
- improving the NERD core - that is optimizing data access
Mentored by members of the German INCF Node (Christian Kellner, Christian Garbers)
NavR - Neuroshare based data visualizer
The Neuroshare API [1] was created to provide a unified, vendor-neutral way to access neurophysiological data. It was finished quite some time ago, and various implementations (DLLs) exist from different vendors. With the help of the nswineproxy project [2] these DLLs can also be used from Linux, and the python-neuroshare project [3] provides a high-level Python interface to the Neuroshare API. What is still missing is a viewer program that gives the user a quick way to look at and visually explore recorded data, both analog signals and spike trains. The aim of the Summer of Code project would be to develop a first version of such a tool.
Skills
A solid knowledge of the Python programming language (or C if preferred) is required; expertise in the UI toolkit GTK+ is also required. Familiarity with data visualization libraries would be a bonus.
Aims
Build a first version of a Neuroshare-based data visualizer tool that allows scientists to explore data files.
Mentored by members of the German INCF Node (Christian Kellner, Christian Garbers)
[2] https://github.com/G-Node/nswineproxy
[3] https://github.com/G-Node/python-neuroshare
XNAT and Human Connectome
- The Human Connectome Project (www.humanconnectome.org) includes a wide array of behavioral measures that can be used to mine the data. To facilitate mining, we would like to implement a faceted search engine for constructing subject groups. A mockup of the interface is shown to the right. The search interface would dynamically render a plot of the data based on the underlying distribution (e.g. a bar chart for binnable data, a pie chart for categorical data). The plots would be interactive, allowing users to select subsets of the underlying distribution. The search engine will be implemented in XNAT and will be available to the Human Connectome Project and any other XNAT-based repository.
- There are many XNAT-based neuroimaging data repositories out in the world. Currently, they are all standalone silos. With a little effort, they could be federated to present a unified view of distributed data sets. This project would entail developing the user authentication, user authorization, user interface, and distributed query capabilities required to enable such a federated view. XNAT is an open source imaging informatics platform used to manage and share imaging and related data.
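The faceted search in the first idea can be sketched independently of XNAT. The example below is written in Python for brevity (the project itself targets Java and JavaScript), and the subject fields "age" and "handedness" as well as the function names are made up for illustration. It chooses a plot type from the data, summarises the distribution, and narrows the subject group by a selection.

```python
from collections import Counter


def facet(subjects, field, n_bins=4):
    """Summarise one field: binned counts if numeric, category counts otherwise."""
    values = [s[field] for s in subjects if s.get(field) is not None]
    numeric = values and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        low, high = min(values), max(values)
        width = (high - low) / n_bins or 1.0  # avoid zero width if all equal
        counts = [0] * n_bins
        for v in values:
            # The maximum value falls into the last bin rather than a new one.
            counts[min(int((v - low) / width), n_bins - 1)] += 1
        edges = [low + i * width for i in range(n_bins + 1)]
        return {"kind": "bar", "edges": edges, "counts": counts}
    return {"kind": "pie", "counts": dict(Counter(values))}


def select(subjects, field, predicate):
    """Keep the subjects whose value for the field satisfies the predicate."""
    return [s for s in subjects if field in s and predicate(s[field])]
```

Selecting a range in one plot and then recomputing the other facets on the reduced group is what gives the interface its drill-down behaviour; in a real deployment the counting would be pushed to the server-side database rather than done in memory.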
Aims
Add-on to or extensions of XNAT as outlined above.
Skills
A working knowledge of Java, as well as JavaScript and JSON for the search interface and web security protocols for the federation.
Mentored by Dan Marcus and the XNAT project.
Multiple Interacting Instances of Neuronal Dynamics
Multiple Interacting Instances of Neuronal Dynamics (MIIND, http://miind.sf.net, website slightly out-of-date) is an Open Source (modified BSD license) project for the simulation of large-scale cortical networks. It has been used to create models of visual attention and language representation. Although it is a neural simulator, it operates at the population level rather than at the level of individual neurons. Ultimately, the aim is to use neural models to predict imaging signals, such as fMRI and EEG, which would allow the data obtained by these experimental techniques to be analysed in novel ways. Population dynamics can be simulated with ordinary differential equations (Wilson-Cowan) or partial differential equations (PDEs, Fokker-Planck), both of which are already provided with MIIND. Evaluating PDEs is computationally expensive, and large networks of such populations must be simulated on clusters of workstations. MIIND is well suited to parallel programming, and an MPI implementation would allow the creation of much larger networks. MIIND is a framework that decouples network creation from node simulation as much as possible. Users are able to exchange the simulation algorithms of individual nodes at run time. To maintain this flexibility, the MPI implementation must follow a design that hides the MPI details from the client code.
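The decoupling of network and node algorithm can be illustrated with a short Python sketch (MIIND itself is C++). The class names below are hypothetical and not MIIND's, and the node algorithm is a simplified Wilson-Cowan-style rate equation, tau * dr/dt = -r + S(input), with a sigmoid S. The network step is split into a communication phase and a computation phase; in a parallel version the first phase is where message passing would be hidden from client code.

```python
import math


class WilsonCowanAlgorithm:
    """Simplified rate dynamics: tau * dr/dt = -r + S(input), S a sigmoid."""

    def __init__(self, tau=10.0, gain=1.0, threshold=0.0):
        self.tau = tau
        self.gain = gain
        self.threshold = threshold

    def evolve(self, rate, total_input, dt):
        """Advance one population's rate by dt using forward Euler."""
        target = 1.0 / (1.0 + math.exp(-self.gain * (total_input - self.threshold)))
        return rate + dt * (target - rate) / self.tau


class Network:
    """Holds nodes and weights; node algorithms can be swapped at run time."""

    def __init__(self):
        self.algorithms = []
        self.rates = []
        self.weights = {}  # (source, target) -> weight

    def add_node(self, algorithm, rate=0.0):
        self.algorithms.append(algorithm)
        self.rates.append(rate)
        return len(self.rates) - 1

    def connect(self, source, target, weight):
        self.weights[(source, target)] = weight

    def step(self, dt, external=None):
        external = external or {}
        # Communication phase: gather weighted input for every node.
        inputs = [external.get(i, 0.0) for i in range(len(self.rates))]
        for (source, target), weight in self.weights.items():
            inputs[target] += weight * self.rates[source]
        # Computation phase: every node advances independently.
        self.rates = [
            algorithm.evolve(rate, total, dt)
            for algorithm, rate, total in zip(self.algorithms, self.rates, inputs)
        ]
```

Client code only calls add_node, connect and step, so the gathering of inputs could be replaced by a distributed exchange without changing any node algorithm or any user script.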
Skills
Good C++ design and implementation skills. Knowledge of MPI would be a plus.
Aim
An MPI-supported parallelization of MIIND's central simulation loop.
Alternatively, MIIND's GUI with 3D visualisation capabilities needs a substantial extension.
Skills
Good C++ implementation skills; some experience with Qt and OpenGL would be a plus.
Aim
The creation of a much more user-friendly GUI.
Mentored by members of the INCF UK Node (Marc de Kamps and others)
NDF Data Format toolbox support
Most raw data from neurophysiological scientific instruments is encoded in vendor-specific (or bespoke) formats. Such formats are often unreadable unless you use vendor-specific software or have knowledge of the encoding format. Therefore, there is a need to translate raw data from these systems into a standard data format to allow the standardised design and implementation of service interfaces and interoperable analysis services. To this end, the CARMEN project [1] has developed the Neurophysiology Data translation Format (NDF) [2]. This is a vendor-neutral data format that is capable of storing most if not all types of electrophysiology time series and event data. To support the adoption of NDF as a de facto standard for the neuroscience community and for handling data within neuroscience labs, we have developed an NDF Matlab toolbox that allows Matlab developers (there is a large Matlab programming base in the neurophysiology community) to work with NDF in an intuitive way from within the Matlab environment. The NDF Matlab Toolbox has been implemented on top of the NDF C library API. It consists of a set of object-oriented Matlab classes and functions that provide high-level support for NDF data I/O. A “multi data formats” to NDF converter is embedded within the toolbox as a data input module. It abstracts away the low-level ‘C’ library objects and provides services and functions that are compatible with the Matlab programming environment.
We wish to extend the range of toolbox support options for NDF so that it can be more widely adopted and deployed in the general neuroscience community. In particular, we wish to add Java, Python, and R support, as these are other commonly used programming languages in neurophysiology. The project will thus entail building comparable ‘plug-in’ toolbox support libraries for these languages so that NDF can be used natively in all of them. The work will underpin global data standardisation activities within INCF, and in particular support collaboration across national data collation exercises led under initiatives such as CARMEN, the Neuroscience Information Framework (NIF), Collaborative Research in Computational Neuroscience (CRCNS), G-Node and INCF itself.
Skills
A working knowledge of object-oriented programming and Java, plus knowledge of the Python and R programming languages. Experience with the Matlab scientific programming language and with developing cross-platform APIs and library-based applications would be highly beneficial.
Aims
- Develop a Python programming support toolbox for the NDF data standard, building upon the C library API
- Develop an R programming support toolbox for the NDF data standard, building upon the C library API
And, time permitting:
- Develop a Java programming support toolbox for the NDF data standard, building upon the C library API
- Support workshop activities to foster the dissemination of these toolboxes across the neurophysiology community
Mentor: Tom Jackson, CARMEN Project Team
References
[1] The CARMEN Neuroscience Server. Paul Watson, Tom Jackson, Georgios Pitsilis, Frank Gibson, Jim Austin, Martyn Fletcher, Bojian Liang, Phillip Lord. UK e-Science 2007 All Hands Meeting, Nottingham, September 2007.
[2] The Neurophysiology data Translation Format (NDF). http://www.carmen.org.uk/standards/CarmenDataSpecs.pdf
INCF iRODS Collaborative Platform
INCF is in the process of setting up a platform for data collaboration and sharing based on the open source iRODS software produced by the Data Intensive Cyber Environment (DICE) group. Within this endeavor we propose two student projects:
Make Web-DAV a first-class citizen within iRODS
This has two potential components:
- Write an iRODS micro-service object that executes the Web-DAV protocol. This will make it possible to register an object from Web-DAV into an iRODS collection.
- Write an iRODS driver that mounts a Web-DAV system as a vault. There are some 22 operations that need to be mapped to the iRODS protocol for a full implementation.
Expertise in C programming will be needed. The major challenge will be determining whether a suitable C library for invoking the Web-DAV protocol already exists (possibly neon, or something built on top of it such as davfs) or whether one will need to be written or adapted.
Mentored by Raphael Ritz (INCF Secretariat) and Mike Conway (iRODS/DICE)
Authenticate against LDAP
Make iRODS support LDAP as an authentication backend in addition to the default secure password authentication method. Do this by providing a suitable plugin for the forthcoming Pluggable Authentication System of iRODS.
Expertise will be needed in C/C++ programming. Familiarity with LDAP would be a bonus.
Mentored by Ruggero Cucchiani (INCF Secretariat) and Mike Conway and Wayne Schroeder (iRODS/DICE)
Web-based collaborative neural reconstruction with CATMAID
CATMAID, the Collaborative Annotation Toolkit for Massive Amounts of Image Data (see www.catmaid.org), is a web-based platform suitable for the annotation of very large image data sets, such as those produced by serial section transmission electron microscopy.
Reconstruction of neural circuits, also known as connectomics (http://www.ted.com/talks/sebastian_seung.html), requires electron microscopy and produces multi-terabyte datasets. CATMAID is a GoogleMaps-style interface to browse and collaboratively annotate such datasets. See it in action here: http://incf.ini.uzh.ch/catmaid/
We would like to extend the functionality in various ways, foremost in terms of querying and managing reconstructed neurons.
Ideas/Aims
Idea 1: Spatial queries: Drawing outlines on the 2D projection of the 3D datasets and returning all pre- and postsynaptic neurons within that region would be tremendously useful for biologists. This would require building a new Django view with some user interactivity to draw outlines on 2D images, implementing the spatial queries to the database and presenting the results.
Idea 2: Neuron Catalog: An interface to conveniently browse reconstructed neurons. You see a screenshot on the right. We would like to extend this interface with more attributes to annotate neurons (such as neurotransmitters, cell lineage etc.), integrate the already implemented WebGL-based viewer, and display the circuit with graph visualization and interactivity (see http://www.wormatlas.org/ for inspiration).
Feel free to propose your own ideas as well (more ideas on the wishlist: https://github.com/acardona/CATMAID/issues?sort=created&labels=wishlist )!
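The core of the spatial query in Idea 1 is a point-in-polygon test applied to annotated node positions. The sketch below uses the standard ray-casting test on in-memory tuples; the (neuron_id, x, y, z) layout and the function names are assumptions for illustration and do not reflect CATMAID's database schema. In practice the same filter would be expressed as a database query behind a Django view.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as (x, y) vertices?"""
    inside = False
    count = len(polygon)
    for index in range(count):
        x1, y1 = polygon[index]
        x2, y2 = polygon[(index + 1) % count]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def neurons_in_outline(nodes, polygon, z_min, z_max):
    """Return sorted ids of neurons with a node inside the outline and z range.

    nodes is an iterable of (neuron_id, x, y, z) tuples.
    """
    return sorted({
        neuron_id
        for neuron_id, x, y, z in nodes
        if z_min <= z <= z_max and point_in_polygon(x, y, polygon)
    })
```

The outline drawn by the user on the 2D view supplies the polygon, and the range of sections it should apply to supplies z_min and z_max. Restricting the result further to pre- or postsynaptic partners would add a join against the connectivity annotations.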
Skills
The frontend is implemented in HTML5/JavaScript with jQuery and WebGL, and we use Django and a Postgres database in the backend.
Mentored by Stephan Gerhard and the CATMAID project.
Further Information
Some links and email addresses related to GSoC:
- The Google Summer of Code home page: http://www.google-melange.com/
- INCF profile page on the Google Summer of Code 2012 site: http://www.google-melange.com/gsoc/org/google/gsoc2012/incf
- Timeline of Google Summer of Code 2012: http://www.google-melange.com/gsoc/events/google/gsoc2012
- GSoC Frequently Asked Questions: http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2012/faqs
- Email address for informal inquiries: gsoc@incf.org
- INCF's public users list: http://lists.incf.org/mailman/listinfo/incf-users
- INCF's public developers list: http://lists.incf.org/mailman/listinfo/incf-developers
INCF is also prepared to serve as an umbrella mentoring organization for high-profile neuroinformatics projects. If interested, please contact Raphael Ritz.

