Google Summer of Code (GSoC)
Google Summer of Code (GSoC) is a global program focused on bringing more student developers into open source software development. Since 2011, the INCF network has served as a mentoring organization that pairs students with developers from its community to work on 3-month programming projects. Students are paid a stipend by Google. Between 2016 and 2020, INCF paired 87 students with 107 mentors.
Accepted project list
Mentors: Ben Fulcher, Joseph Lizier
Psychiatric disorders are diagnosed based on symptom scores from clinical interviews; there are no existing gold standards that can be used for definitive validation. Functional neuroimaging techniques, including functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and electroencephalography (EEG), have become important tools for investigating brain disease. The analysis of functional neuroimaging data can therefore be used to characterize abnormalities in brain function.
Researchers have recently formulated many ways to analyze time-series data, some sophisticated and some simple. Although the simple methods work quite well, there is a need to apply more complex methodologies as well. Fulcher et al., 2017 (hctsa: A Computational Framework for Automated Time-Series Phenotyping Using Massive Feature Extraction) showed that a large number of features (~7700) can be extracted from a time series using the proposed hctsa tool. However, using the hctsa features for analysis can be computationally expensive and requires a proprietary Matlab license, limiting widespread adoption for medical and research applications. These features can be reduced to a smaller set of non-redundant features that represent the time series, using the catch22 feature reduction approach (catch22: CAnonical Time-series CHaracteristics). The reduction is performed with minimal loss of classification accuracy. In that work, the performance of each feature was evaluated on a set of datasets that are very different from neuroimaging data. The current work will therefore validate the proposed approach on neuroimaging data.
This project is particularly important for relating measurements of a patient's brain dynamics to their disease diagnosis. It further reduces complexity by distilling a large literature on time-series analysis into a small subset of features. It selects only the significant features to represent the time series, with minimal loss in classification accuracy. The optimized and efficiently coded features will be made into a package for the community to use. This open-source code will let other researchers find the underlying hidden features of neuroimaging time series and use them in their work. The availability and ease of computing the features opens up applications in several areas.
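As a rough illustration of what a feature-based representation of a time series means, the sketch below maps a series to a handful of summary numbers. The three features used here (mean, standard deviation, lag-1 autocorrelation) are illustrative stand-ins chosen for brevity; they are not the catch22 feature set, and `summarize` is a hypothetical helper rather than part of hctsa or catch22.

```python
import math

def lag1_autocorr(ts):
    """Lag-1 autocorrelation of a sequence (0.0 for a constant series)."""
    n = len(ts)
    mean = sum(ts) / n
    var = sum((x - mean) ** 2 for x in ts)
    if var == 0:
        return 0.0
    cov = sum((ts[i] - mean) * (ts[i + 1] - mean) for i in range(n - 1))
    return cov / var

def summarize(ts):
    """Reduce a time series to a small dictionary of summary features."""
    n = len(ts)
    mean = sum(ts) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in ts) / n)
    return {"mean": mean, "std": std, "lag1_autocorr": lag1_autocorr(ts)}

# Two toy series with similar spread but very different dynamics.
smooth = [math.sin(0.1 * t) for t in range(200)]
alternating = [(-1) ** t for t in range(200)]

for name, series in (("smooth", smooth), ("alternating", alternating)):
    print(name, summarize(series))
```

The two toy series have comparable amplitude, but the autocorrelation feature separates them clearly, which is the kind of distinction a feature vector is meant to capture.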
Mentors: Ben Fulcher, Joseph Lizier
Time-series analysis methods are developed regularly, but there is currently no resource that compares a newly developed method with the methods that already exist and helps the user judge how similar the new method is to them. In this project we are going to develop a web-based system that takes an analysis method as Python code from a user, runs it on a diverse time-series dataset, and analyzes how the newly developed method relates to the pre-existing ones.
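One plausible way to relate a new method to existing ones is to evaluate every method on the same set of time series and correlate the outputs. The sketch below does this with a Pearson correlation; the function names, the toy dataset and the choice of correlation measure are all assumptions made for illustration, not a description of the system that was built.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_by_similarity(new_method, existing_methods, dataset):
    """Rank existing methods by |correlation| with the new method across the dataset."""
    new_out = [new_method(ts) for ts in dataset]
    scores = {
        name: abs(pearson(new_out, [fn(ts) for ts in dataset]))
        for name, fn in existing_methods.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def mean(ts):
    return sum(ts) / len(ts)

def std(ts):
    m = mean(ts)
    return math.sqrt(sum((x - m) ** 2 for x in ts) / len(ts))

def value_range(ts):
    return max(ts) - min(ts)

def variance(ts):  # plays the role of the user-submitted method
    return std(ts) ** 2

# Toy dataset: alternating series with different amplitudes and offsets.
offsets = [0, 3, 1, 4, 2]
dataset = [[k * (-1) ** t + off for t in range(10)]
           for k, off in zip(range(1, 6), offsets)]

ranking = rank_by_similarity(
    variance, {"mean": mean, "std": std, "range": value_range}, dataset)
print(ranking)
```

On this toy data the submitted method (variance) ends up closest to the spread-based methods and furthest from the mean, which is the kind of conclusion such a comparison is meant to support.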
Mentors: Jamie Knight, Thomas Nowotny
As opposed to the classic architecture of multilayer Artificial Neural Networks (ANNs), which use feedforward or recurrent learning, Spiking Neural Networks (SNNs) offer the potential to compute using sparse binary signals in a biologically realistic fashion. The implementation of SNNs on neuromorphic hardware can lead to fast, low-power, parallel, event-driven information processing. Unlike ANNs, SNNs compute using spikes: neurons are either spiking or not, instead of having continuous-valued activations. Because such spiking behaviour is non-differentiable, it is more difficult to train SNNs with the gradient-based methods that are typically used to train ANNs, such as the backpropagation of error. Hence, there are two major paradigms for SNN training: the first consists of various methods to convert a pre-trained ANN to an SNN with minimum loss of accuracy, and the second tries to derive backpropagation-like learning rules for spike-based computation. Review articles such as Pfeiffer & Pfeil (2018) give excellent overviews of both these research areas.
GeNN is a GPU-enhanced Neuronal Network simulation package written in C++. It combines the ease of code generation, used primarily to set up the parameters of the simulation through a model definition, with the flexibility of user-defined code that actually runs the simulation and records results. While there are several SNN libraries available, by combining GeNN and standard machine learning packages, it is possible to simulate SNNs and ANNs on the same hardware, thereby providing well-founded comparisons of model performance. This project will primarily use Python (PyGeNN, TensorFlow, Jupyter for tutorials), along with C++ (GeNN).
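The all-or-nothing nature of spiking can be seen in a minimal leaky integrate-and-fire neuron, sketched below in plain Python. The parameter values and the function name `simulate_lif` are illustrative assumptions and do not come from GeNN or from the project; the point is only that the output is a hard threshold on the membrane potential, which is what makes it non-differentiable.

```python
def simulate_lif(input_current, dt=1.0, tau=20.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Forward-Euler simulation of a leaky integrate-and-fire neuron.

    Returns a list with 1 where the neuron spiked and 0 elsewhere.
    """
    v = v_rest
    spikes = []
    for i_t in input_current:
        # Leak towards rest plus drive from the input current.
        v += dt / tau * (-(v - v_rest) + i_t)
        if v >= v_thresh:
            # Hard threshold: the output jumps from 0 to 1 with no gradient.
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes

strong = simulate_lif([2.0] * 200)  # drive above threshold: regular spiking
weak = simulate_lif([0.5] * 200)    # drive below threshold: no spikes
print(sum(strong), sum(weak))
```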
Mentor: Dimiter Prodanov
The project aims to provide a robust mechanism for cell tracking using 2D raw image objects. Potential object displacements will be calculated using the Mean Square Distance method. Another set of images will be fed in as a time-lapse protocol. The trajectory estimate will be denoised using different modern filters such as IMM, Wiener and Multiple Channel Linear Correlation Filter.
The Active Segmentation platform for ImageJ (ASP/IJ) was developed in the scope of GSoC 2016-2018. The plugin provides a general-purpose environment that allows biologists and other domain experts to transparently use state-of-the-art techniques in machine learning to achieve excellent image segmentation. ImageJ is a public domain Java image processing program extensively used in life and material sciences. The program was designed with an open architecture that provides extensibility via plugins.
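Assuming that the "Mean Square Distance" above refers to the mean squared displacement commonly used to characterize tracked motion, the quantity can be sketched as follows. The function name and the toy track are hypothetical and are not taken from the plugin.

```python
def mean_squared_displacement(track, lag):
    """Mean squared displacement of a 2D track at a given frame lag.

    track is a list of (x, y) positions, one per frame.
    """
    if not 0 < lag < len(track):
        raise ValueError("lag must be between 1 and len(track) - 1")
    squared = [
        (x2 - x1) ** 2 + (y2 - y1) ** 2
        for (x1, y1), (x2, y2) in zip(track, track[lag:])
    ]
    return sum(squared) / len(squared)

# A cell moving one pixel per frame along x (directed motion).
track = [(float(t), 0.0) for t in range(10)]
print(mean_squared_displacement(track, 1))
print(mean_squared_displacement(track, 3))
```

For directed motion at constant speed the value grows with the square of the lag, whereas for purely diffusive motion it would grow roughly linearly, which is why this curve is a common summary of trajectories.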
Cell Tracking has gained importance in recent times due to the growing extensive research in Biology. It has become evident that in order to take full advantage of the potential wealth of information hidden in the data produced by cellular experiments, visual inspection and manual analysis are no longer adequate. To ensure efficiency, consistency, and completeness in data processing and analysis, computational tools are essential. Of particular importance to many modern live-cell imaging experiments is the ability to automatically track and analyze the motion of cell objects in time-lapse microscopy images.
Purpose: The project offers an amalgamation of programming and life science. I had been constantly on the hunt for an opportunity to work in this area and I hope I will be able to deliver the best from my side. Moreover, it would be a great learning experience for me and open up opportunities in this domain for further exploration. It's a win-win situation for me. I hope that my interest in this domain will help in realising an intuitive dimension to this project.
Ronaldo Valter Nunes
Mentors: Padraig Gleeson, Ankur Sinha
The project Conversion of large scale cortical models into PyNN/NeuroML involves converting published large-scale network models into open, simulator-independent formats and testing them across multiple simulator implementations. In the previous edition of GSoC, the large-scale network model of the macaque cortex (https://github.com/OpenSourceBrain/MejiasEtAl2016), proposed by Mejias et al., was successfully converted. In this model, each cortical area is composed of an inferior and a superior layer, and the dynamical behavior inside each laminar subcircuit is described by a non-linear firing rate model of Wilson-Cowan type, which represents the mean activities of a population of excitatory neurons and a population of inhibitory neurons. A natural extension of this model was proposed in a paper by Joglekar et al. (https://www.ncbi.nlm.nih.gov/pubmed/29576389). Instead of using non-linear firing rate models, the cortical area was simulated as a spiking neuronal network. This was extremely useful for investigating the propagation of activity in the synchronous and asynchronous regimes of the network. Although this study was published in 2018, the code is not available in ModelDB. However, it was written for the Brian simulator and can be provided by the authors. My goal in this project is to convert this model to PyNN, allowing it to be run in several simulators. In addition, together with the previously converted large-scale firing rate model, this will make it possible to fully reproduce the results published in the paper (https://www.ncbi.nlm.nih.gov/pubmed/29576389). As a secondary goal in this project, I would like to convert the model proposed by Demirtas et al. (https://doi.org/10.1016/j.neuron.2019.01.017), which is a large-scale circuit model of human cortex incorporating regional heterogeneity in microcircuit properties inferred from magnetic resonance imaging (MRI) for parametrization across the cortical hierarchy and fitting models to resting-state functional connectivity.
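For readers unfamiliar with firing rate models of Wilson-Cowan type, the sketch below integrates a generic excitatory/inhibitory pair with forward Euler. It is a textbook-style form with made-up parameter values, not the Mejias et al. model, and `simulate_rates` is a hypothetical name.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_rates(steps=1000, dt=0.1, tau_e=10.0, tau_i=10.0,
                   w_ee=1.5, w_ei=1.0, w_ie=1.0, w_ii=0.0,
                   i_ext=0.5, e0=0.1, i0=0.1):
    """Forward-Euler integration of a generic Wilson-Cowan-type rate model.

    tau_e * dE/dt = -E + f(w_ee * E - w_ei * I + i_ext)
    tau_i * dI/dt = -I + f(w_ie * E - w_ii * I)
    All parameter values are illustrative placeholders.
    """
    e, i = e0, i0
    trace = [(e, i)]
    for _ in range(steps):
        de = (-e + sigmoid(w_ee * e - w_ei * i + i_ext)) / tau_e
        di = (-i + sigmoid(w_ie * e - w_ii * i)) / tau_i
        e, i = e + dt * de, i + dt * di
        trace.append((e, i))
    return trace

trace = simulate_rates()
# With all coupling and input switched off, E relaxes to f(0) = 0.5.
uncoupled = simulate_rates(w_ee=0.0, w_ei=0.0, w_ie=0.0, w_ii=0.0, i_ext=0.0)
print(trace[-1], uncoupled[-1])
```

Each population is described only by its mean activity, which is what distinguishes this class of model from the spiking network used by Joglekar et al.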
Mentor: Dimiter Prodanov
ImageJ is an open source Java image processing and analysis library used extensively in biomedical sciences. Active Segmentation is a plugin providing a user interface to scientists, allowing them to use machine learning algorithms for segmentation and classification tasks. The aim of Active Segmentation is to provide researchers with an extensible toolbox that enables them to select custom filters and machine learning algorithms for their research. Moreover, it supports scientists without a strictly technical background, since no programming skills are required to apply the above-mentioned tools. The idea behind this project is to extend Active Segmentation with modern deep learning methods for image analysis using the Deeplearning4j library.
Mentors: Marcel Stimberg, Dan Goodman
Brian is a spiking neural network simulator that provides convenient syntactic sugar and the flexibility to support a wide variety of models without compromising speed. With a plethora of different sets of governing equations, neuronal models with complex biophysical properties, and synapses with plasticity and parameters, a mechanism to derive a generalized, platform-independent model description becomes essential. Further, such a model description mechanism makes it easier to access and reproduce models, and avoids tedious references to platform-specific source code. Currently, Brian uses the `brian2tools.nmlexport` package to export Brian models to NeuroML with minimal changes. However, this model description mechanism is confined to the NeuroML/LEMS framework and cannot easily be extended to other generalized model descriptors and human-readable formats. Also, the `nmlexport` package is currently limited to specific components and is incompatible with key components like Synapses, Network input, etc. Therefore, the project proposes to create a generalized basic framework that can coherently describe Brian models in a standard format. The standard format shall act as the foundation for exporting Brian models to the NeuroML/LEMS format, to human-readable formats such as LaTeX typesetting, and to ModelView descriptions, and shall be flexible enough to be extended to various other model descriptors or frameworks. The proposed idea would substantially enhance the interfacing of Brian models with other standard model descriptors and thereby help numerous users and research communities.
Mentor: Daniele Marinazzo
Blood Oxygen Level Dependent (BOLD) functional Magnetic Resonance Imaging (fMRI) depicts changes in deoxyhemoglobin concentration consequent to task-induced or spontaneous modulation of neural metabolism. An increase in neural activity corresponds to a local increase in oxygenated blood supply. Exploiting the contrasting magnetic properties of oxygenated and deoxygenated hemoglobin, fMRI measures these alterations in the relative composition of the local blood supply. It is non-invasive and does not employ ionizing radiation, making it a virtually zero-risk procedure. It also enjoys relatively low cost, widespread availability, and a good temporal to spatial resolution tradeoff, making it the predominant choice for measuring brain activity.
fMRI is an indirect measure of neural activity. The resulting BOLD signal reflects both the underlying neural activity and the Hemodynamic Response Function (HRF). Hence, variability in the HRF can be confused with variability in the neural activity. Several studies have established that the HRF varies across subjects as well as across brain regions within a subject. This makes it necessary to estimate the resting-state HRF (rsHRF) individually across different regions of the brain. An effective methodology for this has been suggested by Wu et al. It is based on point process theory and fits a model to retrieve the optimal lag between the events and the HRF onset, as well as the HRF shape. This has been implemented in the rsHRF toolbox.
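The relationship between neural events, the HRF and the BOLD signal is often modelled as a convolution, and the sketch below illustrates that forward model only. The gamma-shaped kernel, its parameters and the function names are assumptions chosen for illustration; this is not the estimation procedure of Wu et al. or the rsHRF toolbox API.

```python
import math

def toy_hrf(length=20, dt=1.0, shape=6.0, scale=1.0):
    """Gamma-shaped response kernel t**(shape-1) * exp(-t/scale), normalized to sum to 1."""
    kernel = [((k * dt) ** (shape - 1)) * math.exp(-(k * dt) / scale)
              for k in range(length)]
    total = sum(kernel)
    return [h / total for h in kernel]

def convolve(events, kernel):
    """Causal discrete convolution, truncated to the length of the event train."""
    out = [0.0] * len(events)
    for t, amplitude in enumerate(events):
        if amplitude == 0:
            continue
        for k, h in enumerate(kernel):
            if t + k < len(out):
                out[t + k] += amplitude * h
    return out

hrf = toy_hrf()
events = [0.0] * 40
events[10] = 1.0               # a single neural event at sample 10
bold = convolve(events, hrf)
peak = bold.index(max(bold))
print(peak - 10)               # delay between the event and the BOLD peak
```

Changing `shape` or `scale` moves and reshapes the peak even though the event train is unchanged, which is why differences in HRF can be mistaken for differences in neural activity.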
The Virtual Brain (TVB) is a neuroinformatics platform for the simulation of the dynamics of large-scale brain networks with biologically realistic connectivity. It aims to bridge the gap between the various levels (microscopic, mesoscopic and macroscopic levels) of brain modeling. It is foremost a scientific simulation platform and provides all means necessary to generate, manipulate and visualize connectivity and network dynamics. Its simulation toolkits facilitate the top-down modeling approach to whole-brain dynamics.
TVB also allows simulation of BOLD activity; however, it suffers from the drawback of assuming a standard HRF model across subjects as well as across the brain regions of a particular subject. To improve this, we propose to first estimate the rsHRF of all the voxels from fMRI input data, and then average these values over the regions used in TVB. These values can then be used to simulate the BOLD signal in a subject- and region-specific way. This could be a valuable addition to TVB.
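The voxel-to-region averaging step described above can be sketched as a simple group-by. The data layout (one HRF time course per voxel plus a parallel list of region labels), the labels and the numbers are placeholders invented for this example; they do not reflect the TVB or rsHRF data structures.

```python
from collections import defaultdict

def average_hrf_by_region(voxel_hrfs, voxel_regions):
    """Average per-voxel HRF time courses within each region label."""
    grouped = defaultdict(list)
    for hrf, region in zip(voxel_hrfs, voxel_regions):
        grouped[region].append(hrf)
    averaged = {}
    for region, hrfs in grouped.items():
        # zip(*hrfs) walks the time points; each tuple holds one sample per voxel.
        averaged[region] = [sum(samples) / len(samples) for samples in zip(*hrfs)]
    return averaged

# Placeholder data: three voxels, two hypothetical region labels.
voxel_hrfs = [[0.0, 1.0, 0.5], [0.0, 3.0, 1.5], [0.0, 2.0, 0.0]]
voxel_regions = ["V1", "V1", "M1"]
region_hrfs = average_hrf_by_region(voxel_hrfs, voxel_regions)
print(region_hrfs)
```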
Mentor: Christine Rogers
LORIS is an open-source framework that facilitates data sharing for neuroscience labs and sites. It hosts both frontend and backend services that help facilitate data sharing and manipulation among researchers. The codebase includes many modules that perform different functionalities. Therefore, this type of service requires automated testing to ensure that all the moving parts are working correctly.
Both unit and integration tests are necessary for a large project like this to run smoothly so that any bugs can be caught and fixed early and efficiently. This is especially important for projects like this that work with data manipulation and therefore require careful attention to detail. Improving the testing database and the test datasets is also a very important part of this project since the automated tests cannot be relied upon if the datasets being used to test them do not reflect the real world. This will need to be a big focus of the project. Finally, creating documentation that can help future developers and users test LORIS themselves and write their own tests is integral to the continuation of this work.
Mentor: Christine Rogers
Display on multiple browsers and platforms
The use of Reactjs, a component-based framework designed for creating UIs
Given LORIS' scope, there is room for extensibility, particularly for visualization. The objective of this project is to offer end users new visualizations for time-series data. This would allow better interpretation of data shared online through LORIS and pave the way for more comprehensive analytical tools for the data being visualized. This also requires the uploaded data to meet certain formats or to be converted to them. Along with a visualization tool, there will be a need to properly validate the uploaded data.
Mentor: Christine Rogers
“LORIS (Longitudinal Online Research and Imaging System), a web-based data and project management software for neuroimaging research studies” (https://mcin.ca/technology/loris/). It is a very convenient tool for researchers conducting neuroimaging research studies. Such studies, like any clinical studies that involve multiple costly measurements (especially longitudinal studies), are often statistically underpowered because of the difficulty of getting data from enough subjects (e.g. because subjects are difficult to recruit, the measurements are very time-consuming, or subjects drop out of the study). The obvious solution to increase the datasets’ size is to collect data from multiple sites, but using data from multiple sources requires great care to make the collected data compatible. Subjects in clinical studies can have a high degree of variability, so every detail must be tracked with caution in an effort to explain this variability. Additionally, multi-center studies involve a large number of people, which also increases variability.
Another major contribution of LORIS is to make data available to researchers who wish to conduct neuroimaging research downstream of data collection. Indeed, neuroscience is very interdisciplinary (e.g. psychology, medicine, biology, bioinformatics, statistics, machine learning) and not all researchers involved in the process should have to collect their own data to do what they do best. As a computer scientist and neurobiologist, I have been on both sides of this research cycle. From my experience, it is common for researchers to collect their own data to answer a very specific question when the data from other studies might have been adequate to answer it, had those data been openly shared. Thus, LORIS should help to reduce needless redundancy between studies, or at least bring them together to make better studies.
The REST API is an easy way to securely access, retrieve and manipulate the sensitive data about the subjects stored in LORIS. The data contained in LORIS should be easily accessible to the researchers allowed to use it, but such information is very personal to the subjects, so the security of these actions on the LORIS database is of foremost importance for ensuring the subjects’ confidentiality. The REST API is already a work in progress, so I will be implementing endpoints for the modules not already accessible via the API.
Mentors: Bramsh Qamar Chandio, Eleftherios Garyfallidis, Shreyas Fadnavis
Image registration is the process of finding a transformation that aligns one image to another. DIPY currently supports several numerical optimization-based techniques for image registration. Even though these methods perform well, they are limited by their slow registration speeds. The goal of this project is to develop deep learning-based methods that can achieve image registration in one-shot resulting in much faster registration speeds. In this project, I propose to develop deep neural networks (DNNs) for MRI registration using thin-plate splines, free-form deformations, and affine transformations. I also plan to extend the implementation of thin-plate splines to use cases other than image registration.
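To make the idea of optimization-based registration concrete, the toy below aligns two 1D signals by searching over integer shifts for the one that minimizes the sum of squared differences. It is a deliberately simplified stand-in (1D, translation only, exhaustive search) with hypothetical function names, and it does not use DIPY's API.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def shift(signal, k):
    """Shift a signal right by k samples (left if k is negative), zero-padded."""
    n = len(signal)
    return [signal[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]

def register_shift(moving, fixed, max_shift):
    """Return the integer shift of `moving` that best matches `fixed`."""
    candidates = range(-max_shift, max_shift + 1)
    return min(candidates, key=lambda k: ssd(shift(moving, k), fixed))

fixed = [0.0] * 20
fixed[11:14] = [1.0, 2.0, 1.0]
moving = [0.0] * 20
moving[7:10] = [1.0, 2.0, 1.0]

best = register_shift(moving, fixed, max_shift=6)
print(best)
```

Every candidate transformation requires a full comparison of the two signals, and the cost grows quickly with the number of transformation parameters. A learned model that predicts the transformation in a single forward pass avoids this search, which is the motivation stated above.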
Mentors: Bradly Alicea, Vinay Varma Nadimpalli
In any open-source organization, one of the key factors to grow is how many people are involved in it. I have introduced OpenWorm in many conferences and meetups and everyone was interested in contributing to it. Many new members have also joined OpenWorm Slack in recent days.
To remove this barrier I have identified a few major factors listed below.
We have many models and projects but no common place to access them.
Social reach is also limited.
“People coming from outside find difficulty in knowing what’s going on in the organization” (Anonymous user).
The OpenDevoCell Integration is going to be a great initiative to solve this barrier: it not only helps other researchers to see and appreciate our work, but would also help to organize how this organization functions in the coming days. OpenDevoCell is going to be the one-stop portal through which anyone from any part of the world can access our work effectively. Many biologists need ML in their work nowadays, but they don’t know much about ML, which is a barrier in their work. We would try to solve these problems for biologists who have the same research interests as ours.
In this project, we are integrating some projects that were developed in past years' GSoC projects. I am also fortunate to work on the Digital Bacillaria project, so I know in depth how that project works and how to implement it in a web portal.
The idea of deploying a Python library is also a very good initiative. It will make it even easier to use our models, as you can just import our library and get all our data in your Python code to do your research.
Vergil (Reuben) Haynes
Mentors: Matteo Cantarelli, Padraig Gleeson
Neuroscience data comes in multiple different data formats and structures. These differences provide a major technical barrier to sharing data between labs or even within labs. Often the organization and naming conventions of neuroscience data structures further obscure how to understand and analyze the data unless one is already intimately familiar with a specific data structure. The Neurodata Without Borders (NWB) Initiative provides neurophysiology datasets in a standardized HDF5 format that employs domain knowledge to alleviate the burden of different data formats and structures across multiple experimental paradigms. In addition, the NWB Initiative provides tools for handling, visualizing and analyzing NWB formatted data.
This proposed project aims to contribute to the NWB Showcase made available at NWB Explorer on the Open Source Brain repository. The proposed project will deliver multiple converted datasets to be viewed at the NWB Explorer and will integrate tutorials and analysis examples for select converted datasets.
Mentor: Bradly Alicea
Recent advances in the fields of computer vision and deep learning have shown great promise in their ability to decipher images and derive inferences involving classification, detection of objects and approximation of certain values with high accuracy.
Pre-trained models like YOLONet and ResNet are now being used in various industries where they help make our lives easier. But these kinds of models are not yet being used for microscopic images on a large scale. With the right model architecture and training approaches, it is possible to get pre-trained models which would help in the research efforts of many. These pre-trained models, combined with a GUI would act as a community tool which would help speed up the classification of thousands of microscopic images and gain inferences from them.
The top priorities of this proposal are:
Train a deep learning model(s) from the image dataset(s) provided.
In the process of training, develop a data augmentation pipeline which can be used on the cellular image datasets (even on the cellular images which are not involved in this project) to help build a model robust enough for its purpose.
Make the trained model portable so that it can be easily integrated into a GUI backend.
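A data augmentation pipeline of the kind listed above typically generates label-preserving variants of each image. The sketch below shows flips and 90-degree rotations on images stored as nested lists; the function names are hypothetical, and a real pipeline would more likely operate on arrays and include further transforms such as intensity changes or crops.

```python
def hflip(img):
    """Mirror an image left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror an image top to bottom."""
    return [row[:] for row in img[::-1]]

def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img):
    """Return the original image plus flipped and rotated variants."""
    r90 = rot90(img)
    r180 = rot90(r90)
    r270 = rot90(r180)
    return [img, hflip(img), vflip(img), r90, r180, r270]

image = [[1, 2],
         [3, 4]]
variants = augment(image)
for v in variants:
    print(v)
```

Microscopy images usually have no canonical orientation, so these transforms are a natural starting point for cellular datasets.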
Mentors: Lukas Vareka, Roman Mouček
The Workflow Designer is a prototype web-based application allowing drag-and-drop creating, editing, and running workflows from a predefined library of methods. Moreover, any workflow can be exported or imported in JSON format to ensure reusability and local execution of exported JSON configurations. The application is primarily focused on electroencephalographic signal processing and deep learning workflows.
Currently, the entire Workflow Designer system (server, workflow system and methods) is based on Java. The aim of this project is to transfer backend technologies from Java to Python and allow executing workflow blocks (methods) implemented in Python, using e.g. MNE for EEG signal processing, or TensorFlow for deep learning. Just like in the current version, each block has inputs, outputs (can be streams, arrays, files, etc.) and parameters that can be configured using a GUI. After the system is transformed, a few deep learning workflow-related blocks will be developed to demonstrate the functionality of the system.
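The block-based design described above can be sketched as a small JSON-driven executor in which each block names its type, its parameters and the blocks it takes input from. The JSON schema, the block types and `run_workflow` are all invented for this illustration and are not the Workflow Designer's actual format.

```python
import json

# Hypothetical block library: each block maps (params, inputs) to an output.
BLOCKS = {
    "constant": lambda params, inputs: params["value"],
    "add": lambda params, inputs: sum(inputs),
    "scale": lambda params, inputs: inputs[0] * params["factor"],
}

def run_workflow(workflow_json):
    """Execute every block of a JSON workflow, resolving inputs recursively."""
    workflow = json.loads(workflow_json)
    blocks = {block["id"]: block for block in workflow["blocks"]}
    results = {}

    def evaluate(block_id, visiting=()):
        if block_id in results:
            return results[block_id]
        if block_id in visiting:
            raise ValueError("workflow contains a cycle at block " + block_id)
        block = blocks[block_id]
        inputs = [evaluate(src, visiting + (block_id,))
                  for src in block.get("inputs", [])]
        results[block_id] = BLOCKS[block["type"]](block.get("params", {}), inputs)
        return results[block_id]

    for block_id in blocks:
        evaluate(block_id)
    return results

exported = json.dumps({"blocks": [
    {"id": "a", "type": "constant", "params": {"value": 2}},
    {"id": "b", "type": "constant", "params": {"value": 3}},
    {"id": "c", "type": "add", "inputs": ["a", "b"]},
    {"id": "d", "type": "scale", "params": {"factor": 10}, "inputs": ["c"]},
]})
results = run_workflow(exported)
print(results["d"])
```

Because the whole workflow is plain JSON, the same description can be exported, stored and re-executed locally, which is the reusability property mentioned above.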
The objectives I wish to achieve for EEG and DL workflow are:
Rewrite all the models/algorithms in Python, i.e. rewrite:
Neural network models and classifiers in Python using Keras (TensorFlow).
Preprocessing, Low/High pass filter, epoch extraction, averaging filter, etc.
Feature extraction algorithms (wavelet transform)
Data visualization algorithms
Rewriting the server in Python (Flask):
Blocks and their functional APIs
Drag and drop UI
New feature development and bug fixing
Extra deep learning models
Mohammad Asif Hashmi
Mentors: Jordi Huguet, Greg Operto
Currently, XNAT comes with a native built-in GUI. This project will provide a dashboard framework to allow users to easily develop responsive dashboards, for exploring, monitoring, and reviewing datasets stored on any XNAT instance.
It will interact with the XNAT server instance and get the required data from it. Using that data, we will visualize the information in a summarized form.
It will be designed so that it can be used with any XNAT instance.
This project will create a flexible dashboard framework that can be further improved and extended with new features as the requirements or needs of the user change.
Syed Hussain Ather
Mentor: Rick G, JohnGrif
This project is about unit tests for brains with SciUnit. We want to evaluate the strength of various models and select them among competitors for analyzing brain data.
Mentors: Paula Popa, Lia Domide
● To create a basic GUI for the reconstruction pipeline that lets users provide input data, choose configurations, identify the outputs, and check logs in case any problem occurs during the process.
● To integrate the GUI with our Pegasus workflow engine for automation, fault-tolerance and debugging and to provide the job status and execution statistics.
● To implement automated GUI testing.
● To implement more functionality for the GUI at a higher level of abstraction.
Mentor: Rick G, JohnGrif
The SciUnit framework was developed to help researchers create unit tests for scientific models. Currently, unit tests exist for models of single neurons and small networks thereof. However, unit tests for models concerned with large-scale brain network dynamics, such as meso-scale, mean-field descriptions and corticothalamic circuit models have not been developed yet. During the summer, I plan to develop tests to validate predictions from a corticothalamic neural mass model against relevant features of EEG data.
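The idea of testing a model against data can be illustrated with a z-score comparison between a model prediction and an observed mean and standard deviation. The sketch below is written in plain Python and does not use SciUnit's classes; the function names, the pass threshold and the observation values are placeholders invented for this example, not real EEG measurements.

```python
def z_score(prediction, observed_mean, observed_std):
    """How many observed standard deviations the prediction lies from the observed mean."""
    if observed_std <= 0:
        raise ValueError("observed_std must be positive")
    return (prediction - observed_mean) / observed_std

def judge(prediction, observed_mean, observed_std, max_abs_z=2.0):
    """Score a prediction and decide pass/fail with an assumed |z| threshold."""
    z = z_score(prediction, observed_mean, observed_std)
    return {"z": z, "passed": abs(z) <= max_abs_z}

# Placeholder observation of some EEG feature (illustrative numbers only).
observation = {"mean": 10.0, "std": 1.0}

close_model = judge(11.5, observation["mean"], observation["std"])
far_model = judge(13.0, observation["mean"], observation["std"])
print(close_model, far_model)
```

Scoring several competing models against the same observation in this way gives a common scale on which they can be ranked.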
Mentors: Paula Popa, Lia Domide
The project is about upgrading and fixing the tvb_geodesic library, which is used to calculate geodesic distances on cortical surfaces.
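As background on what a geodesic distance on a surface mesh is, the sketch below approximates it by the shortest path along mesh edges using Dijkstra's algorithm. This edge-path approximation is a simplification chosen for brevity and is not necessarily the algorithm implemented in tvb_geodesic; all function names and the toy mesh are hypothetical.

```python
import heapq
import math

def edge_graph(vertices, triangles):
    """Build a weighted adjacency map from mesh vertices and triangle indices."""
    graph = {index: {} for index in range(len(vertices))}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            length = math.dist(vertices[u], vertices[v])
            graph[u][v] = length
            graph[v][u] = length
    return graph

def shortest_paths(graph, source):
    """Dijkstra's algorithm: distance along mesh edges from source to every vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, weight in graph[u].items():
            candidate = d + weight
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist

# Unit square split into two triangles along the 1-3 diagonal.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
triangles = [(0, 1, 3), (1, 2, 3)]
distances = shortest_paths(edge_graph(vertices, triangles), source=0)
print(distances[2])
```

On this flat square the true surface distance between vertices 0 and 2 is the diagonal, about 1.414, while the edge path has length 2. Exact geodesic algorithms avoid this overestimate by allowing paths to cross triangle interiors.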