
Google Season of Docs 2020 Project Ideas List

 

The Turing Way: A how to guide to data science

Contact: Malvika Sharan (msharan@turing.ac.uk)

Description: The Turing Way is an open-source, community-led and collaboratively developed book project on making data research accessible to a wider research community (https://the-turing-way.netlify.com). We bring together individuals from diverse fields and areas of expertise to develop practices and learning resources that make data research comprehensible and useful for everyone. The project is developed openly, and all questions, comments, recommendations and discussions are facilitated through an online GitHub repository.

Our community members are researchers, engineers, data librarians, industry professionals, and experts in various domains, at all levels of seniority, from all around the world. Since the project's launch in April 2019, they have co-authored chapters on research reproducibility by compiling best practices, tools and recommendations used by the scientific communities worldwide. 

Since the project has grown from a small team of 10 researchers to a community of 125 contributors, the scope of The Turing Way book has also expanded to include different aspects such as project design, collaboration, communication, and ethics in data research (preview, pull request #977). The project aims to provide learning and training resources on a wide array of topics and builds on case studies provided by Turing researchers, impact stories contributed by individuals, and workshops delivered by the team members. 

For this project, the Google Season of Docs technical writer will review the existing chapters and provide editorial support for incorporating new content, such as the chapters on distributed collaboration that are currently under development. Depending on their personal preference, they can contribute to one or more newly proposed sections of the book by developing resources (in English) such as new chapters, interactive tutorials, impact stories, and case studies with the help of our community members. If they have a good understanding of Chinese, Hindi or Spanish, they can also choose to translate the existing materials into one of these languages. They will be given appropriate guidance and the opportunity to work in a positive environment, and will be fairly acknowledged for their contributions to the project. The final goal is to identify opportunities to enhance the quality of the book by improving its language and structure, making The Turing Way accessible to graduate students, research software engineers, senior investigators and administrative teams. 

Optional technical reading: The Turing Way is hosted as a Jupyter Book at https://the-turing-way.netlify.com. Jupyter Books render markdown files and Jupyter notebooks as static HTML, making them easy to read. When a notebook is included in the book, the static page includes a link to an interactive version of the notebook via Binder. We are upgrading to the latest version of Jupyter Book and intend to integrate more interactive features, which are currently underutilised. The translation efforts are supported through Transifex.

Recommended skills: 

• Writing and editing skills in the English language 

• Experience working in distributed communities, ideally using git and GitHub 

• Experience collaborating on data science or quantitative research projects at any level 

Optional skills (we understand that technical writers may not have all of these skills):

• Interactive visualisation 

• Data science ethics 

• Community building 

• Translating English content into Chinese, Hindi or Spanish 

Potential outcome: 

An enhanced version of The Turing Way’s existing chapters, editorial support for new contributions by the community members and resource development for one of the newly proposed sections of the book. 

Difficulty level: Medium if the candidate has prior experience working with git and GitHub; in any case, resources and guidance will be provided by the mentors. 

Links to project: https://the-turing-way.netlify.com

 

 

LORIS training docs for open reproducible neuroscience

Contact: Christine Rogers christine.rogers@mcgill.ca

Description: LORIS is an online database for large research studies and neuroscience data collections, often used in open-science projects and areas such as autism and Alzheimer’s research. LORIS provides researchers with user-friendly workflows for collecting, curating and sharing many different kinds of data, including imaging, clinical, electrophysiological and genetic data.

How we work: Our diverse team of 20+ developers is based at the Montreal Neurological Institute (McGill University, Canada) and works together on our codebase in our GitHub repo. We are always looking to improve and streamline our different kinds of documentation through GitHub. We use the git command line extensively to work on features, issues and branches for each release. 

Required skills: basic knowledge of GitHub, markdown/HTML, and the ability to learn git. Any background in science, research, or databases is an asset. Good English-language communication skills. 

Potential outcomes: The writer will work interactively with mentors to improve and update any two of the following: 

  • A: Review, update and improve our installation/setup documentation, and fill gaps in end-user materials providing training for new users.
  • B: Help migrate and update documentation from our GitHub wiki to Read the Docs, via versioned Markdown files in the codebase.
  • C: Help generate a database diagram to communicate data relationships (e.g. subject, study visit) and improve architecture visualization.
  • D: Help update and improve content introducing people to LORIS, its open science context, and its use cases.

Skills: git and GitHub, basic HTML (for Reveal.js) and/or video editing
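Outcome C above involves turning table relationships into a diagram. Purely as an illustration of how such a diagram can be generated mechanically, the sketch below emits Graphviz DOT text for a couple of hypothetical tables; the table and column names are illustrative only and do not reflect the actual LORIS schema.

```python
# Toy sketch: emit Graphviz DOT for a few hypothetical LORIS-style tables.
# Table and column names are illustrative, not the real LORIS schema.

def schema_to_dot(tables, links):
    """tables: {name: [columns]}; links: [(child, parent)] foreign-key edges."""
    lines = ["digraph schema {", "  node [shape=record];"]
    for name, cols in tables.items():
        # Record-shaped node: table name on top, columns listed below
        label = f"{name}|" + r"\l".join(cols) + r"\l"
        lines.append(f'  {name} [label="{{{label}}}"];')
    for child, parent in links:
        lines.append(f"  {child} -> {parent};")
    lines.append("}")
    return "\n".join(lines)

tables = {
    "candidate": ["CandID", "DoB"],                     # study subject
    "session": ["SessionID", "CandID", "Visit_label"],  # study visit
}
dot = schema_to_dot(tables, [("session", "candidate")])
print(dot)
```

Rendering the resulting text with `dot -Tpng` would then produce the actual diagram image.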

Links to project: website: LORIS.ca; GitHub: github.com/aces/Loris

 

 

InterMine user training docs

Contact: Yo Yehudi (yo@intermine.org)

Description: InterMine is an open source biological data warehouse, designed to help biologists analyse and query data. There is a fragmented set of videos and tutorials at intermine.org/tutorials, and a more complete set of exercises and tutorials at https://flymine.readthedocs.io/en/latest/. These tutorials focus on InterMine’s legacy user interface, which will soon be replaced by BlueGenes, a more modern interface. As such, all tutorials will need to be re-created for the new user interface; ideally these tutorials will be a mixture of text, screenshots, and short videos with captions and voice-over.

Tutorials will be embedded in the InterMine website (source code: https://github.com/intermine/intermine-homepage-2017, live: intermine.org). 

Required skills: 

  • Good English language writing skills. 
  • Please note: technical and biological knowledge is not required for this project.

Potential outcome: An updated set of tutorials and/or training infrastructure for InterMine.

Difficulty level: Medium if you know Git, Hugo and understand InterMine queries. Harder otherwise, although we will be able to offer assistance and training for these areas. 

Link to project: intermine.org

InterMine: Review, update, and integrate InterMine developer documentation

Contact: Yo Yehudi (yo@intermine.org)

Description: InterMine is an open source biological data warehouse, designed to help biologists analyse and query data. Technical documentation is spread across multiple locations, with a core documentation set at https://intermine.readthedocs.io/ (source code: https://github.com/intermine/im-docs), and dev documentation for the new user interface located at https://github.com/intermine/bluegenes/tree/dev/docs. Significant transformative changes have been made to the code base since the original core documentation was written, and while we’ve updated it, a “friction test” review of the introductory documentation, tutorials, and customisation guide would probably identify documentation gaps and reorganisation requirements.

The documentation is written in a mixture of reStructuredText and Markdown. For this project we would like a technical writer to review our existing documentation, improve and fix gaps, reorganise the documentation if necessary, and potentially port reStructuredText formatted docs into Markdown. 
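A port from reStructuredText to Markdown is largely mechanical, and in practice would be done with a converter such as pandoc (e.g. `pandoc -f rst -t gfm`). Purely as an illustration of what the conversion involves, here is a toy sketch handling two common constructs: setext-style rST headings and double-backtick inline literals.

```python
import re

def rst_heading_to_md(text):
    """Toy rST-to-Markdown converter for two constructs only;
    a real port would use a full converter such as pandoc.
    - titles underlined with '=' -> '# ', with '-' -> '## '
    - ``literal`` -> `literal`
    """
    out = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line and nxt and set(nxt) == {"="} and len(nxt) >= len(line):
            out.append("# " + line); i += 2; continue
        if line and nxt and set(nxt) == {"-"} and len(nxt) >= len(line):
            out.append("## " + line); i += 2; continue
        out.append(re.sub(r"``(.+?)``", r"`\1`", line)); i += 1
    return "\n".join(out)

rst = "InterMine docs\n==============\n\nRun ``make html`` to build."
print(rst_heading_to_md(rst))
```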

Required skills: 

  • Good English language writing skills. 
  • Please note: technical and biological knowledge is not required for this project, although we will be able to offer assistance and training in these areas. 

Potential outcome: An updated set of tutorials and/or training infrastructure for InterMine.

Difficulty level: Medium if you are familiar with Markdown, reStructuredText, and Git, harder otherwise. 

Link to project: intermine.org

 

 

DevoWorm: documentation infrastructure

Contact: Bradley Alicea (balicea@openworm.org)

Description: The OpenWorm Foundation is dedicated to creating the world’s first digital organism in an Open-Source/Open Science manner. The DevoWorm group (http://devoworm.weebly.com) is an affiliated research unit advancing research in Computational Developmental Biology, Data Science, and Machine Learning. We would like to develop documentation infrastructure in the form of a secure workspace ecosystem that will allow us to encourage community standards and analytical reproducibility. This project will build upon the work being advanced during this year’s Google Summer of Code, which will result in a user front-end for our ML/AI education and research efforts. A documentation infrastructure sitting under the hood of this front-end will aid us in our mission of offering integrated instruction and research engagement. The successful applicant will attend our group meetings and interact with a diverse community of collaborators. 

Potential outcomes:

1) A secure workspace for data storage and manipulation. Users should be able to access data from our organizational collection (DevoZoo) as well as submit new sources of data for inclusion in the collection. We would also like some form of interface with various social media channels (Twitter, ResearchGate, GitHub, Slack) to share papers and insights as necessary, enabling group learning and research insight.

2) An interface to present analysis options to users of varying skill levels. This includes building upon an interface that guides users (often with limited expertise in ML/AI) through the relevant assumptions and overall suitability of different algorithms and classes of analysis for their particular problem. The interface should be documentation-rich, meaning that users should be able to access documents related to community standards and further information about each type of analytical tool in our library. Accompanying this will be the development of a computational notebook template allowing users to access Wiki resources and share lab notes across analyses.

Link to project: http://devoworm.weebly.com

 

 

Orthogonal Research and Education Laboratory

Contact: Bradley Alicea (bradly.alicea@outlook.com)

Description: Building upon work being conducted in the Orthogonal Research and Education Laboratory on Epistemological Directories (EDs), we seek to build a set of directories, annotated bibliographies, and graphical representations that define the scope of field-specific knowledge. The Epistemological Directory (ED) is an innovation of our lab recently presented at csv,conf, and provides a means to educate contributors on scholarly topics where they have deficiencies in expertise. With Season of Docs support, we will be able to develop EDs into a truly comprehensive referential system for new contributors to get up to speed on unfamiliar topics.

Our plan is to systematize and expand the current framework of EDs. Our revised version of the ED concept will unify a number of topics associated with the Orthogonal Lab’s research portfolio. These topics include: Artificial Intelligence, Neuroscience, Cognitive Science, Simulation, Philosophy of Science, and Open Science. The basic structure of an ED includes documentation (tutorials, links, discussions, and bibliography) for specific topics, visualizations that define the development of a field, key equations and methodologies, archives of data sets and key findings, and linkages between distinct fields in the form of potential connections and modes of investigation. It is this last component that is potentially the most useful to our organization.

Potential outcomes: The priorities for this Season of Docs would be as follows: start development on a new Laboratory-wide ED, make linkages between issues and fields in a few of our existing EDs, and (if time permits) contribute to a set of community standards for implementing EDs on other topics and in other organizations.

Link to project: https://data-reuse.weebly.com/

 

 

Pyradigm: research data management for medical data

Contact: Pradeep Reddy Raamana (praamana@research.baycrest.org)

Description: Pyradigm is an open source data structure for biomedical data and features, designed to link multiple tables of mixed data types to improve dataset integrity and ease of use. Learn more here.

Documentation for the older version was published at https://pyradigm.readthedocs.io. The latest documentation, the expected outcome for this project, will be published at http://raamana.github.io/pyradigm/. This latest version introduces a much-improved underlying design, along with a substantial set of new derived classes. There have been some changes in behaviour along with design changes over the full timespan of the project since the original core documentation was written. Hence, it would be useful for the technical writer to be rigorous in identifying gaps in documentation, tutorials, and guides, and to fill them as necessary.

The current documentation is a mix of reStructuredText and Jupyter notebooks (Python and Markdown). We would like the technical writer to review our existing documentation, improve it and fix gaps, and reorganise the documentation if necessary. They would have great freedom in choosing the technology stack for their efforts, so long as they achieve the primary objectives.

Required skills: 

  • Good English language writing skills
  • Some basic understanding of machine learning
  • An appreciation of the importance of research data management
  • Basics of Python and object-oriented programming
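For a sense of the kind of object-oriented data structure the documentation describes, here is a minimal sketch of a dataset class that links features and targets per sample ID, with a basic integrity check. The class and method names are hypothetical and do NOT reflect the real pyradigm API.

```python
# Illustrative only: a toy dataset class in the spirit of pyradigm,
# keeping features and targets linked by sample ID so they cannot
# drift out of sync. Names are invented, not the real pyradigm API.

class ToyDataset:
    def __init__(self):
        self._features = {}  # sample id -> list of feature values
        self._targets = {}   # sample id -> class label

    def add_samplet(self, sample_id, features, target):
        """Add one sample; duplicate IDs are rejected to protect integrity."""
        if sample_id in self._features:
            raise KeyError(f"duplicate sample id: {sample_id}")
        self._features[sample_id] = list(features)
        self._targets[sample_id] = target

    @property
    def num_samplets(self):
        return len(self._features)

    def targets_of(self, label):
        """Return the sample ids carrying a given class label."""
        return [sid for sid, t in self._targets.items() if t == label]

ds = ToyDataset()
ds.add_samplet("sub-01", [0.1, 0.2], "patient")
ds.add_samplet("sub-02", [0.3, 0.4], "control")
print(ds.num_samplets)           # -> 2
print(ds.targets_of("patient"))  # -> ['sub-01']
```

The design point worth documenting is exactly this coupling: because features and targets share one key, subsetting or merging by sample ID can never mismatch a feature row with the wrong label.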

Potential outcome: 

  • Much more accessible documentation, appealing to a broader and lay audience
  • Tutorials covering the base classes and their properties, as well as multiple derived classes targeting advanced use cases
  • Many more example Jupyter notebooks demonstrating usage

Difficulty level: Medium if you are familiar with Markdown, reStructuredText, and Git, harder otherwise. 

Link to Project: https://crossinvalidation.com/2020/04/29/research-data-management-for-medical-data-with-pyradigm/

 

 

Anatomical Fiducials (AFIDs) Framework Documentation

Contact: Jonathan Lau (jonathan.c.lau@gmail.com)

Description: Advancements in brain imaging have resulted in novel insights into brain function in healthy and diseased individuals, but these advancements have come with increasingly complex processing and analysis methods. The anatomical fiducials (AFIDs) project aims to provide a standard way of describing these insights using an open standard. The project has been validated for use in human brain scans (see related manuscript), with ongoing community-driven efforts to extend the project to other species, to new anatomical features, and for use in research and clinical settings. To date, over 100 individuals have been trained. The AFIDs protocol has been effective for quality control related to brain image processing (evaluating correspondence) and for teaching neuroanatomy.

AFIDs has been developed with FAIR (findability, accessibility, interoperability, and reuse of digital assets) principles in mind for scientific data management and stewardship although fulfilling the requirements remains a work in progress. Specifically, the project is built on open software (3D Slicer), open data, and open data standards based on the premise that any individual with a computer can learn brain anatomy and contribute to the AFIDs ecosystem. We feel that improving the core documentation will play a crucial role in encouraging individuals to use this framework to learn anatomy and adopt AFIDs for quality control and teaching purposes, as well as for crowdsourcing annotations of new datasets.
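To make the quality-control use case concrete: evaluating correspondence comes down to measuring how far a rater's placed fiducial lies from a reference position, i.e. the fiducial localization error. A minimal sketch of that computation follows; the coordinate values are made up for illustration.

```python
import math

def afle(placed, reference):
    """Anatomical fiducial localization error: Euclidean distance (mm)
    between a placed fiducial and its reference position."""
    return math.dist(placed, reference)

# Hypothetical coordinates (mm) for one fiducial, placed by a trainee
# versus an expert-defined reference position.
rater = (0.2, 1.9, -4.6)
ref = (0.0, 2.1, -5.0)
print(round(afle(rater, ref), 2))  # -> 0.49
```

Averaging this error over all fiducials placed by a trainee gives a simple score for both teaching feedback and image-registration quality control.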

Potential outcomes:

  • Improve existing documentation, currently hosted on GitHub, to lower the learning curve and encourage more individuals to learn the protocol and contribute.
  • Advance the educational curriculum by creating documentation for different species, new anatomical points, and new datasets, including separate documentation for novices that is more guided and prescriptive about how to place AFIDs than the existing documentation.
  • Collaborate with software developers, engineers, clinicians, and researchers involved in different AFIDs-related projects, including the development of new workflows, clinical integration, automation, quality control, and extension to other species.
  • Design and develop new specifications for teaching brain anatomy to complete novices, which will be validated with assistance from other team members.
  • In addition to writing, there will be opportunities to create or assist with figures, infographics, and video tutorials that provide alternative means of teaching the protocol.

Link to project: website: http://www.afidsproject.com; GitHub: http://www.github.com/afids/

 

 

Blue Brain Nexus

Contact: Samuel Kerrien (samuel.kerrien@epfl.ch)

Description: The EPFL Blue Brain Project (BBP), situated on the Campus Biotech in Geneva, Switzerland, applies advanced neuroinformatics, data analytics, high-performance computing and simulation-based approaches to the challenge of understanding the structure and function of the mammalian brain in health and disease. The BBP provides the community with regular releases of data, models and tools to accelerate neuroscience discovery and clinical translation through open science and global collaboration.

One of the big challenges faced by neuroscientists today is to ensure the discoverability and reproducibility of their data through appropriate data management. Core aspects underlying such data management were defined as the FAIR Guiding Principles: data should be findable, accessible, interoperable, and reusable, both by machines and by scientists (doi: 10.1038/sdata.2016.18). Thus far, few tools are available within the field of neuroscience that help implement the FAIR Guiding Principles for data management.

Blue Brain Nexus is an open-source, domain-independent, provenance-based semantic data management platform supporting the FAIR Guiding Principles. Through a knowledge graph, Blue Brain Nexus enables the description of a domain of application for which there is a need to create and manage entities, store and manage their provenance, and relate them. Capturing the provenance of neuroscience data together with rich metadata helps make the data reusable, reproducible and interoperable. Schemas developed using the Shapes Constraint Language (SHACL) help ensure the validity and quality of the data and the provenance provided. Blue Brain Nexus was used to create and manage a data model (schemas, vocabularies, provenance templates) to standardize the representation of data entities and processing activities for single-cell slice electrophysiology, morphology reconstruction and brain atlasing.
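To illustrate the underlying idea (this is a generic sketch, not the Blue Brain Nexus API): a knowledge graph stores statements as subject/predicate/object triples, and provenance can be expressed over those triples with W3C PROV relations such as `prov:wasGeneratedBy`. The entity names and the `nsg:` type below are invented for the example.

```python
# Toy illustration of a provenance-aware knowledge graph as triples.
# Generic sketch, not the Blue Brain Nexus API; entity names are made up.

triples = set()

def add(subject, predicate, obj):
    triples.add((subject, predicate, obj))

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# A dataset generated by a recording activity, in the spirit of W3C PROV
add("trace-001", "rdf:type", "nsg:ElectrophysiologyTrace")
add("trace-001", "prov:wasGeneratedBy", "activity-42")
add("activity-42", "prov:wasAssociatedWith", "agent:lab-A")

# What generated trace-001?
print(query(s="trace-001", p="prov:wasGeneratedBy"))
```

In a real deployment the triples live in a triple store, the wildcard query becomes SPARQL, and SHACL shapes validate that, for example, every trace carries a `prov:wasGeneratedBy` statement.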

The Blue Brain Nexus platform allows scientists to:

  • Register and manage neuroscience-relevant entity types through schemas that can reuse or extend community-defined schemas such as schema.org and ontologies of e.g. taxonomies or cell types.
  • Describe the provenance of data using the W3C-PROV model.
  • Search, discover and reuse high-quality neuroscience data to facilitate further research. 

Blue Brain Nexus has been adopted by some highly visible international neuroinformatics projects:

  • The Human Brain Project (HBP, https://www.humanbrainproject.eu/en) is a European flagship project aiming to understand the human brain and to translate neuroscience knowledge into medicine and technology. To do so, it aims to build a collaborative, information and communications technology-based scientific research infrastructure to allow a transdisciplinary community of researchers from over 130 institutions across Europe to share data and knowledge in the fields of neuroscience, computing and brain-related medicine. The HBP uses Blue Brain Nexus as a key foundational component of its Neuroinformatics platform (kg.humanbrainproject.eu).
  • The Canadian Open Neuroscience Platform (CONP, https://conp.ca/) is a key national project for the Canadian INCF node that aims to bring together many leading scientists in basic and clinical neuroscience to form an interactive network of collaborations in brain research, interdisciplinary student training, international partnerships, clinical translation and open publishing. The platform will provide a unified interface to the research community and will propel Canadian neuroscience research into a new era of open neuroscience, with the sharing of both data and methods, the creation of large-scale databases, the development of standards for sharing, the facilitation of advanced analytic strategies, the open dissemination of both neuroscience data and methods to the global community, and the establishment of training programs for the next generation of computational neuroscience researchers. CONP is integrating Blue Brain Nexus as a search engine for their portal (portal.conp.ca).

Potential outcomes:

  • Write a user-friendly introduction to knowledge graphs underlining their value, how they compare to other means of organizing data (e.g. RDBMS), and how to build a knowledge graph.
  • Write a user-friendly introduction to Linked Data with a dual focus on lay users (the gist of the method) and developers (more technical details). This should cover the value of the method as well as the underlying technologies (e.g. RDF, triple stores, SPARQL, SHACL, ontologies, ...) that are typically used to implement it.
  • Refactor and expand the existing open source documentation to make it more useful to end users and emphasize the various parts of the Blue Brain Nexus product:
    • Nexus Core - backend technology that allows users to build and leverage knowledge graphs.
    • Nexus Studio - user interface that enables users to easily leverage the information organized into managed knowledge graphs.
    • Nexus Forge - library and application to facilitate the building of knowledge graphs.
  • Refactor and expand the existing Blue Brain Nexus Tutorial:
    • Nexus Core
    • Nexus Studio
    • Nexus Forge

Links to project: Source code: https://github.com/BlueBrain/nexus, Documentation: https://bluebrainnexus.io/docs/, Tutorial: https://bluebrainnexus.io/docs/tutorial/index.html