The purpose of this document is to solicit community feedback on the SPARC Data Structure (SDS), which was submitted to INCF for endorsement as a standard. The document contains the INCF standards and best practices committee's review of SDS and the criteria against which it was evaluated (open, FAIR, testing and implementation, governance, adoption and use, stability and support, extensibility and comparison to similar standards). For the next 60 days, we are seeking community feedback on SDS.
About SPARC:
The SPARC data structure is a consistent file structure and naming convention, based on the Brain Imaging Data Structure (BIDS), that ensures the diverse types of data in SPARC are organized in a similar manner. The current version is SDS 2.0 (released June 24, 2022).
Summary of Discussion:
Overall, the members of the INCF Standards and Best Practices Committee found that SDS could potentially meet the criteria for INCF endorsement. It is open, has strong documentation, and supports FAIR reasonably well, with evidence of efforts to align it with BIDS and the DANDI metadata structure. Its use is currently imposed on the SPARC community of approximately 500 investigators, with no evidence of use outside of this community. While SDS is inspired by BIDS, it was designed explicitly to accommodate data collection patterns that are fundamentally incompatible with BIDS 1.0 structures (the 20% that BIDS does not attempt to cover). Since SDS has been a consortium-used standard, it currently lacks a formal governance structure; however, the submitters have indicated that a formal governance structure will be established once SDS is used by groups outside of the consortium.
Recommendation:
The INCF Standards and Best Practices Committee voted to put SDS forward for community review. The committee is particularly interested in receiving comments from the BIDS and DANDI metadata structure communities. In addition, the committee would like SDS, BIDS, and DANDI to draft a commentary on the relationship between the standards to help the community determine which standard to use.
No competing interests were disclosed
Comments
Thu, 09/14/2023 - 19:22
Thu, 09/14/2023 - 19:41
I lead the development of SODA, a software that simplifies the process of structuring datasets according to the SDS.
Thu, 09/14/2023 - 20:17
We are developing a novel visualization tool to browse SDS files and quickly identify any errors in their structure or missing information during the curation process to ensure only the highest quality, FAIR data is submitted to repositories using this standard.
Fri, 09/15/2023 - 05:58
As the technical lead for the SPARC Data and Resource Center’s MAP Core, I have been responsible for coordinating development of software tooling for mapping SPARC data to anatomical scaffolds and for developing many aspects of the exploration of SPARC data on the SPARC Portal (https://sparc.science). Our endeavours have been significantly enabled by the adoption of the SDS. Our mapping and portal visual exploration tools rely on the rich structured annotations that describe the data contained in SDS datasets.
We have been fortunate to work directly with the developers of the SDS, contributing feature requests and questions needed to support the range of data and knowledge that we need to enable the mapping and data registration workflows as well as the visual exploration on the SPARC Portal. Furthermore, we have pushed the limits of the SDS in adopting this format to store computational modelling data - for example, anatomical organ scaffolds (finite element models) with external data embedded in the scaffold; or compartmental models with associated simulation experiments that can be interactively executed across SPARC resources. This requires maintaining unambiguous provenance in these datasets, which can then be surfaced on the SPARC Portal in a manner enabling users to understand what they are looking at and where it came from. Accurate and comprehensive attribution and citation of datasets is a crucial aspect of FAIR data platforms and tools.
Beyond SPARC, I have been able to take learnings from our exposure to the SDS and apply them to other standardisation efforts I am involved in. In particular, the structured metadata (with recommended terminologies/ontologies to use) has proven very powerful as we look to link computational models to experimental and clinical data.
As noted in the submission, one key omission for a community standard like this is the specification of a governance framework. Going forward, I believe that broader adoption of the SDS will require establishing a formal governance structure and a clearly defined process by which decisions on changes to the specification are made.
I am a member of the SPARC Data and Resource Center.
Fri, 09/15/2023 - 20:36
A key component of this platform involves storing data using the SPARC Dataset Structure (SDS). This has provided a robust mechanism to standardise our data management practices and maximise the reuse and impact of data generated from our research. Over 30 researchers who are part of 12 LABOURS exemplar projects have started to store their data in SDS format in diverse computational physiology applications, including the development of novel biomarkers for pulmonary hypertension, rehabilitation of upper limb disorders, control of organ function by the autonomic nervous system in the uterus and stomach, and supporting breast cancer diagnosis and treatment.
Although we are investigators outside the SPARC community, we have found the SPARC community and the developers of the SDS to be very supportive of our needs and always open to feedback. For example, they have helped us build tools that enable programmatically creating SDS datasets (https://github.com/SPARC-FAIR-Codeathon/sparc-me) and workflow descriptions (https://github.com/SPARC-FAIR-Codeathon/sparc-flow), which we are applying in our platform to maximise reuse of research outcomes and support reproducible science (https://github.com/ABI-CTT-Group/digitaltwins-api).
We strongly support the SDS becoming an INCF-endorsed standard due to the demonstrated benefits we've experienced using it and the exceptional support provided by its developers. As active members of the broader research community, we look forward to contributing to the SDS development roadmap. We also plan to adopt the SDS across our institute of over 300 researchers (https://www.auckland.ac.nz/en/abi/our-research/research-groups-themes.html). In addition, we are starting to work towards extending the application of the SDS as part of the 12 LABOURS data catalogue in the New Zealand Government funded Medtech-iQ Aotearoa initiative (https://www.cmdt.org.nz/medtech-iq-aotearoa) - New Zealand's national innovation hub for medical devices and digital health technologies. This data catalogue will store FAIR descriptions of diverse data in SDS format, to which hundreds of researchers across New Zealand will contribute.
None