Interview with Falk Lüsebrink, creator of a unique, 10-year BIDS data set
21 June 2021
"Publicly available data is at the heart of open science"
In 2017, Falk Lüsebrink published a human whole brain in vivo MRI dataset with the highest spatial resolution to date (250 μm). Then he had an idea: since he knew the person scanned had participated in several other studies, why not collect and share them all?
Four years later, he has assembled a dataset spanning 10 years that is unique in several ways: (i) the same person was scanned in the same scanner with the same software; (ii) the scanner is a 7T MRI system, a rarely available piece of equipment; and (iii) the resolution is uniquely fine-grained, thanks to the scanner’s strong magnetic field and careful motion correction. The data descriptor is published in Scientific Data.
Lüsebrink learned of BIDS in 2016 through an editorial request from Scientific Data, which wanted the data submitted for the first dataset to be BIDS-formatted. "I must admit that I had not heard of BIDS before," he says. "Although it was not a free decision from the beginning, I saw a lot of potential in BIDS; and, therefore, it was worth the effort structuring the data accordingly."
A former colleague, Michael Hanke, one of the BIDS contributors, helped extract metadata from the original DICOM data. Hanke is also one of the two main contributors to DataLad, a data distribution and management platform that was used to assemble all the data for the final dataset.
Given its uniqueness, Falk Lüsebrink sees many possible uses and users for the data set: "We expect this new data set to be used in many multimodal processing schemes, e.g. data fusion for visualization as well as teaching, building of brain atlases, vascularization of subcortical structures such as the hippocampus, validation of connectivity models based on joint DTI and rs-fMRI data and many things beyond," he says. Thanks to the ultrahigh resolution and high quality of the data, he and his colleagues expect to see structures that have never before been identified in vivo.
He also sees many possibilities for the dataset to contribute to technical and methodological advances: "Since the scanning protocol is not identical for every measurement, this could be an ideal test bed for novel data harmonization algorithms to avoid potential bias in data assessment. Furthermore, data is included in back-to-back studies within the very same session and with artificially reduced SNR. This may allow for test-retest studies in software development and to validate denoising algorithms as ground truth data can be generated. Beyond that, the scanner's raw data of the back-to-back scans are available, which allows for development and benchmarking of advanced reconstruction algorithms."
Falk Lüsebrink estimates that compiling the existing data and acquiring additional data for a comprehensive data set took around one year of work, in between other projects. Processing the data and structuring the results according to BIDS took most of that time. It was well worth the time and effort, he thinks: "Publicly available data is at the heart of open science. We think many scientists can advance their respective field of research by using these data. Something we would never be able to do ourselves." The relevance of freely available data has once again been underlined by the current pandemic and the associated lockdowns of MR laboratories, among others.
The Human Phantom dataset in detail