0657

fastMRI: a publicly available raw k-space dataset for accelerated MRI reconstruction using machine learning
Florian Knoll1, Matthew Muckley1, Jure Zbontar2, Anuroop Sriram2, Aaron Defazio2, Michal Drozdzal 2, Krzysztof Geras1, Mary Bruno1, Marc Parente1, Nafissa Yakubova2, Mike Rabbat2, Adriana Romero Soriano2, Pascal Vincent2, Erich Owens2, Joe Katsnelson3, Hersh Chandarana1, Yvonne W Lui1, Daniel K Sodickson1, Larry Zitnick2, and Michael P Recht1

1Center for Advanced Imaging Innovation and Research, Department of Radiology, New York University School of Medicine, New York, NY, United States, 2Facebook Artificial Intelligence Research, Menlo Park, CA, United States, 3Medical Center IT, New York University School of Medicine, New York, NY, United States

Synopsis

Despite the substantial increase in research activity in machine learning for MR image reconstruction, no large-scale raw k-space dataset is publicly available. This makes it challenging to reproduce and validate comparisons of different approaches, and it restricts work on this problem to researchers associated with large academic medical centers. This abstract introduces the first large-scale database of MRI data for reconstruction. The database currently includes about 7500 raw MRI k-space datasets from a range of MRI systems and clinical patient populations, with corresponding images derived from the rawdata using reference image reconstruction algorithms. Approximately 30000 additional clinical image datasets not directly associated with the rawdata are also included, and we plan to add to the database over time.

Introduction

Since 2016, there has been a substantial increase in research activity in machine learning for MR image reconstruction1,2,3,4,5,6. However, the field still lacks a large-scale public dataset of raw k-space data. In classic fields of machine learning, large public datasets are routinely used for annual competitions and benchmarking. In contrast, MR image reconstruction methods are generally trained and validated on small individual datasets compiled by the authors, which in many cases are not shared with the research community. This makes it challenging to reproduce and validate comparisons of different approaches, and it restricts work on this problem to researchers associated with large centers where such data is available. The goal of the fastMRI dataset is to provide a first step towards a solution to this issue. Here we describe our recent release of the first large-scale database of MRI rawdata, including approximately 7500 raw MRI measurement datasets from a range of MRI systems and clinical patient populations, with corresponding images derived from the rawdata using reference image reconstruction algorithms. Approximately 30000 additional clinical image datasets not directly associated with rawdata are also included. We plan to add to the database over time.

Description of the dataset

Raw k-space dataset: Fully sampled rawdata of consecutive patients undergoing regular clinical exams of the knee (≈1600), the brain (≈5300) and the liver (≈800) were collected. The study was approved by the IRB. Patients were screened for metallic implants or other safety concerns, following routine safety procedures at our institution. Otherwise, there were no specific exclusion criteria. Scans were performed on five clinical 3T systems (Siemens Magnetom Skyra, Prisma, Trio, Vida and Biograph-mMR) and one clinical 1.5T system (Siemens Magnetom Aera) with clinically used receive coils. We used Cartesian 2D-TSE and GRE protocols that are employed clinically at our institution. Sequence parameters were matched as closely as possible between the different systems. Since our goal was to provide fully sampled k-space data, we disabled all subsampling-based acceleration methods, such as parallel imaging and partial Fourier, for this study. Rawdata were exported from the scanners, anonymized, and converted into the vendor-neutral ISMRMRD format7. The dataset includes acquisitions from five protocols:

  1. Knee: Coronal proton-density-weighting with fat suppression.
  2. Knee: Coronal proton-density-weighting.
  3. Brain: Axial T2-weighting.
  4. Brain: Axial T1-weighting.
  5. Liver: Axial T2-weighting with fat suppression.

Example images from reference reconstructions are shown in Figure 1, and detailed lists of the data acquisition parameters are contained in Tables 1-3.
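As a rough illustration of the kind of reference reconstruction named in the Figure 1 caption (inverse FFT followed by sum-of-squares coil combination), the following Python sketch operates on synthetic data. The array layout (coils, ny, nx), the helper names, the orthonormal FFT scaling, and the equispaced sampling mask with a fully sampled center are assumptions made for illustration; they are not taken from the released dataset or its tooling. The second helper shows how fully sampled k-space could be subsampled retrospectively to simulate an accelerated acquisition.

```python
import numpy as np


def fft2c(image):
    """Centered, orthonormal 2D FFT over the last two axes."""
    shifted = np.fft.ifftshift(image, axes=(-2, -1))
    kspace = np.fft.fft2(shifted, axes=(-2, -1), norm="ortho")
    return np.fft.fftshift(kspace, axes=(-2, -1))


def ifft2c(kspace):
    """Centered, orthonormal 2D inverse FFT over the last two axes."""
    shifted = np.fft.ifftshift(kspace, axes=(-2, -1))
    image = np.fft.ifft2(shifted, axes=(-2, -1), norm="ortho")
    return np.fft.fftshift(image, axes=(-2, -1))


def rss_reconstruction(kspace):
    """Root-sum-of-squares image from multi-coil k-space.

    kspace: complex array of shape (coils, ny, nx) (assumed layout).
    """
    coil_images = ifft2c(kspace)
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))


def undersample_columns(kspace, acceleration=4, num_center=8):
    """Keep every `acceleration`-th column plus a fully sampled center.

    The last axis is treated as the phase-encoding direction, which is
    an assumption of this sketch.
    """
    nx = kspace.shape[-1]
    mask = np.zeros(nx, dtype=bool)
    mask[::acceleration] = True
    center = nx // 2
    mask[center - num_center // 2 : center + num_center // 2] = True
    return kspace * mask, mask


# Synthetic stand-in for one slice: a square phantom seen by 4 coils.
rng = np.random.default_rng(0)
num_coils, ny, nx = 4, 32, 32
phantom = np.zeros((ny, nx))
phantom[8:24, 8:24] = 1.0
sensitivities = rng.normal(size=(num_coils, ny, nx)) + 1j * rng.normal(
    size=(num_coils, ny, nx)
)
coil_images = phantom * sensitivities
kspace = fft2c(coil_images)

# Fully sampled reference reconstruction.
recon = rss_reconstruction(kspace)
expected = np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))

# Retrospectively subsampled, zero-filled reconstruction.
kspace_us, mask = undersample_columns(kspace, acceleration=4, num_center=8)
zero_filled = rss_reconstruction(kspace_us)
```

Because the k-space data are fully sampled, the same measurement can serve both as the ground truth (`recon`) and, after masking, as the input to a reconstruction method (`kspace_us`), so different sampling patterns can be compared on identical data.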

DICOM dataset: In addition to the scanner rawdata, our dataset currently includes image sets from 10,000 knee, 10,000 brain and 10,000 liver scans of consecutive patients undergoing regular clinical exams. These data come from a variety of scanners within our institution and include images from sequences beyond those included in the raw dataset. Reconstructed DICOM images were anonymized using the RSNA clinical trial processor. In addition, we manually inspected each DICOM image (and rawdata file) for the presence of unexpected protected health information (PHI).

We will provide links to the dataset by the time of presentation at the annual meeting.

Discussion

To our knowledge, this is the first large-scale public dataset of raw k-space data from a clinical patient population. While public datasets do exist for reconstructed images, for example the Human Connectome Project (HCP), the Alzheimer’s Disease Neuroimaging Initiative (ADNI) or the Osteoarthritis Initiative (OAI), they generally target a specific research question, with imaging serving as a tool to answer that question. Our dataset is broader, with the goal of providing a resource to improve image acquisition and reconstruction itself.

The number of cases included as DICOM images is substantially larger than the number in the core k-space data, and this part of the dataset is more heterogeneous, with data coming from a wider range of MR systems and protocols. It is worth noting that a Fourier transform of these images does not directly correspond to the originally measured rawdata. Some of the images were also acquired with accelerated acquisitions and reconstructed with parallel imaging, which further limits their validity as a fully sampled ground truth. In the context of machine learning for image reconstruction, our motivation for including the DICOM data is to answer the question of whether training on a larger amount of less perfect data outperforms training on a smaller amount of high-quality data in terms of performance and generalization.
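The mismatch between the Fourier transform of a stored image and the measured rawdata can be illustrated with a toy single-coil example. The sketch below assumes, for illustration only, that the stored image is a magnitude image, so the phase of the complex-valued reconstruction is lost; real clinical images may differ from the rawdata in further ways (for example coil combination or vendor post-processing) that are not modeled here. All names and the synthetic phase pattern are hypothetical.

```python
import numpy as np


def is_hermitian_symmetric(kspace):
    """Check K[-u, -v] == conj(K[u, v]), which holds for real images."""
    # flip maps index u to N-1-u; rolling by one then gives (-u) mod N.
    mirrored = np.roll(np.flip(kspace, axis=(0, 1)), shift=1, axis=(0, 1))
    return np.allclose(kspace, np.conj(mirrored))


# Synthetic complex-valued image: random magnitude with a smooth phase.
rng = np.random.default_rng(0)
n = 16
magnitude = rng.random((n, n)) + 0.1
y, x = np.mgrid[0:n, 0:n]
phase = 0.3 * x + 0.02 * y**2
complex_image = magnitude * np.exp(1j * phase)

# "Measured" k-space of the complex image (toy, single coil).
kspace_measured = np.fft.fft2(complex_image)

# k-space obtained by Fourier transforming the magnitude image only.
kspace_from_magnitude = np.fft.fft2(np.abs(complex_image))

relative_error = np.linalg.norm(
    kspace_from_magnitude - kspace_measured
) / np.linalg.norm(kspace_measured)
```

In this toy setting, the k-space derived from the magnitude image is Hermitian symmetric while the measured k-space is not, so the two cannot coincide; `relative_error` quantifies the difference for this particular synthetic phase.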

We hope that the availability of this dataset can further accelerate research in MR image reconstruction, much as computer vision was supercharged by well curated large-scale datasets like ImageNet8. In particular, we hope that this dataset can serve as a benchmark during training and validation of developments in image reconstruction.

Acknowledgements

We acknowledge grant support from the National Institutes of Health under grants NIH R01 EB024532 and NIH P41 EB017183.

References

[1] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a Variational Network for Reconstruction of Accelerated MRI Data,” Magn. Reson. Med., 79:3055–3071 (2018).

[2] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang, “Accelerating Magnetic Resonance Imaging Via Deep Learning,” in IEEE International Symposium on Biomedical Imaging (ISBI), 514–517 (2016).

[3] J. Schlemper, J. Caballero, J. V. Hajnal, A. Price, and D. Rueckert, “A Deep Cascade of Convolutional Neural Networks for MR Image Reconstruction,” in Information Processing in Medical Imaging, 647–658 (2017).

[4] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, “Image reconstruction by domain-transform manifold learning,” Nature, 555: 487–492 (2018).

[5] M. Mardani, E. Gong, J. Y. Cheng, S. S. Vasanawala, G. Zaharchuk, L. Xing, and J. M. Pauly, “Deep Generative Adversarial Neural Networks for Compressive Sensing (GANCS) MRI,” IEEE Transactions on Medical Imaging, in press (2018), doi: 10.1109/TMI.2018.285875.

[6] F. Chen, V. Taviani, I. Malkiel, J. Y. Cheng, J. I. Tamir, J. Shaikh, S. T. Chang, C. J. Hardy, J. M. Pauly, and S. S. Vasanawala, “Variable-Density Single-Shot Fast Spin-Echo MRI with Deep Learning Reconstruction by Using Variational Networks,” Radiology, 289: 366–373 (2018).

[7] S. J. Inati, J. D. Naegele, N. R. Zwart, V. Roopchansingh, M. J. Lizak, D. C. Hansen, C. Y. Liu, D. Atkinson, P. Kellman, S. Kozerke, H. Xue, A. E. Campbell-Washburn, T. S. Sørensen, and M. S. Hansen, “ISMRM Raw data format: A proposed standard for MRI raw datasets,” Magnetic Resonance in Medicine, 77: 411–421 (2016).

[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” in IEEE Computer Vision and Pattern Recognition (CVPR) (2009).


Figures

Figure 1: Overview of IFFT sum-of-squares reference reconstructions of the sequences used to acquire raw data represented in the dataset: Knee: Coronal proton-density weighting with and without fat suppression. Brain: Axial T2 weighting and T1 weighting. Liver: Axial T2 weighting with fat suppression.

Table 1: Acquisition parameters for the two knee imaging protocols used to acquire raw data represented in the dataset. Since not all parameters are completely identical for the different MR scanners that were used during data acquisition, a range of sequence parameters is shown in some cases.

Table 2: Acquisition parameters for the two brain imaging protocols used to acquire raw data represented in the dataset. Since not all parameters are completely identical for the different MR scanners that were used during data acquisition, a range of sequence parameters is shown in some cases.

Table 3: Acquisition parameters for the liver imaging protocol used to acquire raw data represented in the dataset. Since not all parameters are completely identical for the different MR scanners that were used during data acquisition, a range of sequence parameters is shown in some cases.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)