3514

Machine Learning-based Analysis of Heterogeneous, Multi-center MR Datasets: Impact of Scan Variability

Mariana Bento^1,2, Justin Park^2,3, and Richard Frayne^1
¹Radiology and Clinical Neuroscience, Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada, ²Calgary Image Processing and Analysis Centre, Foothills Medical Centre, Calgary, AB, Canada, ³Mechanical Engineering, University of Calgary, Calgary, AB, Canada

Synopsis

Multi-center, heterogeneous imaging datasets are frequently required to develop image-based computer-aided diagnosis and treatment-monitoring tools. However, these datasets may have large underlying variability, which can affect the performance of the resulting tools. Here, as a proof of concept, we propose a machine learning method to study scan variability related to scanner vendor and magnetic field strength in brain MR images from two cohorts of healthy subjects. Our model achieved high accuracy (92% on a held-out test set), indicating that this scan variability is present and detectable in heterogeneous, multi-center datasets. The model could be incorporated into automated diagnostic tools, potentially yielding more reliable and robust results.

Introduction

Magnetic resonance (MR) imaging is a commonly used diagnostic imaging modality.¹ Multi-center studies introduce additional challenges related to data volume and variability, particularly when images are acquired at different research facilities.² Computer-aided diagnosis (CAD) tools that generalize reliably to multi-center data^3,4,5 are more likely to be adopted in broader clinical and research environments. However, images in heterogeneous datasets carry inherent but unwanted scan variability that can influence a tool's decisions and lead to undesirable outcomes. Our goal is to develop a machine learning classifier to study this scan variability. As a proof of concept, we classified vendor and field strength in a cohort of heterogeneous brain MR images.^5,6 Our study analyzes the inherent image variation and potential differences between these classes; it does not read the vendor or field strength from metadata tags. Extended versions of the model could be included in future CAD tools to make results more robust across multiple datasets. The model may also support several other applications, including longitudinal studies in which the scanner or protocol changes over time; quality control to verify compliance with acquisition protocols; image data curation by completing missing or correcting erroneous metadata; and image retrieval.

Data and Methods

We developed a classification model to distinguish images by vendor (General Electric, Philips, and Siemens) and magnetic field strength (1.5 T and 3 T). The model was a six-class convolutional neural network, developed using a combination of two publicly available MR brain datasets: Calgary-Campinas-359 (CC-359, sites.google.com/view/calgary-campinas-dataset/home)⁷ and the Alzheimer's Disease Neuroimaging Initiative (ADNI, adni.loni.usc.edu).⁸ Each class (one combination of vendor and field strength) contained approximately 100 volumetric T1-weighted images, for a total of 599 healthy controls (Fig 1). The images were randomly split into training/validation (80% of samples) and testing (20% of samples) sets, preserving the class balance. Pre-processing consisted of: 1) extracting three consecutive two-dimensional sagittal slices at the center of each image volume, 2) resizing the slices to 256x256, and 3) applying min-max intensity normalization (intensity values scaled to the range 0 to 1). The model included three hidden layers, used categorical cross-entropy as the loss function, and was trained with the Adam optimizer.^9,10 A softmax activation in the output layer gave the probability of each of the six vendor/field-strength combinations (Fig 2).
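The pre-processing, data split, and output computations described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, the sagittal direction is assumed to be the first array axis, nearest-neighbour interpolation stands in for the unspecified resizing method, normalization is applied per slice, and the three slices are stacked into one (3, 256, 256) array (the abstract does not state how they are fed to the network). The softmax and categorical cross-entropy functions show what the output layer and the loss compute for the six classes.

```python
import numpy as np

# Six classes: every combination of vendor and field strength.
# Integer labels 0-5 index this list (the ordering is an assumption).
CLASSES = [f"{v} {f}" for v in ("GE", "Philips", "Siemens") for f in ("1.5T", "3T")]


def central_sagittal_slices(volume, n_slices=3, sagittal_axis=0):
    """Return n_slices consecutive sagittal slices centred on the volume midpoint.

    Assumes the left-right (sagittal) direction is `sagittal_axis`; check the
    orientation of real data before relying on this.
    """
    vol = np.moveaxis(volume, sagittal_axis, 0)
    start = vol.shape[0] // 2 - n_slices // 2
    return vol[start:start + n_slices]


def resize_nearest(img, size=(256, 256)):
    """Nearest-neighbour resize of a 2D slice (stand-in for the unstated method)."""
    rows = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
    return img[np.ix_(rows, cols)]


def min_max(img):
    """Linearly scale intensities to [0, 1]; a constant image maps to zeros."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)


def preprocess(volume):
    """Three central sagittal slices -> array of shape (3, 256, 256) in [0, 1]."""
    return np.stack([min_max(resize_nearest(s)) for s in central_sagittal_slices(volume)])


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Random 80/20 split of sample indices that preserves class proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * idx.size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(np.array(train_idx, dtype=int)), np.sort(np.array(test_idx, dtype=int))


def softmax(logits):
    """Convert output-layer scores to class probabilities (rows sum to 1)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def categorical_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(p_true + eps)))


# Usage on synthetic data (not the CC-359/ADNI images).
volume = np.random.default_rng(1).random((170, 200, 180))
print(preprocess(volume).shape)  # (3, 256, 256)
train, test = stratified_split(np.repeat(np.arange(6), 100))
print(len(train), len(test))  # 480 120
```

Stratifying per class keeps each vendor/field-strength combination at roughly the same 80/20 ratio, which is what "preserving the class balance" requires; a plain random shuffle would only do so on average.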

Results

Performance was first evaluated during the training and validation phase (Fig 3). On the previously unseen testing set, the model achieved an accuracy of 92%. The remaining 8% of test samples were misclassified in vendor only (4%), in magnetic field strength only (2%), or in both vendor and magnetic field strength (2%). Misclassified samples were also inspected qualitatively (Fig 4).
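The three error categories partition the misclassified samples, so they add up to the overall error: 4% + 2% + 2% = 8% = 100% − 92%. A short, hypothetical Python helper that assigns each test prediction to one of these categories is sketched below. The class ordering is an assumption, and the toy labels mirror the three cases shown in Fig 4 rather than the study data.

```python
# Six classes as (vendor, field strength) pairs; integer labels index this list.
CLASSES = [(v, f) for v in ("GE", "Philips", "Siemens") for f in ("1.5T", "3T")]


def error_breakdown(y_true, y_pred):
    """Fraction of samples that are correct, or wrong in vendor only, field only, or both."""
    counts = {"correct": 0, "vendor_only": 0, "field_only": 0, "both": 0}
    for t, p in zip(y_true, y_pred):
        (true_vendor, true_field), (pred_vendor, pred_field) = CLASSES[t], CLASSES[p]
        vendor_wrong = true_vendor != pred_vendor
        field_wrong = true_field != pred_field
        if vendor_wrong and field_wrong:
            counts["both"] += 1
        elif vendor_wrong:
            counts["vendor_only"] += 1
        elif field_wrong:
            counts["field_only"] += 1
        else:
            counts["correct"] += 1
    n = len(y_true)
    return {k: c / n for k, c in counts.items()}


# Toy labels mirroring Fig 4: Siemens 3T (5) -> GE 1.5T (0), Philips 3T (3) -> Siemens 3T (5),
# Siemens 3T (5) -> Siemens 1.5T (4), plus one correctly classified GE 1.5T (0).
print(error_breakdown([5, 3, 5, 0], [0, 5, 4, 0]))
# {'correct': 0.25, 'vendor_only': 0.25, 'field_only': 0.25, 'both': 0.25}
```

Because each class is a unique vendor/field pair, a prediction is correct exactly when both attributes match, so the four fractions always sum to 1.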

Discussion

The developed model achieved high accuracy using only the intrinsic image characteristics associated with the vendor and field strength used for acquisition. The misclassification rate was low, which suggests that the model performs consistently across all studied sources of variability. Other sources of variability (e.g., other acquisition parameters such as repetition time and echo time) could be incorporated into the model using a transfer learning approach. We anticipate that understanding inherent variability can improve the design and performance of CAD tools. This understanding should enable tools that account and correct for scan variability to achieve reliable and robust results¹¹ (i.e., to answer the intended research question without being influenced by other sources of variability). One possibility is to combine the proposed method with a CAD tool in an adversarial framework, optimizing the final model to perform well on the target classification task while minimizing the imaging variability due to acquisition parameters. Our model could also serve as a quality control method to ensure that images meet the required standards.¹² However, such extensions would require additional training data, validation, and analysis.
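As an illustration of the quality-control and metadata-curation uses mentioned above, a classifier's six-class output could be compared with the vendor and field strength recorded in the image header. The sketch below is hypothetical: the function name, class order, status strings, and the 0.9 confidence threshold are illustrative assumptions, not part of the presented work.

```python
# Class order must match the classifier's output layer (assumed here).
CLASSES = ["GE 1.5T", "GE 3T", "Philips 1.5T", "Philips 3T", "Siemens 1.5T", "Siemens 3T"]


def qc_check(probs, header_label=None, threshold=0.9):
    """Compare a six-class softmax output with the vendor/field label from the header.

    Returns (status, predicted_label). The threshold is an illustrative value.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    predicted = CLASSES[best]
    if probs[best] < threshold:
        return "uncertain", predicted      # too unsure to act on
    if header_label is None:
        return "fill_missing", predicted   # candidate value for a missing tag
    if header_label != predicted:
        return "flag_mismatch", predicted  # header and image content disagree
    return "ok", predicted


probs = [0.01, 0.01, 0.01, 0.01, 0.01, 0.95]
print(qc_check(probs, "Siemens 3T"))  # ('ok', 'Siemens 3T')
print(qc_check(probs, "GE 1.5T"))     # ('flag_mismatch', 'Siemens 3T')
print(qc_check(probs))                # ('fill_missing', 'Siemens 3T')
```

Gating on a confidence threshold keeps low-confidence predictions from overwriting or flagging header values; with an 8% error rate, flagged cases would still need human review.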

Conclusion

In this work, we studied data variability arising from differences in image acquisition, specifically scanner vendor and magnetic field strength. The proposed proof-of-concept model distinguished MR images by the characteristics of the acquiring scanner with high accuracy. To the best of our knowledge, no other study has proposed a similar classification task using heterogeneous, multi-center datasets. Scan variability must be considered when developing CAD tools so that their results reflect the intended research question rather than image variability. Accounting for it should lead to more reliable machine learning (ML) models and may facilitate their use in clinical practice.

Acknowledgements

The authors would like to thank the Hotchkiss Brain Institute (HBI) and the Canadian Open Neuroscience Platform (CONP) for financial support. We also thank the University of Calgary for providing the computational resources used in the presented experiments.

References

1. Heinen R, Bouvy WH, Mendrik AM, et al. Robustness of automated methods for brain volume measurements across different MRI field strengths. PLoS One. 2016;11(10):e0165719.

2. Helmer KG, Chou MC, Preciado RI, et al. Multi-site study of diffusion metric variability: effects of site, vendor, field strength, and echo time on regions-of-interest and histogram-bin analyses. Proceedings of SPIE. 2016;9788. doi:10.1117/12.2217445.

3. Schlett CL, Hendel T, Hirsch J, et al. Quantitative, organ-specific interscanner and intrascanner variability for 3 T whole-body magnetic resonance imaging in a multicenter, multivendor study. Investigative Radiology. 2016;51(4):255–265.

4. Collewet G, Strzelecki M, Mariette F. Influence of MRI acquisition protocols and image intensity normalization methods on texture classification. Magnetic Resonance Imaging. 2004;22(1):81–91.

5. Schlett CL, Hendel T, Hirsch J, et al. Quantitative, organ-specific interscanner and intrascanner variability for 3 T whole-body magnetic resonance imaging in a multicenter, multivendor study. Investigative Radiology. 2016;51(4):255–265.

6. Jovicich J, Czanner S, Han X, et al. MRI-derived measurements of human subcortical, ventricular and intracranial brain volumes: reliability effects of scan sessions, acquisition sequences, data analyses, scanner upgrade, scanner vendors and field strengths. NeuroImage. 2009;46(1):177–192.

7. Souza R, Lucena O, Garrafa J, et al. An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement. NeuroImage. 2018;170:482–494.

8. Mueller SG, Weiner MW, Thal LJ, et al. The Alzheimer's Disease Neuroimaging Initiative. Neuroimaging Clinics of North America. 2005;15(4):869-xii.

9. McBee MP, Awan OA, Colucci AT, et al. Deep learning in radiology. Academic Radiology. 2018;25(11):1472–1480.

10. Shin HC, Roth HR, Gao M, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging. 2016;35(5):1285–1298.

11. Thrall JH, Li X, Li Q, et al. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. Journal of the American College of Radiology. 2018;15(3):504–508.

12. Willemink MJ, Koszek WA, Hardell C, et al. Preparing medical imaging data for machine learning. Radiology. 2020;295(1):4–15.

Figures

Fig 1. Representative images extracted from the T1-weighted volumes of the two experimental datasets: (a) CC-359 data and (b) ADNI data. Each column represents a vendor (GE, Philips, and Siemens), and each row represents a magnetic field strength (1.5 T and 3 T).

Fig 2. Proposed convolutional neural network model. The model has one input layer and three hidden layers, followed by a fully connected layer and an output layer. The input layer takes a 256x256 image. The hidden layers consist of 2D convolutions (3x3 kernels), max pooling, and dropout layers. The fully connected layer connects to the output layer, whose six nodes hold the probability of each of the six vendor/field-strength combinations.

Fig 3. Loss and accuracy rates achieved during model training. Similar losses and accuracies were achieved for the training and validation sets.

Fig 4. Examples of images misclassified by our model: (a) error in both vendor and field strength, a Siemens 3 T image classified as GE 1.5 T; (b) error in vendor only, a Philips 3 T image classified as Siemens 3 T; and (c) error in field strength only, a Siemens 3 T image classified as Siemens 1.5 T.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
