Hanna Bugler1,2,3,4, Roberto Souza3,5, and Ashley D. Harris2,3,4
1Department of Biomedical Engineering, University of Calgary, Calgary, AB, Canada, 2Department of Radiology, University of Calgary, Calgary, AB, Canada, 3Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada, 4Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada, 5Department of Electrical & Software Engineering, University of Calgary, Calgary, AB, Canada
Synopsis
Keywords: Spectroscopy, Machine Learning/Artificial Intelligence, Artifacts, Data Processing, Software Tools, Simulations, Brain, Pediatric
Motivation: GABA-edited MRS suffers from data quality challenges due to its low signal to noise ratio (SNR).
Goal(s): We propose an automated labelling algorithm for transient quality and a dual-domain deep learning model for filtering spectral transients based on quality.
Approach: We trained our model with simulated data containing commonly occurring artifacts, labelled by our automated labelling algorithm, which assigns continuous scores ranging from –1 (poor quality) to +1 (good quality). We subsequently evaluated our model’s performance by removing (filtering) poor quality transients, i.e., those with quality values less than 0.
Results: Filtering with our model outperformed simple averaging of all collected transients on spectral quality metrics for 70-80% of scans.
Impact: Our model can successfully assign a continuous quality score between –1 (poor) and +1 (good) to GABA-edited MRS difference data (i.e., a single ON-OFF edit pair). When used for filtering, these scores improve MRS quality metrics compared to simple transient averaging.
INTRODUCTION
GABA-edited MRS is used to isolate and quantify GABA, as the GABA peaks are overlapped by more abundant signals in a typical spectrum. As a result, GABA-edited MRS data has low signal and is highly affected by noise1. While some machine learning based solutions exist for quality filtering of MRS data2-8, none exist for GABA-edited MRS. In addition, current approaches are limited by user quality ratings (which are inherently subjective)2-7, require large amounts of manually labelled in vivo spectra2-7 and model the problem as a discrete or binary classification (i.e., good or poor)2-8. We propose a deep learning model to filter individual averages for improved GABA-edited spectral quality. Our dual-domain model (DD-model) was trained using our automated data labelling approach (the Distance Away from Mean, or DAM, algorithm), which considers linewidth, SNR, peak shapes, outliers and artifact presence, and it was compared to simple transient averaging (SimpleAvg).
MATERIALS AND METHODS
Nine hundred GABA-edited difference (edit-ON minus edit-OFF) ground truth spectra were simulated9 and split into train (600), validation (200) and test (100) sets. Each ground truth spectrum had random frequency and phase shifts and Gaussian amplitude noise added to simulate a typical 320 transient (160 edit-ON and 160 edit-OFF) scan with SNR ~25. Of these transients, 40-100 were then further contaminated by a random mix of four artifacts (ghosts/spurious echoes, eddy currents, lipid contamination and motion contamination) to simulate poor quality spectra using in-house Python scripts (Figure 1).
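The corruption step described above (random frequency and phase shifts plus Gaussian amplitude noise) can be illustrated with a minimal NumPy sketch. This is not the in-house script used in this work: the function name `simulate_transients`, the toy single-resonance FID and all standard deviations are hypothetical values chosen only for illustration, and the artifact contamination step is not included.

```python
import numpy as np

def simulate_transients(ground_truth_fid, dwell_time, n_transients=160,
                        freq_sd_hz=2.0, phase_sd_deg=5.0, noise_sd=0.05, seed=0):
    """Return an (n_transients, n_points) complex array of corrupted FIDs.

    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_points = ground_truth_fid.shape[0]
    t = np.arange(n_points) * dwell_time  # time axis in seconds

    # One random frequency offset (Hz) and phase offset (rad) per transient.
    freq = rng.normal(0.0, freq_sd_hz, size=(n_transients, 1))
    phase = np.deg2rad(rng.normal(0.0, phase_sd_deg, size=(n_transients, 1)))

    # Apply the shifts in the time domain as a complex exponential.
    shifted = ground_truth_fid[None, :] * np.exp(
        1j * (2.0 * np.pi * freq * t[None, :] + phase)
    )

    # Independent Gaussian noise on the real and imaginary channels.
    noise = (rng.normal(0.0, noise_sd, size=shifted.shape)
             + 1j * rng.normal(0.0, noise_sd, size=shifted.shape))
    return shifted + noise

# Toy ground truth: a single decaying resonance (hypothetical, not a GABA spectrum).
dwell = 1.0 / 2000.0
t = np.arange(2048) * dwell
toy_fid = np.exp(-t / 0.1) * np.exp(2j * np.pi * 50.0 * t)
transients = simulate_transients(toy_fid, dwell)
```

In this sketch the shifts are applied directly to difference transients for brevity; a fuller simulation would presumably corrupt edit-ON and edit-OFF transients separately before subtraction.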
Automated labels for each transient ON-OFF pair were obtained using our proposed DAM algorithm. Briefly, GABA linewidth, SNR, GABA and Glx shape scores (based on the correlation between the shape of the current spectrum and the ground truth10), the percentage of spectral points defined as outliers compared to the mean, and artifact presence were used to generate continuous labels for each edit-ON-edit-OFF transient pair that forms the difference spectrum. For a single difference spectrum, if these metrics were cumulatively better than those of the simple average of the difference spectra, the DAM score was positive; if they were cumulatively worse, the DAM score was negative. DAM scores ranged from –1 (poor quality) to +1 (good quality); transients with a score greater than 0 were included in the calculated difference spectrum.
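Two of the ingredients named above, the correlation-based shape score and the percentage of outlier points relative to the mean, can be sketched as follows. This is only an illustration under stated assumptions and not the DAM algorithm itself: the outlier rule (more than `n_sd` standard deviations from the point-wise mean across transients) and the toy Gaussian peak are hypothetical, and the way DAM combines its metrics into a single score is not reproduced here.

```python
import numpy as np

def shape_score(spectrum, reference):
    """Pearson correlation between a spectral segment and a reference segment."""
    return float(np.corrcoef(spectrum, reference)[0, 1])

def outlier_percentage(transient, all_transients, n_sd=3.0):
    """Percentage of points lying more than n_sd standard deviations from the
    point-wise mean across transients (assumed outlier definition)."""
    mean = all_transients.mean(axis=0)
    sd = all_transients.std(axis=0)
    return 100.0 * float(np.mean(np.abs(transient - mean) > n_sd * sd))

# Toy data: a Gaussian peak standing in for a real-valued spectral region.
rng = np.random.default_rng(0)
reference = np.exp(-0.5 * ((np.arange(512) - 256) / 8.0) ** 2)
clean_transients = reference + rng.normal(0.0, 0.1, size=(160, 512))

# A corrupted copy with a large spurious signal over ten points.
corrupted = clean_transients[0].copy()
corrupted[100:110] += 50.0

print(shape_score(clean_transients[0], reference))
print(shape_score(corrupted, reference))
print(outlier_percentage(corrupted, clean_transients))
```

A transient with a large artifact would be expected to show a lower shape score and a higher outlier percentage than a clean one, which is the kind of contrast a cumulative score can build on.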
Balanced training (14,640 transient-label pairs) and validation (1,200 transient-label pairs) datasets were created with values covering the entire DAM score range. The test set was composed of whole scans (50 scans, each composed of 160 difference transients).
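For a whole scan of difference transients, score-based filtering amounts to a masked average. The sketch below assumes the scores are already available (from a labelling algorithm or a model) and keeps transients with a score strictly greater than the threshold. The function name `filter_and_average`, the fallback to simple averaging when every transient is rejected, and the toy scan are assumptions made for illustration.

```python
import numpy as np

def filter_and_average(difference_transients, quality_scores, threshold=0.0):
    """Average only the transients whose score exceeds the threshold.

    Returns the averaged spectrum and the boolean keep mask.
    """
    scores = np.asarray(quality_scores, dtype=float)
    keep = scores > threshold
    if not keep.any():
        # Assumed fallback: if everything is rejected, use simple averaging.
        keep = np.ones_like(scores, dtype=bool)
    return difference_transients[keep].mean(axis=0), keep

# Toy scan: 160 noisy copies of a Gaussian peak, the first 40 carrying a
# large baseline offset that stands in for an artifact (hypothetical data).
rng = np.random.default_rng(0)
truth = np.exp(-0.5 * ((np.arange(256) - 128) / 6.0) ** 2)
scan = truth + rng.normal(0.0, 0.2, size=(160, 256))
scan[:40] += 3.0

# Hypothetical scores in [-1, +1]: negative for the corrupted transients.
scores = np.where(np.arange(160) < 40, -0.8, 0.6)

filtered, keep = filter_and_average(scan, scores)
simple_avg = scan.mean(axis=0)

print(int(keep.sum()))
print(float(np.mean(np.abs(filtered - truth))))
print(float(np.mean(np.abs(simple_avg - truth))))
```

Raising or lowering `threshold` trades the number of averaged transients against their individual quality, which is why the threshold is kept as a parameter rather than fixed at 0.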
The dual-domain deep learning model (DD-model) (Figure 3) was designed to learn quality scores associated with edit-ON-edit-OFF transient pairs as defined by DAM. These scores would then be used to reject transients which do not meet the predefined quality threshold to improve spectral quality. DD-model performance was compared to the SimpleAvg spectrum using the mean absolute error and through the assessment of quality metrics such as SNR, linewidth, shape score and outlier percentage.
RESULTS
The DD-model predicted quality scores with a mean absolute error of 0.180 compared to DAM quality scores. As the DD-model can use different quality thresholds for transient inclusion or rejection, results for a threshold of 0 are presented here. In comparison to SimpleAvg spectra, when filtering using the DD-model, SNR improved for 75% of scans, linewidth improved for 80% of scans, and shape score improved for 70% of scans (Figure 4 and Table 1). Compared to the spectra filtered using DAM, the DD-model filtering improved SNR for 34% of scans, linewidth for 10% of scans, and shape score for 6% of scans.
DISCUSSION
Our model successfully predicted transient quality while improving or maintaining scan quality metric values.
We note smaller or less frequent improvements to SNR as compared with other metrics. This was anticipated, as averaging transients with large artifacts can misleadingly increase SNR in addition to causing overestimation in quantification. In addition, improving one quality metric does not necessarily result in improvements of another quality metric. Since the objective of the DAM labelling technique (and consequently our model) is to improve the quality metrics collectively, a decrease in a single metric can be anticipated when it allows the metrics overall to improve by relative margins.
CONCLUSION
Our proposed DD-model successfully predicted quality scores which improved all quality metrics compared to simply including and averaging all 320 transients. In addition, our proposed automated labelling method (DAM) showed improvements in all quality metrics over the 320 transient averaged scan, making it a good alternative to manually quality-labelled data. Future work should investigate the model’s applicability and real-time performance on in vivo data.
Acknowledgements
HB was supported by an NSERC Brain CREATE Award and an Alberta Graduate Excellence Scholarship. RS was supported by an NSERC Discovery Grant (#RGPIN-2021-02867) and AH was supported by an NSERC Discovery Grant (#RGPIN-2017-03875).
References
- Mullins PG, McGonigle DJ, O’Gorman RL, et al. Current practice in the use of MEGA-PRESS spectroscopy for the detection of GABA. NeuroImage. 2014; 86:43-52.
- Rakic M, Turco F, Weng G, Maes F, Sima DM, and Slotboom J. Deep learning pipeline for quality filtering of MRSI spectra. NMR Biomedicine. 2023; e5012.
- Wright AJ, Arus C, Wijnen JP, Moreno-Torres A, Griffiths JR, Celda B, and Howe FA. Automated quality control protocol for MR spectra of brain tumors. Magn Reson Med. 2008; 59(6): 1274-1281.
- Menze BH, Kelm M, Weber M, Bachert P, and Hamprecht FA. Mimicking the human expert: Pattern recognition for an automated assessment of data quality in MR spectroscopic images. Magn Reson Med. 2008; 59(6): 1457-1466.
- Kyathanahally SP, Mocioiu V, Pedrosa de Barros N, Slotboom J, Wright AJ, Julia-Sape M, Arus C, and Kreis R. Quality of clinical brain tumor MR spectra judged by humans and machine learning tools. Magn Reson Med. 2018; 79(5): 2500-2510.
- Pedrosa de Barros N, McKinley R, Knecht U, Wiest R, and Slotboom J. Automatic quality control in clinical 1H MRSI of brain cancer. NMR Biomedicine. 2016; 29 (5): 563-575.
- Gurbani SS, Schreibmann E, Maudsley AA, Cordova JS, Soher BJ, Poptani H, Verma G, Barker PB, Shim H, and Cooper LAD. A convolutional neural network to filter artifacts in spectroscopic MRI. Magn Reson Med. 2018; 80 (5): 1765-1775.
- Jang J, Lee HH, Park J, and Kim H. Unsupervised anomaly detection using generative adversarial networks in 1H-MRS of the brain. JMR. 2021; 325: 106936.
- Simpson R, Devenyi GA, Jezzard P, Hennessy TJ, and Near J. Advanced processing and simulation of MRS data using the FID appliance (FID-A) - An open source, MATLAB-based toolkit. Magn Reson Med. 2017; 77(1): 23-33.
- Berto R, Bugler H, Dias G, et al. Advancing GABA-edited MRS through a Reconstruction Challenge. BioRxiv preprint. 2023.