1837

Quality Assessment Tool using Deep Learning for GABA-Edited MRS data

Hanna Bugler¹,²,³,⁴, Roberto Souza³,⁵, and Ashley D. Harris²,³,⁴
¹Department of Biomedical Engineering, University of Calgary, Calgary, AB, Canada, ²Department of Radiology, University of Calgary, Calgary, AB, Canada, ³Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada, ⁴Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada, ⁵Department of Electrical & Software Engineering, University of Calgary, Calgary, AB, Canada

Synopsis

Keywords: Spectroscopy, Machine Learning/Artificial Intelligence, Artifacts, Data Processing, Software Tools, Simulations, Brain, Pediatric

Motivation: GABA-edited MRS suffers from data quality challenges due to its low signal-to-noise ratio (SNR).

Goal(s): We propose an automated labelling algorithm for transient quality and a dual-domain deep learning model for filtering spectral transients based on quality.

Approach: We trained our model on simulated data containing commonly occurring artifacts, labelled with our continuous automated labelling algorithm, whose scores range from –1 (poor quality) to +1 (good quality). We then evaluated the model’s performance by removing (filtering) poor-quality transients, i.e., those with quality values less than 0.

Results: Filtering with our model improved spectral quality metrics relative to simple averaging of all collected transients for 70-80% of scans.

Impact: Our model can assign a continuous quality score between –1 (poor) and +1 (good) to GABA-edited MRS difference data (i.e., a single ON-OFF edit pair), which, when used for filtering, improves MRS quality metrics compared to simple transient averaging.

INTRODUCTION

GABA-edited MRS is used to isolate and quantify GABA, as the GABA peaks are overlapped by more abundant signals in a typical spectrum. As a result, GABA-edited MRS data have low signal and are highly affected by noise¹. While some machine learning-based solutions exist for quality filtering of MRS data²⁻⁸, none exist for GABA-edited MRS. In addition, current approaches are limited by user quality ratings (which are inherently subjective)²⁻⁷, require large amounts of manually labelled in vivo spectra²⁻⁷ and model the problem as a discrete, often binary, classification (i.e., good or poor)²⁻⁸. We propose a deep learning model to filter individual transients for improved GABA-edited spectral quality. Our dual-domain model (DD-model) was trained using our automated data labelling approach, the Distance Away from Mean (DAM) algorithm, which considers linewidth, SNR, peak shapes, outliers and artifact presence, and it was compared to simple transient averaging (SimpleAvg).

MATERIALS AND METHODS

Nine hundred GABA-edited difference (edit-ON minus edit-OFF) ground truth spectra were simulated⁹ and split into training (600), validation (200) and test (100) sets. Each ground truth spectrum had random frequency and phase shifts and Gaussian amplitude noise added to simulate a typical 320-transient (160 edit-ON and 160 edit-OFF) scan with SNR ~25. Of these transients, 40-100 were then further contaminated by a random mix of four artifacts (ghosts/spurious echoes, eddy currents, lipid contamination and motion contamination) to simulate poor-quality spectra using in-house Python scripts (Figure 1).
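The corruption step described above can be illustrated with a minimal NumPy sketch. It is not the in-house script used in this work: the function name `simulate_transients`, the toy single-peak ground truth, and all noise and shift magnitudes are assumptions chosen only to make the example runnable, and the four artifact types are not modelled.

```python
import numpy as np

def simulate_transients(fid, dwell_time, n_transients=160, noise_sd=0.05,
                        freq_sd_hz=2.0, phase_sd_deg=10.0, seed=0):
    """Replicate one ground-truth FID into noisy, shifted transients.

    All default magnitudes are illustrative assumptions, not the values
    used in the abstract.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(fid.size) * dwell_time  # time axis in seconds
    freq = rng.normal(0.0, freq_sd_hz, size=(n_transients, 1))  # Hz
    phase = np.deg2rad(rng.normal(0.0, phase_sd_deg, size=(n_transients, 1)))
    # A frequency shift is a linear phase ramp in time;
    # a phase shift is a constant phase factor.
    shifted = fid[None, :] * np.exp(1j * (2 * np.pi * freq * t[None, :] + phase))
    # Independent Gaussian noise on the real and imaginary channels.
    noise = (rng.normal(0.0, noise_sd, shifted.shape)
             + 1j * rng.normal(0.0, noise_sd, shifted.shape))
    return shifted + noise

# Toy ground truth (hypothetical): one decaying complex exponential.
dwell = 1.0 / 2000.0
t = np.arange(2048) * dwell
ground_truth = np.exp(2j * np.pi * 100.0 * t) * np.exp(-t / 0.1)
transients = simulate_transients(ground_truth, dwell)
```

Each row of `transients` is one corrupted copy of the ground truth; in a fuller simulation the same step would be applied to the edit-ON and edit-OFF signals before subtraction.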

Automated labels for each transient ON-OFF pair were obtained using our proposed DAM algorithm (Figure 2). Briefly, GABA linewidth, SNR, GABA and Glx shape score (based on the correlation between the shape of the current spectrum and the ground truth¹⁰), percentage of spectral points defined as outliers compared to the mean, and artifact presence were used to generate continuous labels for each edit-ON/edit-OFF transient pair that forms a difference spectrum. For a single difference spectrum, if these metrics were cumulatively better than those of the simple average of the difference spectra, the DAM score was positive; if they were cumulatively worse, the DAM score was negative. DAM scores ranged from –1 (poor quality) to +1 (good quality); transients with a score greater than 0 were included in the calculated difference spectrum.
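Two of the ingredients named above, the correlation-based shape score and the outlier percentage, can be sketched as follows. This is an illustration only and not the DAM algorithm itself: the abstract does not specify how outliers are defined or how the metrics are combined, so the 3-standard-deviation rule, the function names and the toy Gaussian peak are all assumptions.

```python
import numpy as np

def shape_score(spectrum, reference):
    """Pearson correlation between two real-valued spectra on the same points."""
    return float(np.corrcoef(spectrum, reference)[0, 1])

def outlier_percentage(spectrum, all_spectra, n_sd=3.0):
    """Percent of points farther than n_sd standard deviations from the
    pointwise mean of all spectra (the 3-SD rule is an assumption)."""
    mean = all_spectra.mean(axis=0)
    sd = all_spectra.std(axis=0)
    return float(100.0 * np.mean(np.abs(spectrum - mean) > n_sd * sd))

# Toy data (hypothetical): a Gaussian peak plus noise, 160 difference spectra.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 256)
reference = np.exp(-(x / 0.1) ** 2)
spectra = reference + rng.normal(0.0, 0.05, size=(160, 256))

corrupted = spectra[0].copy()
corrupted[100:150] += 5.0  # crude stand-in for a large artifact

clean_pct = outlier_percentage(spectra[1], spectra)
corrupted_pct = outlier_percentage(corrupted, spectra)
```

In this toy case the corrupted spectrum has a far higher outlier percentage and a lower shape score than a clean one, which is the kind of separation a continuous label needs to capture.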

Balanced training (14,640 transient-label pairs) and validation (1,200 transient-label pairs) datasets were created with values covering the entire DAM score range. The test set was composed of whole scans (50 scans, each composed of 160 difference transients).
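One simple way to build a dataset that is balanced across a continuous score range is to bin the scores and draw the same number of samples from each non-empty bin. The sketch below assumes this strategy; the abstract does not state how balancing was done, and the bin count and function name are hypothetical.

```python
import numpy as np

def balance_by_score(scores, n_bins=10, seed=0):
    """Return sorted indices that sample equally from each non-empty score bin."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin ids 0 .. n_bins - 1.
    bin_ids = np.digitize(scores, edges[1:-1])
    groups = [np.flatnonzero(bin_ids == b) for b in range(n_bins)]
    groups = [g for g in groups if g.size > 0]
    per_bin = min(g.size for g in groups)  # size of the rarest bin
    picked = [rng.choice(g, size=per_bin, replace=False) for g in groups]
    return np.sort(np.concatenate(picked))

# Toy scores (hypothetical): skewed toward good quality, as in a typical scan.
rng = np.random.default_rng(1)
scores = np.clip(rng.normal(0.3, 0.4, size=5000), -1.0, 1.0)
balanced_idx = balance_by_score(scores)
```

Downsampling to the rarest bin discards data; oversampling the rare bins would be an alternative if the poor-quality range is sparsely populated.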

The dual-domain deep learning model (DD-model) (Figure 3) was designed to learn the quality scores associated with edit-ON/edit-OFF transient pairs as defined by DAM. These scores are then used to reject transients that do not meet a predefined quality threshold, in order to improve spectral quality. DD-model performance was assessed using the mean absolute error of its predicted scores relative to the DAM labels, and by comparing quality metrics such as SNR, linewidth, shape score and outlier percentage against those of the SimpleAvg spectrum.
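The rejection step can be sketched in a few lines, given one quality score per difference transient. The function name and the fallback to the simple average when no transient passes are assumptions; the scores could come from DAM or from the DD-model.

```python
import numpy as np

def filtered_average(diff_transients, scores, threshold=0.0):
    """Average only the difference transients whose score exceeds the threshold."""
    diff_transients = np.asarray(diff_transients)
    keep = np.asarray(scores) > threshold
    if not keep.any():
        # Assumed fallback: use every transient if none passes the threshold.
        keep = np.ones(keep.shape, dtype=bool)
    return diff_transients[keep].mean(axis=0), keep

# Toy example (hypothetical values): the third transient is an obvious outlier.
toy_transients = np.array([[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]])
toy_scores = np.array([0.5, 0.2, -0.8])
filtered, kept = filtered_average(toy_transients, toy_scores)
simple_avg = toy_transients.mean(axis=0)
```

Raising `threshold` trades the number of averaged transients against their individual quality.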

RESULTS

The DD-model predicted quality scores with a mean absolute error of 0.180 compared to DAM labelling quality scores. As the DD-model can use different quality thresholds for transient inclusion or rejection, results for a threshold of 0 are presented here. In comparison to SimpleAvg spectra, when filtering using the DD-model, SNR improved for 75% of scans, linewidth improved for 80% of scans, and shape score improved for 70% of scans (Figure 4 and Table 1). Compared to the spectra filtered using DAM, the DD-model filtering improved SNR for 34% of scans, linewidth for 10% of scans, and shape score for 6% of scans.
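The two summary statistics reported here, the mean absolute error between predicted and reference scores and the percentage of scans for which a metric improved, can be computed as in the sketch below. The function names and all numbers are hypothetical and serve only to show the arithmetic; they are not the study's data.

```python
import numpy as np

def mean_absolute_error(predicted, target):
    """Mean absolute difference between predicted and reference scores."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(target))))

def percent_scans_improved(filtered, baseline, higher_is_better=True):
    """Percent of scans where the filtered metric beats the baseline metric."""
    filtered = np.asarray(filtered, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    improved = filtered > baseline if higher_is_better else filtered < baseline
    return float(100.0 * np.mean(improved))

# Hypothetical per-scan metrics for four scans.
mae = mean_absolute_error([0.5, -0.5], [0.25, -0.25])
snr_improved = percent_scans_improved([26, 24, 30, 28], [25, 25, 25, 25])
# Linewidth is better when smaller, so the comparison is reversed.
lw_improved = percent_scans_improved([18, 20, 17, 16], [19, 19, 19, 19],
                                     higher_is_better=False)
```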

DISCUSSION

Our model successfully predicted transient quality while improving or maintaining scan quality metric values.

We note smaller or less frequent improvements to SNR compared with other metrics. This was anticipated, as averaging transients with large artifacts can misleadingly increase SNR in addition to causing overestimation in quantification. In addition, improving one quality metric does not necessarily improve another. Since the objective of the DAM labelling technique (and consequently our model) is to improve the quality metrics collectively, a decrease in a single metric can be anticipated when it allows the metrics as a whole to improve by relative margins.

CONCLUSION

Our proposed DD-model successfully predicted quality scores, and filtering with these scores improved all quality metrics compared to simply averaging all 320 transients. In addition, our proposed automated labelling method (DAM) showed improvements in all quality metrics over the 320-transient averaged scan, making it a good alternative to manually quality-labelled data. Future work should investigate the model’s applicability and real-time performance on in vivo data.

Acknowledgements

HB was supported by an NSERC Brain CREATE Award and an Alberta Graduate Excellence Scholarship. RS was supported by an NSERC Discovery Grant (#RGPIN-2021-02867), and AH was supported by an NSERC Discovery Grant (#RGPIN-2017-03875).

References

1. Mullins PG, McGonigle DJ, O’Gorman RL, et al. Current practice in the use of MEGA-PRESS spectroscopy for the detection of GABA. NeuroImage. 2014; 86: 43-52.
2. Rakic M, Turco F, Weng G, Maes F, Sima DM, and Slotboom J. Deep learning pipeline for quality filtering of MRSI spectra. NMR Biomed. 2023; e5012.
3. Wright AJ, Arus C, Wijnen JP, Moreno-Torres A, Griffiths JR, Celda B, and Howe FA. Automated quality control protocol for MR spectra of brain tumors. Magn Reson Med. 2008; 59(6): 1274-1281.
4. Menze BH, Kelm M, Weber M, Bachert P, and Hamprecht FA. Mimicking the human expert: Pattern recognition for an automated assessment of data quality in MR spectroscopic images. Magn Reson Med. 2008; 59(6): 1457-1466.
5. Kyathanahally SP, Mocioiu V, Pedrosa de Barros N, Slotboom J, Wright AJ, Julia-Sape M, Arus C, and Kreis R. Quality of clinical brain tumor MR spectra judged by humans and machine learning tools. Magn Reson Med. 2018; 79(5): 2500-2510.
6. Pedrosa de Barros N, McKinley R, Knecht U, Wiest R, and Slotboom J. Automatic quality control in clinical 1H MRSI of brain cancer. NMR Biomed. 2016; 29(5): 563-575.
7. Gurbani SS, Schreibmann E, Maudsley AA, Cordova JS, Soher BJ, Poptani H, Verma G, Barker PB, Shim H, and Cooper LAD. A convolutional neural network to filter artifacts in spectroscopic MRI. Magn Reson Med. 2018; 80(5): 1765-1775.
8. Jang J, Lee HH, Park J, and Kim H. Unsupervised anomaly detection using generative adversarial networks in 1H-MRS of the brain. J Magn Reson. 2021; 325: 106936.
9. Simpson R, Devenyi GA, Jezzard P, Hennessy TJ, and Near J. Advanced processing and simulation of MRS data using the FID appliance (FID-A) - An open source, MATLAB-based toolkit. Magn Reson Med. 2017; 77(1): 23-33.
10. Berto R, Bugler H, Dias G, et al. Advancing GABA-edited MRS through a Reconstruction Challenge. bioRxiv preprint. 2023.

Figures

Figure 1. Simulated GABA-edited difference spectra. A ground truth spectrum was simulated (black). This ground truth spectrum was then replicated into 320 transients (160 ON-OFF pairs), and random noise, frequency and phase shifts, and artifacts (including line broadening, ghosting, eddy currents, and lipid contamination) were added to better reflect in vivo data. The mean difference spectrum from these individual transients was then calculated (red). Example difference spectra are shown as coloured data.

Figure 2. Distance away from mean (DAM) continuous labelling method. The top panel shows example spectra from different ranges of the labelling method from Best (green) to Worst (red). The second panel shows the averaged spectrum (green) of high-quality spectra with labelling scores over zero. The third panel shows the mean spectrum (red) of low-quality spectra with labelling scores below zero. The bottom panel shows the mean spectrum (blue) of all spectra within the simulated scan.

Figure 3. The DD-model is composed of two networks. The frequency domain network takes in five (width) windowed spectra between 1.09 and 5.00 ppm (height). The time domain network takes in five (width) windowed FIDs, using the first 400 points (height). Each network outputs a 64-channel feature map; these are flattened and concatenated. The features then pass through a set of fully connected layers, with the final output providing the quality score of the third (middle) spectrum/FID of the input.
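The five-wide windowed input described in this caption can be assembled as in the sketch below, where each transient is stacked with its two neighbours on either side and the label belongs to the middle one. The function name `make_windows` and the edge handling (repeating the first and last transients) are assumptions, as the caption does not say how scan boundaries are treated.

```python
import numpy as np

def make_windows(transients, width=5):
    """Stack each transient with its neighbours along a trailing 'width' axis.

    Input shape:  (n_transients, n_points)
    Output shape: (n_transients, n_points, width)
    The edges are padded by repeating the first/last transient (an assumption).
    """
    half = width // 2
    padded = np.pad(transients, ((half, half), (0, 0)), mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, width, axis=0)

# Toy FIDs (hypothetical): 160 difference transients of 2048 complex points.
rng = np.random.default_rng(0)
fids = rng.normal(size=(160, 2048)) + 1j * rng.normal(size=(160, 2048))

# Time-domain input: the first 400 points of each FID, windowed.
time_input = make_windows(fids[:, :400])
```

The same helper could be applied to spectra cropped to the 1.09 to 5.00 ppm range to form the frequency-domain input; the middle slice `time_input[:, :, 2]` is the transient being scored.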

Figure 4. One scan chosen at random from the test set is shown above with the top panel showing mean reconstruction and metric results for all 320 (or 160 difference) transients. The middle panel shows the mean reconstruction and metric results for the transients identified by DAM labelling with quality scores above 0. The bottom panel shows the mean reconstruction and metric results for the transients identified by our DD-model labelling with quality scores above 0.

Table 1. MRS Quality Metric Values for Different Transient Inclusion Methods for a Threshold of 0.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/1837