2015

Impact of training size on deep learning performance in in vivo ¹H MRS

Sungtak Hong¹ and Jun Shen¹
¹National Institute of Mental Health, National Institutes of Health, Bethesda, MD, United States

Synopsis

Deep learning has found an increasing number of applications in MRS. Nevertheless, few studies have addressed the impact of training data size on deep learning performance. In this work, we used density matrix simulation to generate a very large training dataset (70,000 spectra). A comprehensive comparison was then performed to evaluate deep learning performance with different training data sizes.

Introduction

Recent studies have demonstrated that deep learning techniques can be applied to MRS to address several issues, such as spectral restoration from deteriorated signals¹, ghosting artifact removal², frequency and phase correction³, and metabolite quantification⁴. However, training data size is often selected empirically or arbitrarily, based on data availability and computing resources. The training data size required for adequate training of the model has not been sufficiently investigated. In this study, we evaluated the impact of training data size on the performance of deep learning in solving the spectral restoration problem, because in vivo ¹H MRS data frequently suffer from low SNR and broad linewidth.

Materials and Methods

Data generation. Full 3D density matrix simulations were used to generate spectra of 20 metabolites (alanine, ascorbate, aspartate, creatine, gamma-aminobutyric acid, glucose, glutamate, glutamine, glutathione, glycerylphosphorylcholine, glycine, lactate, myo-inositol, N-acetylaspartate, N-acetylaspartylglutamate, phosphocholine, phosphocreatine, phosphorylethanolamine, scyllo-inositol, and taurine) for a PRESS sequence at 3 T. Simulation parameters were: TE = 35 ms (TE1 = 14 ms, TE2 = 21 ms); spectral bandwidth = 4000 Hz; Shinnar-Le Roux (SLR) optimized excitation pulse (duration = 6 ms, bandwidth = 2000 Hz); and SLR optimized refocusing pulse (duration = 5 ms, bandwidth = 2000 Hz). Macromolecule signals were modeled using 13 Gaussian components.⁵ In total, 75,000 spectra were generated by randomly varying the relative concentration of each metabolite and the amplitude of each macromolecule signal; these spectra were taken as the ground truth dataset. Each spectrum was subsequently degraded progressively by broadening the linewidth and adding white noise, and then assigned to the test dataset. SNR was measured as the ratio of the total NAA signal intensity to twice the standard deviation of the spectral noise between 8 ppm and 10 ppm. An SNR of 10 was selected empirically as the threshold for distinguishing high SNR from low SNR.
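The degradation and SNR measurement described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the abstract does not give the lineshape used for broadening (Lorentzian is assumed here), the number of points, the ¹H frequency, the carrier position, or the window used to read the total NAA intensity, so `F0_MHZ`, `N_POINTS`, `REF_PPM`, and `signal_ppm` are hypothetical values, and a single peak at 2.01 ppm stands in for a full simulated spectrum.

```python
import numpy as np

F0_MHZ = 127.7        # assumed 1H frequency at 3 T (not stated in the abstract)
BANDWIDTH_HZ = 4000   # spectral bandwidth given in the abstract
N_POINTS = 2048       # assumed number of complex points
REF_PPM = 4.7         # assumed carrier position (water)

def ppm_axis(n=N_POINTS):
    """Chemical shift axis matching an fftshift-ed spectrum."""
    hz = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / BANDWIDTH_HZ))
    return REF_PPM + hz / F0_MHZ

def to_spectrum(fid):
    """Fourier transform an FID and centre the zero-frequency bin."""
    return np.fft.fftshift(np.fft.fft(fid))

def degrade_fid(fid, extra_lw_hz, noise_sd, rng):
    """Broaden the linewidth (assumed Lorentzian) and add complex white noise."""
    t = np.arange(fid.size) / BANDWIDTH_HZ
    broadened = fid * np.exp(-np.pi * extra_lw_hz * t)
    noise = (rng.normal(0.0, noise_sd, fid.size)
             + 1j * rng.normal(0.0, noise_sd, fid.size))
    return broadened + noise

def measure_snr(spectrum, ppm, signal_ppm=(1.9, 2.1), noise_ppm=(8.0, 10.0)):
    """Peak height in the (assumed) NAA window over twice the noise SD at 8-10 ppm."""
    real = spectrum.real
    signal = real[(ppm >= signal_ppm[0]) & (ppm <= signal_ppm[1])].max()
    noise = real[(ppm >= noise_ppm[0]) & (ppm <= noise_ppm[1])].std()
    return signal / (2.0 * noise)

# Toy "ground truth": one singlet at 2.01 ppm with a 2 Hz linewidth.
t = np.arange(N_POINTS) / BANDWIDTH_HZ
offset_hz = (2.01 - REF_PPM) * F0_MHZ
clean_fid = np.exp(2j * np.pi * offset_hz * t) * np.exp(-np.pi * 2.0 * t)

rng = np.random.default_rng(0)
ppm = ppm_axis()
for noise_sd in (0.05, 0.5):
    degraded = degrade_fid(clean_fid, extra_lw_hz=5.0, noise_sd=noise_sd, rng=rng)
    value = measure_snr(to_spectrum(degraded), ppm)
    label = "high" if value >= 10 else "low"   # threshold of 10 from the abstract
    print(f"noise_sd={noise_sd}: SNR={value:.1f} ({label} SNR)")
```

Applying the degradation in the time domain keeps the added noise white across the whole spectrum, which is what makes the 8-10 ppm region usable as a noise estimate.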
Convolutional neural network. Figure 1 shows the convolutional neural network (CNN) architecture. Each convolutional block consisted of four pairs of a 1D convolutional layer and a batch normalization layer. The exponential linear unit (ELU) activation function was used throughout the network. Bayesian optimization⁶ was used to find optimal network parameters, including the number of layers per convolutional block, kernel size, and learning rate; this process took approximately 25 hours. Of the 75,000 spectra, 70,000 were assigned to training data and 5000 to test data, and the CNN was trained on training subsets of the following sizes: 300, 500, 1000, 2500, 5000, 30,000, and 70,000 spectra. During the training phase, 20% of each training subset was set aside as a validation dataset. Training was performed in the complex domain so that CNN-predicted spectra could be used by spectral fitting tools such as LCModel and jMRUI to quantify metabolite concentrations. Thus, the input and output of the CNN each had two channels, for the real and imaginary parts of the MRS data. The CNN was trained using the Adam algorithm with a fixed learning rate of 10⁻⁴, a batch size of 32, and 30 epochs. Mean squared error (MSE) was used as the loss function. Early stopping terminated training when performance on the validation dataset had not improved for three epochs. The CNN was implemented and trained using the Keras library with a TensorFlow backend on a supercomputer cluster (32 GB NVIDIA Tesla V100).
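The subset bookkeeping and the stopping rule can be illustrated without a deep learning framework. The sketch below is hypothetical throughout: the abstract does not say how subsets were drawn (random sampling without replacement is assumed), and `split_subset` and `early_stopping_epoch` are illustrative helper names rather than Keras functions. The stopping function implements a plain patience rule: stop once the validation loss has failed to improve on its best value for three consecutive epochs.

```python
import numpy as np

TRAIN_POOL = 70_000                                      # training spectra in the abstract
TRAIN_SIZES = [300, 500, 1000, 2500, 5000, 30_000, 70_000]

def split_subset(n_total, subset_size, val_fraction=0.2, seed=0):
    """Draw a random subset of indices and hold out a validation share."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_total, size=subset_size, replace=False)
    n_val = int(round(val_fraction * subset_size))
    return idx[n_val:], idx[:n_val]                      # (train indices, validation indices)

def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which a patience rule would stop training."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:                                  # strict improvement resets the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)                               # ran through every epoch supplied

for size in TRAIN_SIZES:
    train_idx, val_idx = split_subset(TRAIN_POOL, size)
    print(f"subset {size}: {train_idx.size} for training, {val_idx.size} for validation")
```

With the 20% hold-out, the smallest subset leaves only 240 spectra for weight updates, which is worth keeping in mind when comparing the 300-case results with the larger subsets.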
CNN evaluation. The CNN was trained ten times for each of the seven training sizes, and the network whose MSE was closest to the mean MSE was taken as the representative network for that training size. The normalized mean squared error (NMSE) between ground truth spectra and CNN-predicted spectra was used as the metric of CNN performance. The NMSE for high SNR and low SNR was calculated using these representative networks.
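A minimal sketch of this evaluation step is given below. The abstract does not state the NMSE formula, so a common definition is assumed here (squared error summed over all points, divided by the summed squared magnitude of the ground truth); the function names and the example MSE values are hypothetical.

```python
import numpy as np

def nmse(truth, predicted):
    """Normalized mean squared error (assumed definition, works for complex data)."""
    truth = np.asarray(truth)
    predicted = np.asarray(predicted)
    return np.sum(np.abs(truth - predicted) ** 2) / np.sum(np.abs(truth) ** 2)

def representative_run(mse_per_run):
    """Index of the training run whose MSE is closest to the mean over runs."""
    mse = np.asarray(mse_per_run, dtype=float)
    return int(np.argmin(np.abs(mse - mse.mean())))

# Made-up MSE values for ten repeated trainings at one training size.
example_mse = [0.012, 0.010, 0.015, 0.011, 0.013, 0.020, 0.009, 0.012, 0.014, 0.011]
chosen = representative_run(example_mse)
print(f"representative run: {chosen} (MSE = {example_mse[chosen]})")
```

Choosing the run nearest the mean, rather than the best run, avoids reporting an unusually lucky initialization as typical for a given training size.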

Results

Figure 2 depicts representative simulated PRESS ¹H MRS spectra at different SNRs together with the corresponding CNN-predicted spectra for different training data sizes. Visually, the CNN-predicted spectra improved the most when the training size increased from 300 to 500 cases. As summarized in Table 1, the mean NMSE decreased notably as the training size increased from 300 to 500 cases and showed minimal improvement at training sizes above 2500 cases. Likewise, the largest NMSE difference between high SNR and low SNR was observed at 300 cases (0.0379), and this difference became negligible after 2500 cases (≤ 0.0005).

Discussion

The present study demonstrated that the benefit of larger training data sizes can be marginal once a threshold number of datasets is reached when training a CNN to restore degraded in vivo ¹H MRS spectra. This threshold number is expected to depend on the complexity of the dataset. Accordingly, a lower threshold is predicted for less crowded data, such as long-TE spectra, than for more complex data such as the short-TE spectra used in this work. A future study is warranted to evaluate the impact of different training data sizes on the accuracy and precision of metabolite quantification using CNN-predicted spectra.

Acknowledgements

No acknowledgement found.

References

1. Lee HH, Kim H. Intact metabolite spectrum mining by deep learning in proton magnetic resonance spectroscopy of the brain. Magn Reson Med. 2019;82:33–48.

2. Kyathanahally SP, Doring A, Kreis R. Deep learning approaches for detection and removal of ghosting artifacts in MR spectroscopy. Magn Reson Med. 2018;80:851–863.

3. Tapper S, Mikkelsen M, Dewey BE, et al. Frequency and phase correction of J-difference edited MR spectra using deep learning. Magn Reson Med. 2020;00:1–11.

4. Das D, Coello E, Schulte RF, Menze BH. Quantification of metabolites in magnetic resonance spectroscopic imaging using machine learning. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention, Quebec, Canada; 2017:462–470.

5. Birch R, Peet AC, Dehghani H, Wilson M. Influence of macromolecule baseline on ¹H MR spectroscopic imaging reproducibility. Magn Reson Med. 2017;77:34–43.

6. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Proceedings of International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada; 2012:2951–2959.

Figures

Figure 1. Schematic overview illustrating the generation of the dataset and the proposed network architecture, which features three consecutive convolutional blocks. A pair of a 1D convolutional layer and a batch normalization layer acts as the fundamental component and is repeated four times to complete each block. Network training was conducted with pairs of ground truth spectra and progressively degraded spectra while minimizing the mean squared error using the Adam optimization algorithm. The learning rate was set to 10⁻⁴.

Figure 2. Numerically calculated ¹H MRS spectra at low SNR (left column) and high SNR (right column). CNN-predicted spectra, difference spectra (ground truth – predicted), and NMSE illustrate the impact of using different training sizes for the CNN.

Table 1. Mean NMSE using different sizes of training subsets. The mean NMSE reached a plateau at 2500 cases for both low SNR and high SNR and showed only marginal improvement beyond this point.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
