1076

Rotating-view super-resolution (ROVER)-MRI reconstruction using tailored Implicit Neural Network

Jun Lyu¹, Lipeng Ning¹, William Consagra¹, Qiang Liu¹, and Yogesh Rathi¹
¹Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States

Synopsis

Keywords: Machine Learning/Artificial Intelligence

Motivation: Direct acquisition of high-resolution data is time-consuming and degrades SNR. Super-resolution reconstruction (SRR) is widely used to address these challenges. However, existing reconstruction tools rely on algorithms that are sensitive to noise and motion.

Goal(s): Our study aims to develop a training-free deep learning-based SRR method that integrates multi-view thick-slice data to reconstruct images with enhanced spatial resolution and high SNR.

Approach: We used an implicit neural representation (INR) network that combines data from scans acquired at multiple views to achieve high-resolution isotropic SRR.

Results: Our technique achieved 30% higher SNR and improved motion robustness compared with existing techniques.

Impact: Implicit neural representations provide a continuous functional representation of MRI images, making them a natural candidate for SRR in low-SNR regimes. Our study validates the feasibility of employing INRs to reduce scan time and motion artifacts while achieving high-quality SRR.

Introduction

High spatial resolution in MRI aids precise anatomical localization, improving the quality of interpretation and analysis. However, directly acquiring high-resolution (HR) images suffers from low SNR and long scan times, and prolonged scans are further affected by patient motion and other physiological noise. Current acquisition methods include RF-encoding of thick slabs using gSlider[1] or acquiring rotating/translating views with thick slabs[2,3,4]. The data in these cases are typically reconstructed by standard L2 minimization with a Tikhonov regularization term. However, this approach is highly sensitive to noise and cannot easily be extended to sparse-data regimes. Standard supervised deep learning techniques provide an alternative, but they require ground-truth HR data for training, which is typically unavailable. Further, such techniques are highly sensitive to the training dataset and may fail in cases of gross anatomical abnormality.

We address these challenges by proposing a training-free, unsupervised implicit neural representation (INR) network[5] that provides a continuous functional representation of the image and does not require any ground-truth data to learn that representation. Because the representation is continuous, an INR can naturally represent data from different views, allowing SRR with fewer views than required by the Nyquist criterion (π/2 × the super-resolution factor).
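As a quick check of this criterion: with 1×1×5 mm³ slabs and, we assume, a 1 mm isotropic target, the super-resolution factor is 5, so π/2 × 5 ≈ 7.85, i.e. 8 views. Assuming the views are rounded up and spread evenly over a half-turn, the spacing is 180°/8 = 22.5°, which matches the acquisition described in the Methods. The function names below are illustrative only.

```python
import math


def nyquist_views(sr_factor: float) -> int:
    """Number of rotated views suggested by the pi/2 x SR-factor criterion (rounded up)."""
    return math.ceil(math.pi / 2 * sr_factor)


def view_angles(n_views: int) -> list[float]:
    """Evenly spaced rotation angles in degrees over a 180-degree half-turn (assumed)."""
    step = 180.0 / n_views
    return [k * step for k in range(n_views)]


# 5 mm slices reconstructed at an assumed 1 mm -> super-resolution factor of 5
n = nyquist_views(5 / 1)
print(n)                   # 8  (pi/2 * 5 is about 7.85)
print(view_angles(n)[:3])  # [0.0, 22.5, 45.0]
```

Under this reading, the 6- and 4-view experiments in Experiment 2 sit below the criterion, which is the regime where the continuous INR prior is expected to help.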

Methods

Data

As shown in Figure 1(a), we acquired low-resolution (LR) diffusion MRI (dMRI) images at eight rotation angles separated by 22.5°, each with a spatial resolution of 1×1×5 mm³. The phase-encoding direction was kept the same across all rotations to ensure similar geometric distortions. A single b0 image and several diffusion-weighted images at b=1000 s/mm² were also acquired.

Implicit Neural Representation

Figure 1(b) illustrates our SRR technique using implicit neural representations. We represent the data from all views in a common right-anterior-superior (RAS) coordinate system. The network takes the RAS coordinates of any continuous point within the image field as input and outputs the corresponding pixel value.

Figure 2 gives an overview of the framework. (a) Coordinate preprocessing: the input comprises the matrix coordinates of the LR images from each view; multiplying them by each view's affine matrix yields the corresponding coordinates in the RAS system. (b) Fitting procedure: given a coordinate [r, a, s], the feature embedding layer encodes it into feature maps, which the INR then decodes into a pixel intensity. The subvoxel mapping block averages the INR output along the z (slice) direction, so that the mean square error (MSE) loss can be computed against the corresponding LR image.
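A minimal sketch of the two geometric steps in Figure 2, in plain Python so it runs without PyTorch. It assumes a hypothetical view geometry (rotation by θ about the anterior axis applied to a 1×1×5 mm³ voxel grid with zero origin) and models the subvoxel mapping as the mean of the INR evaluated at evenly spaced points across one voxel's slice thickness. `view_affine`, `to_ras`, `thick_slice_value`, `mse`, and the toy `inr` callable are illustrative names, not the authors' code.

```python
import math


def view_affine(theta_deg, voxel=(1.0, 1.0, 5.0), origin=(0.0, 0.0, 0.0)):
    """4x4 voxel-index -> RAS affine for one view.

    Hypothetical geometry: rotation by theta about the A (second) axis,
    applied to a voxel grid scaled by `voxel` (mm), plus a translation `origin`.
    """
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    rot = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    A = [[rot[i][j] * voxel[j] for j in range(3)] + [origin[i]] for i in range(3)]
    A.append([0.0, 0.0, 0.0, 1.0])
    return A


def to_ras(A, ijk):
    """Apply a homogeneous affine to a (possibly fractional) voxel index."""
    v = list(ijk) + [1.0]
    return [sum(A[r][k] * v[k] for k in range(4)) for r in range(3)]


def thick_slice_value(inr, A, ijk, n_sub=5):
    """Subvoxel mapping (assumed form): average the INR across one LR voxel's slice thickness."""
    i, j, k = ijk
    # Fractional offsets at the centres of n_sub equal sub-slabs, spanning [-0.5, 0.5]
    offsets = [(m + 0.5) / n_sub - 0.5 for m in range(n_sub)]
    return sum(inr(to_ras(A, (i, j, k + d))) for d in offsets) / n_sub


def mse(pred, target):
    """Mean square error between predicted thick-slice values and LR intensities."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

With θ = 0 the affine reduces to the voxel scaling, so `to_ras(view_affine(0.0), (2, 3, 4))` gives `[2.0, 3.0, 20.0]`. For a nonlinear toy field such as `lambda p: p[2] ** 2`, the thick-slice average at that voxel (about 402) exceeds the value at the voxel centre (400), which is the kind of partial-volume effect the INR must reconcile across rotated views.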

Implementation

We normalize both the RAS coordinates and the dMRI intensities to the interval [0, 1]. Our INR uses 512-dimensional Fourier features and a four-layer MLP with a hidden dimension of 256. We use the Adam optimizer with a learning rate of 1e-4 and run 3500 iterations in total. The model is implemented in PyTorch and trained on a single A6000 GPU. We compare against two baselines: 1) bicubic interpolation and 2) least-squares SRR (LS-SRR)[2].
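The embedding and network sizes above can be illustrated with a plain-Python forward pass. This is a sketch under stated assumptions, not the authors' implementation: we read "512-dimensional Fourier features" as 256 random Gaussian frequencies, each contributing a cosine and a sine (one common Fourier-feature construction); the bandwidth `sigma=10.0`, the ReLU activations, the uniform initialization, and reading "four-layer" as four linear layers are all our assumptions. The Adam optimization (learning rate 1e-4, 3500 iterations) would be run with an autodiff framework such as PyTorch and is not shown.

```python
import math
import random


def normalize(values):
    """Min-max scale a list of numbers to [0, 1] (coordinates or intensities)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [(v - lo) / span for v in values]


def gaussian_B(n_freq, dim=3, sigma=10.0, seed=0):
    """Random Gaussian frequency matrix (n_freq x dim); sigma is a hypothetical bandwidth."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_freq)]


def fourier_features(p, B):
    """gamma(p) = [cos(2*pi*B p), sin(2*pi*B p)] -> 2 * n_freq features."""
    proj = [2 * math.pi * sum(b_k * p_k for b_k, p_k in zip(b, p)) for b in B]
    return [math.cos(x) for x in proj] + [math.sin(x) for x in proj]


def init_mlp(sizes, seed=0):
    """List of (weights, bias) pairs with uniform(-1/sqrt(n_in), 1/sqrt(n_in)) weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1 / math.sqrt(n_in)
        W = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """Apply the linear layers; ReLU on hidden layers (assumed), linear scalar output."""
    for idx, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]


coords = [0.5, 0.25, 0.75]  # an RAS point already normalized to [0, 1]
features = fourier_features(coords, gaussian_B(256))  # 512 values
mlp = init_mlp([512, 256, 256, 256, 1])  # four linear layers, hidden width 256
intensity = mlp_forward(mlp, features)  # untrained, so the value is arbitrary
```

Mapping low-dimensional coordinates through sinusoids of many frequencies is what lets a small MLP represent fine image detail; the bandwidth controls how much high-frequency content the fit can express.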

Results

Experiment 1: SRR Results

Figure 3 shows qualitative results for bicubic interpolation, LS-SRR, and our approach at b=1000 s/mm² for three diffusion-encoding directions. Bicubic interpolation produces notably blurry images, and LS-SRR exhibits pronounced noise and motion artifacts, whereas our approach gives the best image detail and sharpness. Figure 3 also shows noise maps estimated from the reconstructed images[6]. We observe a 30% increase in SNR compared with LS-SRR.

Figure 4 compares our approach with the two baselines, bicubic interpolation and LS-SRR, on b=0 images. Our method notably reduces motion artifacts relative to LS-SRR and shows better SNR.

Experiment 2: SRR Results with undersampled data

Figure 5 shows the results of our method when fewer views are used for SRR. Images reconstructed from all 8 views serve as the ground truth, and the error maps are shown in the second and fourth rows. Reconstruction from 6 views yields small errors, whereas reconstruction from 4 views shows reduced sharpness due to the limited complementary information.
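The abstract does not state how the error maps were normalized. A minimal sketch, assuming voxel-wise absolute differences scaled by the maximum of the 8-view reference, is given below; the NRMSE summary is an illustrative addition of ours, and both function names are hypothetical.

```python
import math


def error_map(recon, ref):
    """Voxel-wise |recon - ref|, scaled by the reference maximum (assumed normalization)."""
    scale = max(abs(v) for v in ref) or 1.0
    return [abs(r - g) / scale for r, g in zip(recon, ref)]


def nrmse(recon, ref):
    """Root-mean-square error normalized by the root-mean-square of the reference."""
    mse = sum((r - g) ** 2 for r, g in zip(recon, ref)) / len(ref)
    ref_ms = sum(g * g for g in ref) / len(ref)
    return math.sqrt(mse / ref_ms)


# Toy flattened images: reference from 8 views, reconstruction from fewer views
ref_8 = [1.0, 2.0, 4.0]
recon_k = [1.0, 2.0, 3.0]
print(error_map(recon_k, ref_8))  # [0.0, 0.0, 0.25]
```

Scaling by a single reference-wide constant keeps the maps for different view counts directly comparable.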

Conclusion

In summary, we used implicit neural representations to enable SRR without the need for training datasets. Compared with existing SRR algorithms, our approach reconstructs high-quality images with stronger motion robustness and ~30% better noise reduction, and it can further reduce scan time by using fewer views.

References

[1] Setsompop, Kawin, et al. "Generalized SLIce Dithered Enhanced Resolution Simultaneous MultiSlice (gSlider-SMS) to increase volume encoding, SNR and partition profile fidelity in high-resolution diffusion imaging." Proceedings of the 24th Annual Meeting of ISMRM, Singapore (2016).
[2] Vis, Geraline, Markus Nilsson, Carl-Fredrik Westin, and Filip Szczepankiewicz. "Accuracy and precision in super-resolution MRI: Enabling spherical tensor diffusion encoding at ultra-high b-values and high resolution." NeuroImage 245 (2021): 118673.
[3] Dong, Zijing, J. Polimeni, L. Wald, and F. Wang. "SuperRes-EPTI: in-vivo mesoscale distortion-free dMRI at 500μm-isotropic resolution using short-TE EPTI with rotating-view super resolution." Proceedings of ISMRM (2023).
[4] Ning, Lipeng, et al. "A joint compressed-sensing and super-resolution approach for very high-resolution diffusion imaging." NeuroImage 125 (2016): 386-400.
[5] Sitzmann, Vincent, et al. "Implicit neural representations with periodic activation functions." Advances in Neural Information Processing Systems 33 (2020): 7462-7473.
[6] Aja-Fernández, S., T. Pieciak, and G. Vegas-Sánchez-Ferrero. "Spatially variant noise estimation in MRI: a homomorphic approach." Medical Image Analysis 20, no. 1 (2015): 184-197. doi: 10.1016/j.media.2014.11.005.

Figures

Figure 1. Strategy for (a) data acquisition and (b) super-resolution reconstruction. (a) Data sampling of 8 LR images, each volume acquired with a different slice orientation. (b) A neural network learns the implicit neural representation of the super-resolution image. The input to the network is the RAS coordinates of any continuous point in the image field, and the output is the corresponding pixel value.

Figure 2. An overview of our implicit neural representation for super-resolution reconstruction. (a) Coordinate preprocessing: the input is the matrix coordinates of the LR images from each view, and the output is the corresponding RAS coordinates. (b) Training: given a coordinate [r, a, s], the feature embedding layer encodes it into feature maps, which the INR then decodes into a pixel intensity. The subvoxel mapping block downsamples the INR output to fit the LR image of each view with an MSE loss.

Figure 3. Qualitative results of our approach on b=1000 s/mm² DWIs for 3 diffusion-encoding directions. Our method gives the best image detail and sharpness in both (a) sagittal and (b) coronal views. The fourth row shows the estimated noise maps for LS-SRR and our method; our method has ~30% lower noise than LS-SRR.

Figure 4. Qualitative results of our approach and the two baseline methods (bicubic interpolation and LS-SRR) on the b=0 dataset. Our approach achieves the best noise suppression and reduces motion artifacts compared with LS-SRR.

Figure 5. SRR results for a reduced number of views. Images reconstructed from 8 views were used as the reference. Difference maps in the second and fourth rows show small errors for 6 views and a minor loss of sharpness for 4 views.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/1076