SSTN: Self-supervised Triple-Network with Multi-scale Dilated-Convolution for Fast MRI Reconstruction
Yuekai Sun1, Jun Lv1, Weibo Chen2, He Wang3,4, and Chengyan Wang4
1School of Computer and Control Engineering, Yantai University, Yantai, China, 2Philips Healthcare, Shanghai, China, 3Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China, 4Human Phenome Institute, Fudan University, Shanghai, China

Synopsis

To address the difficulty of acquiring fully-sampled reference data, we propose a self-supervised triple-network (SSTN) for fast MRI reconstruction. Each pipeline of SSTN is composed of multiple blocks of parallel ISTA-Nets with dilated convolution layers at different scales. The results demonstrate that the proposed SSTN generates better-quality reconstructions than competing methods at high acceleration rates.

Introduction

Magnetic resonance imaging (MRI) is a widely used imaging modality in clinical diagnosis. However, its long scan time makes it prone to motion artifacts. Under-sampling in k-space is a common solution, but it introduces aliasing artifacts in the reconstructed images. Recently, many deep learning-based approaches have been proposed to solve this problem. However, most of these methods rely on a large number of fully-sampled MR images to train the network, and such images are difficult to obtain because acquiring them is time-consuming. To address this issue, Yaman et al.1 developed a framework for self-supervised learning via data undersampling (SSDU) that trains a network without fully-sampled datasets. Hu et al.2 proposed a parallel self-supervised learning framework. However, the acquired under-sampled measurements are still not fully utilized. We therefore propose a self-supervised triple-network framework (SSTN) with multi-scale dilated convolution to address this problem.

Method

The framework of SSTN is illustrated in Figure 1. It consists of three major parts: 1) a shared under-sampling mask and three different selection masks; 2) three parallel networks, each composed of multiple UISTA blocks; 3) the outputs of the three parallel branches, between which a difference loss is calculated.
Mathematical Model
Overall, the problem solved during the training phase can be written in a mathematical formula as:
$$\begin{gathered}\arg \min_{x_{1}, x_{2}, x_{3}} \frac{1}{2}\left(\left\|A_{1} x_{1}-y_{1}\right\|_{2}^{2}+\left\|A_{2} x_{2}-y_{2}\right\|_{2}^{2}+\left\|A_{3} x_{3}-y_{3}\right\|_{2}^{2}\right)\\+\lambda R\left(x_{1}\right)+\mu R\left(x_{2}\right)+\nu R\left(x_{3}\right)+\xi L\left(x_{1}, x_{2}, x_{3}\right)\end{gathered}$$
where $$$A$$$ denotes the encoding matrix, the product of the sampling matrix $$$S$$$ and the Fourier transform $$$F$$$; $$$y$$$ denotes the k-space data selected by the corresponding selection mask; $$$L$$$ denotes a similarity metric; and $$$x_{1}, x_{2}$$$, and $$$x_{3}$$$ are the reconstruction results of the three parallel networks. $$$R$$$ represents $$$L1$$$ regularization, and $$$\lambda, \mu, \nu, \xi$$$ are regularization parameters.
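For concreteness, the following is a minimal NumPy sketch of the single-coil encoding operator $$$A=SF$$$ and of one data-fidelity term of the objective above; the function names and the centered-FFT convention are illustrative assumptions rather than the authors' implementation.
```python
import numpy as np

def encoding_op(x, mask):
    """Apply A = S*F: 2D Fourier transform followed by a k-space sampling mask."""
    k = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(x)))
    return mask * k

def data_fidelity(x, y, mask):
    """0.5 * ||A x - y||_2^2 for one branch of the objective."""
    r = encoding_op(x, mask) - y
    return 0.5 * np.sum(np.abs(r) ** 2)
```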
Mask Details
Two types of masks are employed: an under-sampling mask and selection masks. The under-sampling mask generates the under-sampled images. Each selection mask picks random points from the under-sampled data to generate the input to its reconstruction network. The masks used in this study are shown in Figure 2.
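A hedged NumPy sketch of this masking scheme is given below: a 2D random under-sampling mask plus three selection masks that keep random subsets of the acquired points. The selection factors (2, 1.99, 2.5) follow Figure 2; the uniform-random point selection is an assumption for illustration.
```python
import numpy as np

def random_mask(shape, accel, seed=0):
    """2D random under-sampling mask keeping roughly 1/accel of k-space points."""
    rng = np.random.default_rng(seed)
    return (rng.random(shape) < 1.0 / accel).astype(np.float32)

def selection_masks(under_mask, factors=(2.0, 1.99, 2.5), seed=1):
    """One selection mask per parallel network, drawn from the acquired points."""
    rng = np.random.default_rng(seed)
    masks = []
    for f in factors:
        keep = (rng.random(under_mask.shape) < 1.0 / f).astype(np.float32)
        masks.append(under_mask * keep)  # only already-acquired points survive
    return masks
```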
UISTA Block
As shown in Figure 1(a), the UISTA block consists of three parallel unrolled versions of ISTA-Net3. Within the block, the front end of each net is connected to a dilated convolutional layer of a different scale, and a conventional convolutional layer closes the block. Image reconstruction is optimized by iterating the following two steps in the UISTA block:
$$\begin{aligned}\mathbf{r}_{n}^{(k)} &=\mathbf{x}_{n}^{(k-1)}-\rho \mathbf{A}^{\top}\left(\mathbf{A} \mathbf{x}_{n}^{(k-1)}-\mathbf{y}\right) \\ \mathbf{x}_{n}^{(k)} &=\widetilde{\mathcal{H}}_{n}\left(\operatorname{soft}\left(\mathcal{H}_{n}\left(\mathbf{r}_{n}^{(k)}\right), \theta\right)\right)\end{aligned}$$
where $$$n$$$ indexes the ISTA-Net within the UISTA block, $$$\rho$$$ is the step size, $$$k$$$ is the iteration index, and $$$\mathcal{H}$$$ and $$$\widetilde{\mathcal{H}}$$$ each denote a combination of two linear convolutional operators separated by a rectified linear unit (ReLU).
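As an illustration, here is a PyTorch sketch of one UISTA block implementing the two steps above: a gradient (data-consistency) step followed by three parallel dilated-convolution ISTA branches (dilation rates 1, 2, 3, per Figure 1) whose soft-thresholded outputs are fused by a conventional convolution. The channel count, the weight sharing across branches, and the callables A/At are assumptions made for the sketch, not the authors' exact implementation.
```python
import torch
import torch.nn as nn

class UISTABlock(nn.Module):
    """One UISTA block (sketch): gradient step, three dilated-conv ISTA
    branches (dilation rates 1/2/3), soft-thresholding, and a fusing conv.
    Channel count `ch` and shared H / H-tilde weights are assumptions."""

    def __init__(self, ch=32):
        super().__init__()
        self.rho = nn.Parameter(torch.tensor(0.5))     # step size rho
        self.theta = nn.Parameter(torch.tensor(0.01))  # soft threshold theta
        # front-end multi-scale dilated convolutions (2-channel complex input)
        self.fronts = nn.ModuleList(
            [nn.Conv2d(2, ch, 3, padding=d, dilation=d) for d in (1, 2, 3)]
        )
        self.H = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                               nn.Conv2d(ch, ch, 3, padding=1))
        self.Ht = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                nn.Conv2d(ch, ch, 3, padding=1))
        self.fuse = nn.Conv2d(3 * ch, 2, 3, padding=1)  # conventional conv at the end

    def forward(self, x, y, A, At):
        # r^(k) = x^(k-1) - rho * A^T (A x^(k-1) - y); A and At are
        # user-supplied callables for the forward and adjoint encoding operators.
        r = x - self.rho * At(A(x) - y)
        feats = []
        for front in self.fronts:
            z = self.H(front(r))
            # soft(z, theta) = sign(z) * max(|z| - theta, 0)
            z = torch.sign(z) * torch.clamp(torch.abs(z) - self.theta, min=0.0)
            feats.append(self.Ht(z))
        return self.fuse(torch.cat(feats, dim=1))
```
A full pipeline would unroll several such blocks per branch, with the three parallel branches trained jointly under the loss described next.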
Loss Function Design
The loss function in this paper is defined as:
$$\begin{gathered}L(\Theta)=\frac{1}{N}\left(\sum_{i=1}^{N}\left(L\left(y^{i}, A^{i} x_{1}^{i}\right)+L\left(y^{i}, A^{i} x_{2}^{i}\right)+L\left(y^{i}, A^{i} x_{3}^{i}\right)\right)\right.\\\left.+\alpha \sum_{i=1}^{N}\left(L\left(A^{\prime i} x_{1}^{i}, A^{\prime i} x_{2}^{i}\right)+L\left(A^{\prime i} x_{1}^{i}, A^{\prime i} x_{3}^{i}\right)\right)+\beta L_{\mathrm{cons} 1}+\gamma L_{\mathrm{cons} 2}+\delta L_{\mathrm{cons} 3}\right)\end{gathered}$$
where $$$N$$$ is the number of training cases, $$$i$$$ indexes the current training case, $$$L$$$ is the MSE loss, $$$A$$$ and $$$A^{\prime}$$$ map to the scanned and unscanned k-space data points, respectively, and $$$L_{cons}$$$ denotes the constraint losses.
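A minimal PyTorch sketch of this loss follows, assuming A and A_prime are callables that project a reconstruction onto the scanned and unscanned k-space locations; the constraint terms $$$L_{cons}$$$ are omitted here because their exact form is not specified in this abstract.
```python
import torch.nn.functional as F

def sstn_loss(y, x1, x2, x3, A, A_prime, alpha=0.01):
    """Data consistency on scanned points plus cross-network similarity on
    unscanned points; the L_cons terms are omitted in this sketch."""
    dc = (F.mse_loss(A(x1), y) + F.mse_loss(A(x2), y) + F.mse_loss(A(x3), y))
    sim = (F.mse_loss(A_prime(x1), A_prime(x2))
           + F.mse_loss(A_prime(x1), A_prime(x3)))
    return dc + alpha * sim
```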

Dataset

Public Campinas brain MR raw data were used for training. We randomly selected 40 subjects for network training and 2 subjects for testing, corresponding to 2,800 and 140 2D images, respectively. The matrix size of each image is 256×256.

Experimental Settings and Evaluation

The proposed SSTN was tested on data with 2D random under-sampling at 4x and 8x acceleration factors. We evaluated SSTN against the previously proposed Parallel-Network2, and ablation experiments were performed. The networks were trained with $$$\alpha=\beta=\gamma=\delta=0.01$$$ for SSTN reconstruction, a batch size of 2, and an initial learning rate of $$$10^{-4}$$$. Experiments were carried out on a system equipped with NVIDIA Quadro GP100 GPUs (16 GB memory). Reconstruction results were evaluated quantitatively in terms of PSNR and SSIM. A paired Wilcoxon signed-rank test was conducted to compare the PSNR and SSIM measurements between different approaches; p < 0.05 was considered statistically significant.
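For reference, a small Python sketch of this evaluation, computed per slice with scikit-image metrics and SciPy's Wilcoxon signed-rank test; variable names are illustrative.
```python
import numpy as np
from scipy.stats import wilcoxon
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(recons, refs):
    """Per-slice PSNR and SSIM of reconstructions against reference images."""
    psnrs, ssims = [], []
    for rec, ref in zip(recons, refs):
        rng = ref.max() - ref.min()
        psnrs.append(peak_signal_noise_ratio(ref, rec, data_range=rng))
        ssims.append(structural_similarity(ref, rec, data_range=rng))
    return np.array(psnrs), np.array(ssims)

# paired test between two methods' per-slice PSNR values (illustrative names):
# stat, p = wilcoxon(psnr_sstn, psnr_parallel)   # significant if p < 0.05
```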

Results

As shown in Figure 3, the zero-filled (ZF) reconstruction was markedly blurred, and the Parallel-Network reconstructions contained substantial residual artifacts, visible in the error maps. The SSTN results had the smallest error and removed the aliasing artifacts. Correspondingly, the proposed SSTN also performed best in terms of PSNR and SSIM. These observations are consistent with the quantitative analysis shown in Table 1.
Meanwhile, the results in Table 2 verify that the multi-scale dilated convolution extracts features more efficiently and improves reconstruction quality, and that the triple-network structure of SSTN constrains the reconstruction results of each network more effectively.

Conclusion

The proposed SSTN generates better-quality reconstructions than competing methods at high acceleration rates. SSTN is therefore promising for under-sampled MRI reconstruction in clinical applications where no fully-sampled data are available for training.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61902338, No. 62001120) and the Shanghai Sailing Program (No. 20YF1402400).

References

[1] Yaman, B., Hosseini, S.A.H., Moeller, S., Ellermann, J., Uğurbil, K., Akçakaya, M.: Self-supervised learning of physics-guided reconstruction neural networks without fully sampled reference data. Magn. Reson. Med. 84(6), 3172–3191 (2020). https://doi.org/10.1002/mrm.28378

[2] Hu, C., et al.: Self-supervised learning for MRI reconstruction with a parallel network training framework. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, Cham (2021).

[3] Zhang, J., Ghanem, B.: ISTA-Net: interpretable optimization-inspired deep network for image compressive sensing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

Figures

Figure 1. The architecture of the proposed SSTN for fast MRI reconstruction. (a) The United Iterative Shrinkage-Thresholding Algorithm (UISTA) block, in which the dilation rates of the three dilated convolutions are 1, 2, and 3, respectively. (b) The architecture of ISTA-Net.

Figure 2. Mask maps. From left to right: the under-sampling mask, the up mask, the middle mask, and the down mask. "Up", "middle", and "down" denote the masks for the three parallel networks, with sampling factors of 2, 1.99, and 2.5, respectively.

Figure 3. Typical reconstruction results of different methods at two acceleration rates (4 and 8) and the corresponding difference maps. PSNR and SSIM values are indicated at the top left.

Table 1. Quantitative evaluations of different methods at two acceleration rates (4 and 8). For the averages, * indicates that the difference is statistically significant (p < 0.05) compared with Parallel-Network.

Table 2. Quantitative evaluations of the ablation study at two acceleration rates (4 and 8). "Double" denotes two parallel networks with multi-scale dilated convolution; "single scale" denotes three parallel networks without multi-scale dilated convolution; "triple" denotes three parallel networks with multi-scale dilated convolution.

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022)
4714
DOI: https://doi.org/10.58530/2022/4714