Multi-Task Accelerated MR Reconstruction Schemes for Jointly Training Multiple Contrasts
Victoria Liu1, Kanghyun Ryu2, Cagan Alkan2, John Pauly2, and Shreyas Vasanawala2
1California Institute of Technology, Pasadena, CA, United States, 2Stanford University, Stanford, CA, United States


Model-based accelerated MRI reconstruction networks leverage large datasets to reconstruct diagnostic-quality images from undersampled k-space. To deal with inherent dataset variability, the current paradigm trains separate models for each dataset. This is a demanding process and cannot exploit information that may be shared amongst datasets. In response, we propose multi-task learning (MTL) schemes that jointly reconstruct multiple datasets. Introducing inductive biases to the network allows for positive information sharing. We test MTL architectures and weighted loss functions against single task learning (STL). Our results suggest that MTL can outperform STL across a range of dataset ratios for two knee contrasts.


To reduce MRI scan time, various iterative reconstruction schemes have been investigated.1-4 Recently, deep learning approaches, which train a network to estimate the reconstructed image from retrospectively undersampled k-space, have shown superior efficacy over previous non-network-based methods.5 However, these networks require a sufficient collection of fully sampled k-space data acquired with protocols similar to those of the test-time inference data.6 For example, to train multiple contrasts, the current paradigm is to collect fully sampled k-space data for each contrast and train each contrast-specific network separately to avoid domain shift.6-7 Considering the exceptionally large variability of MR images (e.g., different contrasts, orientations, anatomies, and pulse sequences), separate training requires a large effort and limits the additional information that can be gained from multiple datasets.
To address this barrier, we propose a novel multi-task learning (MTL) scheme that can jointly train a single network on a variety of datasets. MTL has recently gained traction in various areas,8–15 but has yet to be applied to MRI reconstruction. Our study investigates how this scheme can be useful for jointly training diverse datasets. The scheme jointly trains various fully sampled k-space datasets by treating them as different tasks within the same network. The network can train multiple tasks simultaneously and exploit shared, common features to prevent individual tasks from overfitting, thereby fostering better performance compared to conventional STL counterparts.


The baseline STL network is a typical unrolled network composed of a series of unrolled blocks.16 The MTL network is composed of shared layers and task-specific layers, each consisting of multiple unrolled blocks. Two architectures are used in the study: split and multi-head (Figure 1). Both networks start with two shared blocks before splitting into ten task-specific blocks. In the split architecture, the task-specific blocks do not share further information (Figure 1a). In the multi-head architecture, the U-Net encoder continues to be shared amongst tasks, but the decoder is task-specific (Figure 1b).
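The routing of the split architecture can be illustrated with a minimal sketch. This is not the network used in the study: `Block` is a hypothetical stand-in that performs a single gradient step on a scalar least-squares problem, whereas a real unrolled block would pair a data-consistency step with a learned regularizer. The class and parameter names are assumptions; only the block counts (two shared, ten task-specific) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for one unrolled block.

    Performs one gradient step on 0.5 * (a * x - y) ** 2. The only
    parameter is the step size; no learned regularizer is included.
    """
    step: float = 0.5

    def __call__(self, x: float, y: float, a: float = 1.0) -> float:
        return x - self.step * a * (a * x - y)


class SplitMTL:
    """Shared blocks followed by one branch of task-specific blocks per task."""

    def __init__(self, tasks, n_shared=2, n_specific=10):
        # Shared blocks: a single set of objects used by every task.
        self.shared = [Block() for _ in range(n_shared)]
        # Task-specific blocks: a separate set of objects for each task.
        self.specific = {t: [Block() for _ in range(n_specific)] for t in tasks}

    def forward(self, y, task, x0=0.0):
        x = x0
        for block in self.shared:
            x = block(x, y)
        for block in self.specific[task]:  # only this task's branch runs
            x = block(x, y)
        return x


net = SplitMTL(tasks=["PDw", "PDw-FS"])
recon = net.forward(y=1.0, task="PDw-FS")
```

In this sketch, a training sample from either contrast would update the shared blocks, while each branch is reached only by samples from its own contrast. The multi-head variant, which shares the U-Net encoder inside each block, is not shown.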
We consider three different loss weighting schemes. Naive weighting addresses the data imbalance in the loss function by weighting individual task losses in inverse relation to dataset size. Uncertainty weighting9 treats the multi-task network as a probabilistic model and incorporates the homoscedastic uncertainty of each task in the loss function. Dynamic weight averaging (DWA)14 assigns task loss weights based on the learning speed of each task.
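The three weighting schemes can be sketched as follows, using the forms commonly given in the literature rather than the exact implementation of this study. Assumptions: the naive weights are normalized to sum to one, the uncertainty term uses the log-variance parameterization s = log(sigma^2), the DWA temperature defaults to 2, and the slice count of 107 for PDw-FS and all loss values are illustrative only.

```python
import math


def naive_weights(sizes):
    """Weights inversely proportional to dataset size, normalized to sum to 1."""
    inv = {task: 1.0 / n for task, n in sizes.items()}
    total = sum(inv.values())
    return {task: v / total for task, v in inv.items()}


def uncertainty_loss(losses, log_vars):
    """Sum over tasks of 0.5 * exp(-s) * L + 0.5 * s, with s = log(sigma^2).

    In training, each s would be a learnable parameter.
    """
    return sum(
        0.5 * math.exp(-log_vars[task]) * losses[task] + 0.5 * log_vars[task]
        for task in losses
    )


def dwa_weights(prev_losses, prev2_losses, temperature=2.0):
    """Softmax over per-task loss ratios L(t-1) / L(t-2), scaled to sum to K tasks.

    A task whose loss is falling more slowly receives a larger weight.
    """
    ratios = {t: prev_losses[t] / prev2_losses[t] for t in prev_losses}
    exps = {t: math.exp(r / temperature) for t, r in ratios.items()}
    total = sum(exps.values())
    k = len(prev_losses)
    return {t: k * e / total for t, e in exps.items()}


# Illustrative values only (not results from the study).
w_naive = naive_weights({"PDw": 481, "PDw-FS": 107})
l_unc = uncertainty_loss({"PDw": 0.2, "PDw-FS": 0.4}, {"PDw": 0.0, "PDw-FS": 0.0})
w_dwa = dwa_weights({"PDw": 0.5, "PDw-FS": 0.9}, {"PDw": 1.0, "PDw-FS": 1.0})
```

With these inputs, the scarce PDw-FS task receives the larger naive weight, and DWA favors PDw-FS because its loss decreased more slowly between the two previous epochs.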

Training and Inference

We use two public knee datasets16 that are available at mridata.org.17 The datasets contain 19 coronal proton density weighted (PDw) and 20 coronal proton density weighted fat suppression (PDw-FS) knee scans. In our experiments, PDw simulates the abundant dataset by using all 481 slices, and PDw-FS simulates the scarce dataset by using a percentage of the 497 slices.
Our models are implemented in PyTorch and trained on NVIDIA Titan Xp GPUs with 12 GB of memory. For the experiments, networks with 12 unrolled blocks are used to ensure convergence. To assess image quality, magnitude images are normalized between 0 and 1, and peak signal-to-noise ratio (pSNR), structural similarity index (SSIM), and normalized root mean square error (nRMSE) are computed. During inference, k-space data in the test set are undersampled identically for all networks to guarantee fair comparisons.
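As a minimal sketch of the normalization and two of the metrics, the functions below operate on flattened magnitude images given as lists of floats. The definitions are assumptions, since the text does not state them: pSNR uses a peak value of 1 (matching the normalized range), and nRMSE divides the root of the summed squared error by the root of the summed squared reference. SSIM is omitted because it needs a windowed implementation. The pixel values are illustrative only.

```python
import math


def normalize(img):
    """Rescale a flattened magnitude image to the range [0, 1]."""
    lo, hi = min(img), max(img)
    if hi == lo:
        raise ValueError("constant image cannot be normalized")
    return [(v - lo) / (hi - lo) for v in img]


def psnr(ref, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming images share the given peak."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    return 10.0 * math.log10(peak ** 2 / mse)


def nrmse(ref, rec):
    """Root of the squared error, normalized by the root of the squared reference."""
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, rec)))
    return err / math.sqrt(sum(a ** 2 for a in ref))


# Illustrative pixel values only.
scaled = normalize([2.0, 4.0, 6.0])
ref = [0.0, 0.5, 1.0, 0.5]
rec = [0.1, 0.5, 0.9, 0.5]
p = psnr(ref, rec)
e = nrmse(ref, rec)
```

Other normalizations for nRMSE are in use (for example, dividing by the reference range), so values are comparable only under a shared definition.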
For this study, we mix and match the network architectures (split and multi-head) with the loss functions (naive, DWA, uncertainty) for a total of six MTL networks (Tables 2-3). MTL networks are jointly trained using 481 PDw slices and a percentage of PDw-FS slices. We also provide comparisons with transfer learning by taking the PDw baseline and fine-tuning all layers using PDw-FS data.6
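The six MTL configurations follow from crossing the two architectures with the three loss weightings. A short sketch of that grid is given below; the dictionary keys are hypothetical names chosen for illustration.

```python
from itertools import product

architectures = ["split", "multi-head"]
loss_weightings = ["naive", "DWA", "uncertainty"]

# One configuration per (architecture, loss weighting) pair.
configs = [
    {"architecture": arch, "loss_weighting": loss}
    for arch, loss in product(architectures, loss_weightings)
]

for cfg in configs:
    print(f'{cfg["architecture"]} + {cfg["loss_weighting"]}')
```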


As seen in Table 1, an MTL network performs better than STL at every dataset ratio. At N = 107 and N = 253, the naive-weighted, split architecture dominates the other MTL architectures (Tables 2-3). Interestingly, the MTL network not only improves metrics for PDw-FS, but also for PDw at certain ratios (Tables 2-3). This suggests that there is positive information transfer from the scarce dataset to the abundant dataset as well.
Qualitative examination also suggests that MTL reduces reconstruction errors at both ratios: Figure 2 compares STL and MTL for two different inference slices. Notably, transfer learning performs better than MTL quantitatively (Tables 1-3).


Our framework introduces inductive biases in the network by enforcing the sharing of useful information between tasks. We see that MTL performs better than STL baselines across a range of abundant-to-scarce ratios, for both PDw and PDw-FS datasets. Our finding that transfer learning marginally outperforms MTL suggests that MTL may be better suited to more dissimilar tasks, such as different orientations or anatomies, rather than multiple contrasts. Moreover, it is possible that once we use larger datasets, STL will dominate transfer learning, and MTL's main competitor will be STL.


One noted difficulty is selecting an appropriate architecture and loss function, since negative transfer between datasets can occur (Tables 2-3). All in all, our study provides a proof of concept that MTL can be successfully used to jointly train multiple contrasts for MRI reconstruction.


This project is supported by NIH R01 EB009690 and NIH R01 EB029427, as well as GE Healthcare and the Marcella Bonsall Fellowship.


Figure 1. Multi-task learning network architectures built on an unrolled variational network for (a) fully split and (b) shared-encoder-split-decoder blocks. Note that the diagram is simplified. As described in the text, there are two shared blocks and ten task-specific blocks for both MTL architectures.

Figure 2. Representative reconstructions for MTL versus STL networks at (a) N = 107 and (b) N = 253 for PDw-FS test dataset. The MTL network is a naive-weighted, split network in (a) and a naive-weighted, multi-head network in (b). The arrows point to aliasing artifacts present in STL.

Table 1. Comparison of STL, MTL, and transfer learning for PDw-FS reconstruction

Table 2. SSIM of PDw-FS test dataset for different MTL networks

Table 3. SSIM of PDw test dataset for different MTL networks

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022)
DOI: https://doi.org/10.58530/2022/4053