2950

Realizing a robust multi-task training strategy with deep learning: application in Spine MR image quality assessment

Deepa Anand¹, Dattesh Shanbhag¹, Chitresh Bhushan², and Uday Patil³
¹GE Healthcare, Bangalore, India, ²GE Healthcare, Niskayuna, NY, United States, ³GE Healthcare, Bangalore, India

Synopsis

Keywords: Machine Learning/Artificial Intelligence

Multi-task training is attractive in AI applications (in terms of memory, processing time and potential reduction in data requirements) where tasks share common features but still require distinct outputs. In this work we present a methodology for implementing a robust multi-task learning framework, considering three training strategies (parallel, iterative and sequential). We tested the approach on image quality assessment of spine MRI localizer images. We demonstrate that sequential training is the most effective: it preserves accuracy above the acceptable level while reducing the number of model parameters by about 50%.

Introduction

AI-based image processing applications have gained popularity over the past few years. Recently, multi-task methods have been proposed to take advantage of features common across tasks, which can result in a compact model and reduce data requirements and inference time. There are different methods to realize a multi-task model in practice. In this work, we investigated three such methods (Parallel, where tasks are trained simultaneously; Iterative, where tasks are trained one after another for a few epochs at a time; and Sequential, where one task is trained completely before the remaining tasks are trained on a frozen backbone) for spine localizer image quality assessment for intelligent scan plane prescription (ISP) [1,2].
An important component of ISP is the image quality assessment module – called the LocalizerIQ module. This module enables decisions on input volume quality and may help in filtering out volumes/image regions which do not meet the quality criteria from being processed by modules further down the pipeline [3]. Compared to brain and knee anatomies, adapting a similar framework for spine is both complicated and simplified since:
a. the spine contains multiple, overlapping stations with curvature differences (e.g., cervical, cervico-thoracic, etc.),
b. but information is shared in terms of the appearance of individual vertebrae/discs.
Accordingly, we devised a coherent single model mechanism to merge the outputs in a comprehensive way for image quality assessment as three tasks:
· Task 1: Station classifier – classifies image volumes into stations.
· Task 2: Coverage classifier – indicates whether a cervical or lumbar volume has sufficient coverage in terms of the number of vertebrae visible.
· Task 3: Slice classifier – for each slice in a volume, indicates whether the vertebrae and/or discs are sufficiently visible for further processing. This component is similar to the LocalizerIQ function for other anatomies, i.e., brain and knee [1,2].

Note that for the first two tasks the decision needs to be taken at the volume level, while for the final task the output is provided on a per-slice basis.

We present a DL-based multi-task 2D framework (Fig. 2) for the three tasks above, as opposed to having separate models, which would require more memory and processing. Since the framework is 2D, the 3D image context information is reduced to a 2D representation for the first two tasks so that it can be processed by the common 2D DL network.

Methods

Data: The data for the study came from 45 clinical sites, comprising data from multiple MRI scanners (1.5 T, 3.0 T), a wide variety of demographics, and multiple stations and receive coils (Fig. 1). Single-shot fast spin echo (SSFSE) based three-plane localizers were included in the study, with only the sagittal and coronal localizers considered.
Gold Standard: Three models trained independently, one for each of the three tasks (#1 to #3), were considered the gold standard against which the accuracy of the combined model was compared.
We experimented with parallel, iterative, and sequential schemes for the combined training for the three tasks (detailed in Fig. 3). The model architecture was kept the same across these different models.
Scheme A (Parallel): All the tasks are trained simultaneously with a combined loss function, so as to minimize the loss for all the tasks.
Scheme B (Iterative): The network is trained on each task for a few epochs, one task after the other, iteratively.
Scheme C (Sequential): The network, including the backbone and the Task 1 head, is first trained completely for Task 1. Following this, the backbone is frozen and only the head for Task 2 is trained completely for that task. Finally, with the same frozen backbone, the head for Task 3 is trained.
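The freezing logic of the sequential scheme can be illustrated with a deliberately tiny toy. This is only a sketch under stated assumptions, not the authors' network: the "backbone" is a single shared scalar weight `w`, each task "head" is a single scalar weight, and the data, targets, learning rate, and the helper `fit` are all hypothetical. It shows the backbone being updated only while Task 1 is trained, after which only the new heads are fitted.

```python
# Toy illustration of sequential multi-task training (Scheme C).
# Everything here is hypothetical: a scalar "backbone" weight w shared by
# all tasks, one scalar "head" weight per task, and synthetic data.

def fit(w, h, data, train_backbone, lr=0.05, epochs=200):
    """Fit y ~ h * (w * x) by gradient descent on mean squared error.

    The backbone weight w is only updated when train_backbone is True.
    """
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_h = 0.0
        for x, y in data:
            feature = w * x              # shared backbone output
            error = h * feature - y      # task-head prediction error
            grad_h += 2.0 * error * feature / n
            grad_w += 2.0 * error * h * x / n
        h -= lr * grad_h
        if train_backbone:
            w -= lr * grad_w
    return w, h


xs = [0.5, 1.0, 1.5, 2.0]
slopes = {"task1": 2.0, "task2": -1.0, "task3": 0.5}   # synthetic targets
datasets = {name: [(x, s * x) for x in xs] for name, s in slopes.items()}

# Stage 1: train the backbone and the Task 1 head together.
w, h1 = fit(1.0, 0.5, datasets["task1"], train_backbone=True)
w_after_task1 = w

# Stages 2 and 3: backbone frozen, only the new head is trained.
w, h2 = fit(w, 0.0, datasets["task2"], train_backbone=False)
w, h3 = fit(w, 0.0, datasets["task3"], train_backbone=False)
heads = {"task1": h1, "task2": h2, "task3": h3}
```

Because the backbone is never touched after stage 1, the Task 1 head keeps working unchanged while the later heads are fitted, which is what protects the first task from being overwritten in this toy.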
Assessment: A model with accuracy > 90% was considered acceptable.

Results

Scheme A: We find that this training scheme results in sub-optimal performance, with accuracy of around 60-70-73% for all the tasks, quickly stagnating and not improving beyond a particular level. Though the data is the common element in these tasks, the task requirements themselves may be too diverse to allow any meaningful network training.
Scheme B: We find that catastrophic forgetting, characteristic of DL networks, comes into play even if each task is trained for only one epoch at a time. We observe that each time training on a task resumes, it proceeds as if for the first time, and there is a lack of progressive improvement in the loss as training progresses, with an accuracy of 65% for station classification and ~68% for the other two tasks.
Scheme C: We find that this kind of training results in meaningful learning of the network parameters and gives reasonable results for each of the tasks. Fig. 4 presents the results from our experiments. Compared to models trained independently, we observe that for Tasks 2 and 3 the performance is reduced by ~2%, while the overall performance remains above the acceptance criterion. Thus, combined training yields reasonable results for all three tasks, with accuracy > 90%, while sharing parameters, resulting in a reduction in model size (~50%) and consequently in inference time on CPU.
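How much a shared backbone reduces model size depends on the relative sizes of the backbone and the heads. The sketch below works through that arithmetic; the function name `parameter_saving` and the parameter counts are hypothetical (the abstract does not report them), and it assumes each independent model would carry its own copy of the same backbone.

```python
def parameter_saving(backbone_params, head_params):
    """Fraction of parameters saved by sharing one backbone across tasks.

    backbone_params: parameter count of the backbone (assumed identical
                     in every independent model).
    head_params:     list with the parameter count of each task head.
    """
    # Independent models: every task carries its own backbone copy.
    independent = sum(backbone_params + h for h in head_params)
    # Multi-task model: one backbone plus all task heads.
    shared = backbone_params + sum(head_params)
    return 1.0 - shared / independent


# Hypothetical sizes, not taken from the abstract.
saving = parameter_saving(1_000_000, [100_000, 100_000, 100_000])
print(f"{saving:.1%}")
```

With small heads the saving for three tasks approaches two thirds, and it shrinks as the heads grow relative to the backbone, so the figure obtained in practice is tied to the actual architecture.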

Acknowledgements

No acknowledgement found.

References

[1] A generalized deep learning framework for multi-landmark intelligent slice placement using standard tri-planar 2D localizers, ISMRM 2019

[2] Intelligent Knee MRI slice placement by adapting a generalized deep learning framework, ISMRM 2020

[3] Intelligent Scanning Using Deep Learning for MRI, https://blog.tensorflow.org/2019/03/intelligent-scanning-using-deep-learning.html

Figures

Figure 1: Distribution of Patient age, Field Strength and Spine Station in the data pool

Figure 2: Image quality classifier posed as a multi-task problem. A common backbone is used for feature extraction, and a small separate head is used for each task.

Figure 3: Training schemes considered for multi-task training. Scheme A (Parallel): train for all tasks simultaneously. Scheme B (Iterative): train tasks one after the other, iteratively. Scheme C (Sequential): train for Task 1 completely, then freeze the backbone and train for Tasks 2 and 3 one after the other.

Figure 4: Comparison of training using the multi-task network vs. individual networks. The performance remains almost on par with the independent models while saving on the number of parameters.

Proc. Intl. Soc. Mag. Reson. Med. 31 (2023)

DOI: https://doi.org/10.58530/2023/2950