0657

Dual-confidence-guided feature learning for semi-supervised medical image segmentation

Yudan Zhou¹, Shuhui Cai¹, Congbo Cai¹, Liangjie Lin², and Zhong Chen¹
¹Xiamen University, Xiamen, China, ²MSC Clinical & Technical Solutions, Philips Healthcare, China

Synopsis

Keywords: Diagnosis/Prediction, Machine Learning/Artificial Intelligence, Data Processing, MRI medical segmentation, Brain

Motivation: Obtaining a large medical image dataset with accurate annotations is challenging, thus limiting the practical application of deep learning in clinical practice.

Goal(s): Developing a novel semi-supervised segmentation algorithm that learns from a limited set of labeled images.

Approach: Building a dual-branch network with dual-confidence-guided constraints for tumor feature learning, enabling the model to learn accurate and comprehensive feature representations.

Results: In brain tumor segmentation, this algorithm achieved accurate tumor boundary segmentation using only 1% and 10% of labeled training data, and obtained segmentation results close to fully supervised learning when 20% of the training data was labeled.

Impact: Our dual-confidence-guided semi-supervised feature learning model can achieve accurate brain tumor region segmentation with limited labeled training data, speeding up the application of deep learning technology in clinical research and providing assistance for clinical diagnosis.

Introduction

Currently, semi-supervised medical image segmentation algorithms primarily utilize a single convolutional neural network (CNN) model for feature extraction.¹⁻⁴ However, this approach limits the extraction of global image information. Moreover, current training methods frequently neglect the feature relationships between positive and negative samples, which makes it difficult to delineate the boundaries of multiple lesions accurately. To address these issues, this research proposes a dual-confidence-guided feature learning model based on a CNN-Transformer architecture. This approach aims to achieve accurate segmentation of multiple lesions in brain tumor magnetic resonance images.

Methods

Model: The proposed model is depicted in Figure 1. Our training set contained N labeled samples and M unlabeled samples (N<<M). For labeled data, the segmentation loss was defined as $$$L_{seg}=L_{ce}^{12}+L_{dice}^{12}$$$, which combined Dice loss and cross-entropy loss to facilitate supervised feature learning. For unlabeled data, as shown in Figure 2, a dual confidence aware module (DCAM) was designed. The DCAM estimated uncertainty from the predicted maps produced by the dual-branch network and defined a dual confidence aware loss $$L_{dca}=\frac{\sum_j\xi((U_1<T)\cap(U_2<T))\ \lVert p_j^1-p_j^2\rVert^2}{\sum_j\xi((U_1<T)\cap(U_2<T))+10^{-16}}\quad,$$ where U₁ and U₂ are the estimated uncertainties of networks 1 and 2 at the j-th pixel,⁵ ξ(·) is the indicator function, $$$p_j^1$$$ and $$$p_j^2$$$ are the segmentation maps predicted by the two networks, and T is a threshold. $$$L_{dca}$$$ imposed a reliable unsupervised consistency constraint on the predicted maps using a mean square error loss restricted to pixels where both networks were confident. Additionally, to further enhance the network's discriminative capability for multi-class tumor features, we designed a projection head composed of a multi-layer perceptron. This projection head mapped the dual-confidence predicted maps identified by the DCAM into a high-dimensional class vector space, enabling a pixel-level inter-class contrastive learning constraint. It was defined as $$L_{icl}=-\log\frac{\exp(\mathrm{sim}(f_i^u,f_i^l)/\tau)}{\sum_{j=0}^C\exp(\mathrm{sim}(f_j^u,f_i^l)/\tau)}\quad,$$ where i and j are class indices among the C classes with i$$$\neq$$$j, $$$\tau$$$ is the temperature constant, sim(·) denotes the cosine similarity, $$$f^u$$$ is the feature vector of unlabeled data and $$$f^l$$$ is the feature vector of labeled data. $$$L_{icl}$$$ encouraged the network to improve its discriminative ability among multiple tumor regions by pulling features of the same class closer together and pushing features of different classes farther apart.
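
The two unsupervised terms can be illustrated with a small dependency-free sketch. It is not the authors' implementation: flat Python lists stand in for image tensors, the uncertainties U₁ and U₂ are assumed to be precomputed, and the function names, the default temperature, and the choice to include the positive pair in the denominator of $$$L_{icl}$$$ are assumptions made for illustration.

```python
import math

def dual_confidence_consistency(p1, p2, u1, u2, threshold, eps=1e-16):
    """Masked mean squared difference between the two branch predictions.

    p1, p2: per-pixel class-probability vectors from branch 1 and branch 2.
    u1, u2: per-pixel uncertainty estimates for each branch (assumed given).
    Only pixels where BOTH uncertainties fall below `threshold` contribute.
    """
    numerator, confident = 0.0, 0
    for a, b, ua, ub in zip(p1, p2, u1, u2):
        if ua < threshold and ub < threshold:  # indicator function xi(.)
            numerator += sum((x - y) ** 2 for x, y in zip(a, b))
            confident += 1
    return numerator / (confident + eps)

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + eps)

def inter_class_contrastive(f_u, f_l, i, tau=0.1):
    """InfoNCE-style loss for anchor class i.

    f_u[j]: class-j feature vector from unlabeled data.
    f_l[i]: class-i feature vector from labeled data.
    Assumption: the denominator sums over all classes, positive included.
    """
    logits = [cosine(f_u[j], f_l[i]) / tau for j in range(len(f_u))]
    peak = max(logits)  # subtract the maximum for a stable log-sum-exp
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[i]
```

In this sketch a pixel with high uncertainty in either branch is simply excluded from the consistency term, and the contrastive term is small when the unlabeled class-i feature aligns with its labeled counterpart and large when it aligns with a different class.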
The total loss function was denoted as $$$L_{total}=L_{seg}$$$ + λ₁$$$L_{dca}$$$ + λ₂$$$L_{icl}$$$, where λ₁ and λ₂ are hyperparameters that balance the terms. In our experiments, we set λ₁=1, and λ₂ followed a time-dependent Gaussian warming-up function⁶ to control the balance between the supervised loss and the unsupervised consistency loss.
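
The abstract does not spell out the warming-up schedule. A minimal sketch, assuming the widely used Gaussian ramp-up form λ(t) = λ_max · exp(−5(1 − t/t_max)²), is given below; the constant 5, the maximum weight, and the function names are assumptions rather than values reported here.

```python
import math

def gaussian_rampup(step, rampup_steps, max_weight=1.0):
    """Weight that rises smoothly from max_weight * exp(-5) to max_weight.

    Assumed form: max_weight * exp(-5 * (1 - t)^2), with t = step / rampup_steps
    clipped to [0, 1]. The factor 5 is a common convention, not a reported value.
    """
    if rampup_steps <= 0:
        return max_weight
    t = min(max(step / rampup_steps, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)

def total_loss(l_seg, l_dca, l_icl, step, rampup_steps,
               lambda1=1.0, lambda2_max=1.0):
    """L_total = L_seg + lambda1 * L_dca + lambda2(step) * L_icl."""
    lambda2 = gaussian_rampup(step, rampup_steps, lambda2_max)
    return l_seg + lambda1 * l_dca + lambda2 * l_icl
```

Early in training the ramped term contributes almost nothing, so the supervised loss dominates while the predictions are still unreliable; its weight approaches the maximum as training proceeds.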
Dataset: We demonstrated our approach on the BraTS 2020 dataset,⁷ which contains brain MR images in four MRI modalities from 369 patients. We randomly divided the dataset into training (300), validation (39), and testing (30) sets. All images carried the following three segmentation labels: Gd-enhancing tumor (ET), peritumoral edema (ED), and necrotic and non-enhancing tumor (NCR/NET). In the inference stage, we further grouped the labels into three evaluation regions: ET, tumor core (TC, ET+NCR/NET), and whole tumor (WT, ET+ED+NCR/NET).
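
The grouping of labels into evaluation regions can be sketched as follows. The integer codes (1 for NCR/NET, 2 for ED, 4 for ET) follow the usual BraTS convention and are an assumption here, since the abstract does not state them; a flat list stands in for a label volume.

```python
# Assumed BraTS label codes (not stated in the abstract).
NCR_NET, ED, ET = 1, 2, 4

# Evaluation regions as unions of the annotated labels.
REGIONS = {
    "ET": {ET},
    "TC": {ET, NCR_NET},
    "WT": {ET, ED, NCR_NET},
}

def region_masks(labels):
    """Return one binary mask per evaluation region for a flat label list."""
    return {name: [int(v in ids) for v in labels]
            for name, ids in REGIONS.items()}
```

Because the regions are nested (ET within TC within WT), every voxel counted for ET is also counted for TC and WT.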
Evaluation metrics: We used the Dice similarity coefficient (DSC), the Jaccard index, the average surface distance (ASD), and the 95% Hausdorff distance (95HD) to quantitatively evaluate the performance of our model.
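
For reference, the two overlap metrics can be computed from binary masks as in the sketch below. The function name and the convention of returning 1.0 when both masks are empty are illustrative assumptions; the surface-distance metrics (ASD, 95HD) need boundary extraction and are not shown.

```python
def dice_and_jaccard(pred, truth):
    """Return (DSC, Jaccard) for two flat binary masks of equal length.

    DSC = 2|P ∩ G| / (|P| + |G|), Jaccard = |P ∩ G| / |P ∪ G|.
    """
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    if pred_size + truth_size == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    union = pred_size + truth_size - intersection
    return 2.0 * intersection / (pred_size + truth_size), intersection / union
```

The two scores are monotonically related (J = DSC / (2 − DSC)), so they rank methods identically on a single case but can differ once averaged over cases.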

Results

Table 1 displays quantitative comparison results of our method with other state-of-the-art semi-supervised segmentation algorithms. Through a comprehensive evaluation of the segmentation results in the regions of interest (WT, TC, and ET) with 1%, 10%, and 20% labeled training data, the following conclusions can be drawn: Firstly, under the same labeled training data conditions, all semi-supervised algorithms outperform the supervised baseline, highlighting the significant potential of the semi-supervised learning paradigm in fully harnessing the target features from unlabeled data. Secondly, our method achieves state-of-the-art results on almost all metrics, particularly when 20% of the training data is labeled. In this scenario, our method achieves a DSC of 81.14% for the TC and 81.09% for the ET, which is nearly on par with the fully supervised baseline method with 100% labeling. Figure 3 shows visualizations of the tumor segmentation results from the compared algorithms. By comparing them with the ground truth (GT), it becomes evident that our method provides more accurate segmentation of different tumor boundaries.

Discussion and conclusion

We have proposed an advanced semi-supervised medical image segmentation method, where the dual-confidence assessment results from the branch networks are further utilized for pixel-level multi-class feature contrastive learning. This training strategy effectively enhances the accuracy of medical image diagnostics under data class imbalance conditions. This model requires only a small amount of labeled data to obtain accurate lesion segmentation contours, making it highly promising for clinical-assisted diagnosis. Future work will involve further optimization and evaluation of the model on locally collected hospital data.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under grant numbers 12375291, 82071913 and 22161142024.

References

1. Yu L, Wang S, Li X, et al. Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: Proceedings of the MICCAI. 2019; 605-613.
2. Chen X, Yuan Y, Zeng G, et al. Semi-supervised semantic segmentation with cross pseudo supervision. In: Proceedings of the CVPR. 2021; 2613-2622.
3. Verma V, Kawaguchi K, Lamb A, et al. Interpolation consistency training for semi-supervised learning. Neural Networks. 2022; 145: 90-106.
4. Luo X, Wang G, Liao W, et al. Semi-supervised medical image segmentation via uncertainty rectified pyramid consistency. Med Image Anal. 2022; 80: 102517.
5. Kendall A, Gal Y. What uncertainties do we need in Bayesian deep learning for computer vision? In: Proceedings of the NIPS. 2017; 5574-5584.
6. Yu L, Wang S, Li X, et al. Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: Proceedings of the MICCAI. 2019; 605-613.
7. Menze B H, Jakab A, Bauer S, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging. 2014; 34(10): 1993-2024.
8. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Proceedings of the MICCAI. 2015; 234-241.
9. Cao H, Wang Y, Chen J, et al. Swin-Unet: Unet-like pure transformer for medical image segmentation. In: Proceedings of the ECCV. 2022; 205-218.
10. Luo X, Hu M, Song T, et al. Semi-supervised medical image segmentation via cross teaching between CNN and transformer. In: Proceedings of the MIDL. 2022; 820-833.

Figures

Figure 1. The pipeline of our semi-supervised dual-confidence learning (Semi-DCL) segmentation framework using multi-modal data. The framework consists of supervised segmentation loss (L_seg), dual confidence aware loss (L_dca), and inter-class contrastive learning loss (L_icl). The details of the DCAM are shown in Figure 2.

Figure 2. The proposed dual confidence aware module (DCAM).

Table 1. The results of our method, fully-supervised baseline and the state-of-the-art semi-supervised segmentation methods on the BraTS 2020 dataset with 1%, 10%, and 20% labeled data in terms of DSC, Jaccard, 95HD and ASD.

Figure 3. Visual segmentation results for different methods with 20% labeled training data, from left to right: T₂-weighted (T₂w), CNN-Trans¹⁰, UA-MT¹, CPS², ICT³, URPC⁴, our method, and ground truth (GT), where significant segmentation differences (red arrows or circles), necrotic and non-enhancing tumor core (green), enhancing tumor (yellow), and peritumoral edema (blue) are indicated.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/0657