Shaoze Zhang¹, Yiwei Liu¹, Xingyue Wei¹, Rui Wang¹, Ziwei Liang², Jianwen Luo¹, and Zuo-Xiang He²
¹Tsinghua University, Beijing, China; ²Beijing Tsinghua Changgung Hospital, Beijing, China
Synopsis
Keywords: Analysis/Processing, PET/MR, Multi-modal Registration, Dual Attention Mechanism
Motivation: Registered PET-MRI provides more diagnostic information than either modality alone, but traditional registration algorithms are time-consuming and perform poorly in cross-modal registration.
Goal(s): Improve the accuracy of PET-MRI registration and reduce registration time by improving conventional deep learning registration networks.
Approach: We propose a weakly-supervised PET-MRI registration network based on a hybrid adaptive attention mechanism. Masks extracted from a fine-tuned large segmentation model are used to constrain the network.
Results: We validate the proposed method on liver PET-MRI images. The experimental results show that the proposed method achieves a higher Dice score and shorter registration time than other state-of-the-art registration algorithms.
Impact: The proposed network can help clinicians register PET and MRI images in a short time, supporting faster diagnosis.
Introduction
PET, a non-invasive imaging technique, is extensively used for cancer detection and monitoring, while MRI offers high-resolution soft-tissue imaging. Registering these two modalities improves doctors' diagnostic efficiency. However, factors such as patient respiration and variations in data across different devices make it difficult to align corresponding regions. Traditional intensity-based registration methods perform poorly in cross-modal registration because the two modalities rely on different physical imaging principles. VoxelMorph [1] is a representative end-to-end convolutional neural network (CNN) for registration that demonstrates the potential of deep learning in this field: it predicts a displacement field that warps a moving image onto a fixed image and is trained by jointly optimizing an image similarity loss and a smoothness regularizer. However, the limited receptive field of CNNs makes it difficult to extract features from multimodal images for registration. To address these challenges, we propose a weakly-supervised PET-MRI registration network based on a hybrid adaptive attention mechanism and employ multi-scale residual structures to construct the displacement field. Masks extracted from the fine-tuned Segment Anything Model (SAM) [2] are used to constrain the network and guide efficient registration. We validate the proposed method on liver PET-MRI images. The experimental results show that the proposed method achieves a higher Dice score and shorter registration time than other state-of-the-art registration algorithms.
Methods
The proposed PET-MRI registration method produces a dense displacement field between the two images. Our method was based on the VoxelMorph network architecture, as depicted in Fig. 1. First, coarse labels were obtained with SAM and used to impose boundary constraints on the region of interest (ROI). We replaced the original convolutional layers with a hybrid adaptive attention module (HAAM) [3], which allows flexible selection among receptive fields of different scales in the channel and spatial dimensions. The channel self-attention mechanism selects relevant features within a broader receptive field, while the spatial self-attention mechanism identifies positional relationships among corresponding features. Together, these produce a more accurate deformation field.
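The channel and spatial selection described above can be illustrated with a simplified NumPy forward pass. This is a hypothetical sketch, not the HAAM of [3] or the authors' module: it assumes that K parallel branches with different receptive fields (e.g., convolutions with different dilation rates) have already been computed, uses a selective-kernel-style softmax over branches for the channel step, and uses a sigmoid gate built from channel-pooled statistics for the spatial step. The function name, weight names, and shapes are illustrative; in a real network the weights would be learned in PyTorch.

```python
import numpy as np

# Hypothetical, simplified sketch; NOT the HAAM from [3] or the authors' code.


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hybrid_attention(branches, w_reduce, w_expand):
    """Channel + spatial attention over multi-receptive-field branches.

    branches : (K, C, D, H, W) features from K branches with different receptive fields.
    w_reduce : (r, C) weights of the channel-descriptor bottleneck (hypothetical).
    w_expand : (K, C, r) weights producing per-branch channel logits (hypothetical).
    """
    fused = branches.sum(axis=0)                     # (C, D, H, W)
    descriptor = fused.mean(axis=(1, 2, 3))          # global average pool -> (C,)
    z = np.maximum(w_reduce @ descriptor, 0.0)       # bottleneck + ReLU -> (r,)
    logits = np.einsum("kcr,r->kc", w_expand, z)     # (K, C)
    weights = softmax(logits, axis=0)                # per channel, branch weights sum to 1
    # Channel attention: each channel forms its own mix of receptive fields.
    selected = np.einsum("kc,kcdhw->cdhw", weights, branches)
    # Spatial attention: voxel-wise gate from channel mean and max maps.
    gate = sigmoid(selected.mean(axis=0) + selected.max(axis=0))  # (D, H, W)
    return selected * gate[None]


# Usage with random features and weights.
rng = np.random.default_rng(0)
K, C, r = 2, 8, 4
branches = rng.standard_normal((K, C, 6, 6, 6))
out = hybrid_attention(branches, rng.standard_normal((r, C)), rng.standard_normal((K, C, r)))
print(out.shape)
```

Because the branch weights are normalized with a softmax, each channel is a convex combination of the receptive-field branches, which is what makes the scale selection adaptive in this sketch.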
Additionally, residual structures were introduced in the decoder stage to fuse multi-scale displacement field information. The overall loss function consists of three parts: the similarity between the moved (warped) image and the fixed image, a regularization term on the displacement field, and a boundary constraint. For the similarity term, we employed a mutual information (MI) loss and a modality independent neighborhood descriptor (MIND) [4] loss to balance global and local registration: MI measures the global statistical dependence between the intensities of the two images, while MIND compares local self-similarity patterns.
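A minimal NumPy sketch of how the three loss parts could be combined is shown below. Several details are assumptions not stated in the abstract: MI is estimated from a joint intensity histogram (not differentiable; a trainable PyTorch version would need a soft, e.g. Parzen-window, histogram), MIND is simplified to point-wise distances to the six face neighbours instead of Gaussian-weighted patch distances, the regularizer is the mean squared spatial gradient of the displacement, the boundary constraint is taken as 1 − Dice between the warped and fixed SAM masks, and the weights `w_mind`, `w_reg`, and `w_dice` are hypothetical.

```python
import numpy as np

# Hypothetical sketch of the loss composition; weights and simplifications are assumptions.


def mutual_information(a, b, bins=32):
    """Global MI from a joint intensity histogram (non-differentiable)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = p_ab > 0
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a * p_b)[nz])))


def mind(img, eps=1e-8):
    """Simplified MIND: squared differences to the 6 face neighbours
    (periodic borders via np.roll), normalized by their local mean."""
    shifts = [(s, ax) for ax in range(img.ndim) for s in (-1, 1)]
    dists = np.stack([(img - np.roll(img, s, axis=ax)) ** 2 for s, ax in shifts])
    variance = dists.mean(axis=0) + eps
    desc = np.exp(-dists / variance)
    return desc / desc.max(axis=0, keepdims=True)


def mind_loss(a, b):
    return float(np.mean(np.abs(mind(a) - mind(b))))


def smoothness(disp):
    """Mean squared spatial gradient of a displacement field of shape (3, D, H, W)."""
    return float(np.mean([g ** 2 for comp in disp for g in np.gradient(comp)]))


def dice(m1, m2, eps=1e-8):
    inter = np.logical_and(m1, m2).sum()
    return float(2.0 * inter / (m1.sum() + m2.sum() + eps))


def total_loss(warped, fixed, disp, warped_mask, fixed_mask,
               w_mind=1.0, w_reg=0.1, w_dice=1.0):
    """-MI + MIND (similarity) + smoothness (regularization) + (1 - Dice) (boundary)."""
    similarity = -mutual_information(warped, fixed) + w_mind * mind_loss(warped, fixed)
    boundary = 1.0 - dice(warped_mask, fixed_mask)
    return similarity + w_reg * smoothness(disp) + w_dice * boundary


rng = np.random.default_rng(0)
img = rng.random((16, 16, 16))
mask = img > 0.5
print(total_loss(img, img, np.zeros((3, 16, 16, 16)), mask, mask))
```

Because MIND normalizes each neighbour distance by the local mean distance, the descriptor is unchanged by a linear intensity mapping, which is one reason it suits multimodal pairs such as PET and MRI.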
The liver dataset, comprising 40 pairs of PET-MRI images and corresponding organ labels, was obtained from Beijing Tsinghua Changgung Hospital. Of these, 35 subjects were used for training and 5 for validation. Prior to training, all images underwent a series of preprocessing steps.
First, the PET and MRI images were aligned using ITK-SNAP [5]. The images were then resampled to an isotropic voxel size of 2.79×2.79×2.79 mm³, after which all images were cropped and zero-padded to 160×160×160 voxels. The network was implemented in Python with the PyTorch framework and trained on an NVIDIA RTX A6000 GPU using the Adam optimizer with a learning rate of 10⁻⁴.
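The crop-and-pad step can be sketched as follows. This assumes the volume has already been resampled to 2.79 mm isotropic spacing (e.g., by trilinear interpolation in an image toolkit) and that cropping and padding are centered; the abstract does not state where the crop window was placed, so `crop_or_pad` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def crop_or_pad(volume, target=(160, 160, 160)):
    """Center-crop or zero-pad each axis of a 3-D volume to the target shape.

    Centering is an assumption; the original crop location is not specified.
    """
    out = np.zeros(target, dtype=volume.dtype)
    src, dst = [], []
    for n, t in zip(volume.shape, target):
        if n >= t:                      # crop: keep the central t voxels
            start = (n - t) // 2
            src.append(slice(start, start + t))
            dst.append(slice(0, t))
        else:                           # pad: place the volume in the center
            start = (t - n) // 2
            src.append(slice(0, n))
            dst.append(slice(start, start + n))
    out[tuple(dst)] = volume[tuple(src)]
    return out


vol = np.ones((170, 150, 160), dtype=np.float32)
print(crop_or_pad(vol).shape)
```

At 2.79 mm spacing, each 160-voxel axis covers 160 × 2.79 = 446.4 mm.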
To quantitatively evaluate the registration performance of the proposed method, we compared it with ANTs [6], affine registration, and VoxelMorph. The evaluation metrics were the Dice score, non-positive Jacobian determinants (|J_Φ| ≤ 0, which indicate folding of the deformation), and registration time.
Results
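The two accuracy metrics named in the evaluation can be computed as sketched below. This assumes the displacement field u is stored as a (3, D, H, W) array in voxel units with component i along axis i, that the transform is Φ(x) = x + u(x), and that derivatives are approximated with finite differences via `np.gradient`. Reporting a voxel count rather than a percentage is also an assumption, since the abstract does not say which was used.

```python
import numpy as np


def dice_score(a, b):
    """Dice overlap between two binary masks (1.0 if both are empty)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)


def jacobian_determinant(disp):
    """Determinant of the Jacobian of phi(x) = x + u(x) at every voxel.

    disp : (3, D, H, W) displacement in voxel units, component i along axis i.
    """
    # grads[..., i, j] = d u_i / d x_j, shape (D, H, W, 3, 3)
    grads = np.stack([np.stack(np.gradient(disp[i]), axis=-1) for i in range(3)], axis=-2)
    return np.linalg.det(grads + np.eye(3))


def count_folding(disp):
    """Number of voxels with a non-positive Jacobian determinant (|J_phi| <= 0)."""
    return int(np.sum(jacobian_determinant(disp) <= 0))


# An identity transform has det = 1 everywhere, so no voxel folds.
print(count_folding(np.zeros((3, 8, 8, 8))))
```

Finite differences make the determinant an approximation; on a linear displacement such as u₀ = −2x₀ it is exact (det = −1 everywhere), which gives a convenient sanity check.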
The experimental results validate the accuracy of the proposed registration between the PET and MRI images. Fig. 2 presents the results obtained by the different methods together with visualizations of the deformation fields. As shown in Fig. 3, the proposed method achieves a higher Dice score and shorter registration time than the other state-of-the-art registration algorithms.
Conclusion
In this work, we replaced the original convolutional kernels with modules featuring a hybrid adaptive channel-spatial attention mechanism, which strengthens the network's ability to extract features. Masks extracted from the fine-tuned SAM guide the network toward better registration results. The final registration performance surpassed that of the other compared state-of-the-art methods in terms of Dice score and registration time.
Acknowledgements
None.
References
- [1] G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. Guttag, and A. V. Dalca, “VoxelMorph: A Learning Framework for Deformable Medical Image Registration,” IEEE Trans. Med. Imaging, vol. 38, no. 8, pp. 1788–1800, Aug. 2019, doi: 10.1109/TMI.2019.2897538.
- [2] A. Kirillov, E. Mintun, N. Ravi, et al., “Segment Anything,” arXiv preprint arXiv:2304.02643, 2023.
- [3] G. Chen, L. Li, Y. Dai, J. Zhang, and M. H. Yap, “AAU-Net: An Adaptive Attention U-Net for Breast Lesions Segmentation in Ultrasound Images,” IEEE Trans. Med. Imaging, vol. 42, no. 5, pp. 1289–1300, May 2023, doi: 10.1109/TMI.2022.3226268.
- [4] M. P. Heinrich et al., “MIND: Modality independent neighbourhood descriptor for multi-modal deformable registration,” Medical Image Analysis, vol. 16, no. 7, pp. 1423–1435, Oct. 2012, doi: 10.1016/j.media.2012.05.008.
- [5] P. A. Yushkevich, Y. Gao, and G. Gerig, “ITK-SNAP: An interactive tool for semi-automatic segmentation of multi-modality biomedical images,” in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Orlando, FL, USA, Aug. 2016, pp. 3342–3345, doi: 10.1109/EMBC.2016.7591443.
- [6] B. B. Avants, N. J. Tustison, G. Song, P. A. Cook, A. Klein, and J. C. Gee, “A reproducible evaluation of ANTs similarity metric performance in brain image registration,” NeuroImage, vol. 54, no. 3, pp. 2033–2044, Feb. 2011, doi: 10.1016/j.neuroimage.2010.09.025.