1980

Deep Learning-Based Multistep Deformable Medical Image Registration for Multimodal Minimal-Invasive Image-Guided Intervention

Anika Strittmatter^1,2, Lothar R. Schad^1,2, and Frank G. Zöllner^1,2
¹Computer Assisted Clinical Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany, ²Mannheim Institute for Intelligent Systems in Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

Synopsis

Keywords: Analysis/Processing, Machine Learning/Artificial Intelligence, Image Registration, Multimodal, Minimal-Invasive, Image-Guided Intervention

Motivation: We developed neural networks for deformable medical image registration using multiple steps and resolutions.

Goal(s): To investigate how multiresolution networks impact registration results compared to monostep-monoresolution networks.

Approach: The networks were trained unsupervised with Mutual Information and Gradient L2 loss. We compared them with a monoresolution-monostep network and the classical registration method SimpleElastix. We evaluated the multistep networks using a three-dimensional liver dataset with CT and T1-weighted MR scans.

Results: Incorporating multiple steps and resolutions in the neural network yielded high spatial alignment, medically plausible transformations (minimal image folding), and fast registration times of less than half a second.

Impact: Since the inclusion of multiple steps and resolutions within the neural network leads to improved registration results, multistep registration methods should be used whenever possible. Consequently, more work should be invested in developing multistep-multiresolution networks for multimodal medical image registration.

Introduction

Patients with oligometastatic disease often face poor survival rates due to genetic and molecular variations in tumor cells that are resistant to standard treatments. Tailored techniques, such as minimally invasive image-guided interventions, take tumor heterogeneity into account and significantly improve survival rates^1,2. These methods combine magnetic resonance (MR) imaging and computed tomography (CT) for better lesion visibility compared to CT alone³. However, variations in patient positioning and respiration between different scanners lead to organ distortions, requiring image registration before fusion.
Recently, several medical image registration methods have been published that apply multiple steps and resolutions, such as SimpleElastix⁴ (four resolutions) and NiftyReg^5,6 (three resolutions). Some neural networks also use different resolutions in a coarse-to-fine approach^7-9. However, no study has investigated how multiresolution networks impact registration results compared to a monostep-monoresolution network.
Therefore, we implemented multistep networks with different resolutions and compared their performance with each other and a monostep-monoresolution network.

Methods

Our dataset consists of 73 T1-weighted MR scans and 47 CT scans, acquired as part of the "Mannheim Molecular Intervention Environment" (M²OLIE) study (Table 1). For both modalities, liver segmentations created by neural networks are available.
In our experiments, we registered the MR scans to the CT scans.
Network Architectures
We developed three neural network variants for multistep deformable registration (Figure 1). Variant A applies four steps/resolutions, variant B three, and variant C two. Each subnetwork is an adaptation of VoxelMorph-2^14,15 (features [[16, 32, 32], [32, 32, 32, 16, 16]]).
Benchmark Neural Network and Baseline Method
As a benchmark, we applied a monostep network (Figure 1d). As a baseline, we used conventional deformable registration with SimpleElastix⁴ and its default parameters, except that "FinalGridSpacingInPhysicalUnits" was set to ['100', '100', '100'] to achieve smooth and medically plausible transformations.
Training Setting
We trained all subnetworks of the multistep networks jointly. As the loss, we combined the Mutual Information (MI) of the fixed (F) and the moved (M) image with a Gradient L2 term, weighted by λ, that regularizes the deformation field (Φ):
L(F,M) = L_MI(F,M) + λ L₂(Φ)
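As an illustration of how the two loss terms combine, the following minimal sketch evaluates a histogram-based Mutual Information and a finite-difference Gradient L2 penalty in plain Python. It is not the authors' implementation (the linked repository contains that); the function names, the bin count, the value of λ (`lam`), the nested-list layout of the displacement field, and the sign convention L_MI = -MI are all assumptions made for this example.

```python
import math

def mutual_information(fixed, moved, bins=8):
    """Histogram-based MI (in nats) of two equally long intensity lists.

    Assumes intensities are already normalized to [0, 1].
    """
    n = len(fixed)
    joint = [[0.0] * bins for _ in range(bins)]
    for f, m in zip(fixed, moved):
        i = min(int(f * bins), bins - 1)  # clamp intensity 1.0 into the last bin
        j = min(int(m * bins), bins - 1)
        joint[i][j] += 1.0 / n
    p_f = [sum(row) for row in joint]                                  # marginal of fixed
    p_m = [sum(joint[i][j] for i in range(bins)) for j in range(bins)]  # marginal of moved
    mi = 0.0
    for i in range(bins):
        for j in range(bins):
            p = joint[i][j]
            if p > 0.0:
                mi += p * math.log(p / (p_f[i] * p_m[j]))
    return mi

def gradient_l2(phi):
    """Mean squared forward difference of a field phi[z][y][x] = (ux, uy, uz)."""
    nz, ny, nx = len(phi), len(phi[0]), len(phi[0][0])
    total, count = 0.0, 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                for dz, dy, dx in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    z2, y2, x2 = z + dz, y + dy, x + dx
                    if z2 < nz and y2 < ny and x2 < nx:
                        for a, b in zip(phi[z2][y2][x2], phi[z][y][x]):
                            total += (a - b) ** 2
                            count += 1
    return total / count if count else 0.0

def registration_loss(fixed, moved, phi, lam=1.0, bins=8):
    """L(F, M) = L_MI(F, M) + lam * L2(phi), with L_MI taken as -MI (assumption)."""
    return -mutual_information(fixed, moved, bins) + lam * gradient_l2(phi)
```

Hard histogram binning is not differentiable, so a trainable version would need a smooth MI estimate; the sketch only shows how the similarity and regularization terms are weighted against each other.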
We performed five-fold cross-validation, training for 200 epochs with a batch size of 1 and stopping early if the validation loss did not improve for five consecutive epochs. We saved the model weights with the highest validation accuracy. The image intensities were normalized to the range [0, 1]. To address GPU memory constraints, the inputs were resized to 256 × 256 × 64 voxels at 2 × 2 × 4 mm³ resolution. During inference, the network transformations were applied to the full-resolution images.
Our source code is available at https://github.com/Computer-Assisted-Clinical-Medicine/Multistep_Networks_for_Deformable_Medical_Image_Registration.
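The early-stopping rule from the training setting (at most 200 epochs, patience of five epochs) can be sketched as follows. This is a minimal illustration rather than the code from the repository: the function name is hypothetical, the validation losses are passed in as a precomputed list instead of coming from a real training loop, and the "best" epoch is taken as the one with the lowest validation loss, which is an assumption standing in for the validation accuracy mentioned above.

```python
def early_stopping_epochs(val_losses, max_epochs=200, patience=5):
    """Return (best_epoch, stop_epoch), both 1-based.

    best_epoch: epoch with the lowest validation loss seen before stopping.
    stop_epoch: last epoch that was actually run.
    """
    best_loss = float("inf")
    best_epoch = 0
    stop_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if loss < best_loss:
            # Improvement: remember this epoch (weights would be saved here).
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, stop_epoch
```

For example, with validation losses `[1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9, 0.8, 0.6]` the best epoch is 3 and training stops after epoch 8, so the lower loss at epoch 9 is never reached.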
Evaluation Metrics
We evaluated the overlap of the liver segmentations of the fixed and moved images using the Dice coefficient. We also evaluated the degree of image folding (anatomically impossible deformations) by calculating the number of Jacobian determinants ≤ 0 in the deformation field. Additionally, we measured the time needed to register a new image pair.
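Both metrics can be illustrated with a short sketch. It assumes binary masks given as flat lists and a displacement field in voxel units stored as nested lists `u[z][y][x] = (ux, uy, uz)`, so that the transformation is φ(p) = p + u(p); the function names and the forward-difference discretization are choices made for this example and may differ from the evaluation code in the repository.

```python
def dice_coefficient(seg_a, seg_b):
    """Dice overlap of two binary masks given as equally long flat lists."""
    intersection = sum(1 for a, b in zip(seg_a, seg_b) if a and b)
    size = sum(1 for a in seg_a if a) + sum(1 for b in seg_b if b)
    return 2.0 * intersection / size if size else 1.0

def folding_fraction(u):
    """Fraction of voxels whose Jacobian determinant of p + u(p) is <= 0."""
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    folded, total = 0, 0
    for z in range(nz - 1):
        for y in range(ny - 1):
            for x in range(nx - 1):
                here = u[z][y][x]
                # Forward differences of the displacement along x, y and z.
                d_x = [a - b for a, b in zip(u[z][y][x + 1], here)]
                d_y = [a - b for a, b in zip(u[z + 0][y + 1][x], here)]
                d_z = [a - b for a, b in zip(u[z + 1][y][x], here)]
                # Jacobian = identity + displacement gradient.
                # Rows: components (x, y, z); columns: derivative directions.
                j = [
                    [d_x[r] + (1.0 if r == 0 else 0.0),
                     d_y[r] + (1.0 if r == 1 else 0.0),
                     d_z[r] + (1.0 if r == 2 else 0.0)]
                    for r in range(3)
                ]
                det = (j[0][0] * (j[1][1] * j[2][2] - j[1][2] * j[2][1])
                       - j[0][1] * (j[1][0] * j[2][2] - j[1][2] * j[2][0])
                       + j[0][2] * (j[1][0] * j[2][1] - j[1][1] * j[2][0]))
                total += 1
                if det <= 0:
                    folded += 1
    return folded / total if total else 0.0
```

A zero displacement field gives a determinant of 1 everywhere and therefore a folding fraction of 0, whereas a field such as ux = -2x mirrors the x-axis and is counted as folded at every evaluated voxel.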

Results

We trained the networks with the hyperparameters in Table 2. The baseline SimpleElastix significantly improved the Dice coefficient compared to the unregistered images (Table 3 and Figure 2). Multistep variant A yielded a Dice coefficient similar to that of the baseline. The Dice coefficient decreased with fewer steps. The lowest Dice coefficient was obtained with the benchmark (monostep-monoresolution network).
Registration with all methods led to a low degree of image folding (Jacobian determinant ≤ 0). Registration with the baseline was significantly slower than with the neural networks. Fewer network steps led to faster registration, and all neural networks took under half a second.

Discussion

We implemented multistep networks with different resolutions and evaluated them on deformable multimodal registration of in-vivo CT and MR images of the liver. Furthermore, we implemented a framework to test and evaluate these networks systematically.
Training the multistep network with up to four subnetworks required a large amount of GPU memory. With a smaller GPU or larger input images, there may not be enough GPU memory to train the four-subnetwork variant. In this case, the number of subnetworks must be reduced.

Conclusion

We proposed neural networks for deformable multimodal medical image registration that use multiple steps and varying resolutions. Our results demonstrate that incorporating multiple steps and resolutions within the neural network framework leads to registration results with high structural similarity and minimal image folding, resulting in a medically plausible transformation, while maintaining a low registration time of less than half a second.

Acknowledgements

This research project is part of the Research Campus M²OLIE and funded by the German Federal Ministry of Education and Research (BMBF) within the Framework "Forschungscampus: public-private partnership for Innovations" under the funding code 13GW0388A.

This project was supported by the German Federal Ministry of Education and Research (BMBF) under the funding code 01KU2102, under the frame of ERA PerMed.

References

1. Qiu, H., Katz, A.W., Milano, M.T., 2016. Oligometastases to the liver: predicting outcomes based upon radiation sensitivity. J Thorac Dis (10):E1384-E1386. doi:10.21037/jtd.2016.10.88.

2. Ruers, T., Van Coevorden, F., Punt, C., Pierie, J., Borel-Rinkes, I., Ledermann, J., Poston, G., Bechstein, W., Lentz, M., Mauer, M., Folprecht, G., Van Cutsem, E., Ducreux, M., Nordlinger, B., 2017. Local Treatment of Unresectable Colorectal Liver Metastases: Results of a Randomized Phase II Trial. J Natl Cancer Inst 109(9):djx015. doi:10.1093/jnci/djx015.

3. Bauer, D.F., Rosenkranz, J., Golla, A.K., Tönnes, C., Hermann, I., Russ, T., Kabelitz, G., Rothfuss, A.J., Schad, L.R., Stallkamp, J.L., Zöllner, F.G., 2022. Development of an abdominal phantom for the validation of an oligometastatic disease diagnosis workflow. Medical Physics 49, 4445–4454. URL: https://aapm.onlinelibrary.wiley.com/doi/abs/10.1002/mp.15701

4. Marstal, K., Berendsen, F., Staring, M., Klein, S., 2016. SimpleElastix: A User-Friendly, Multi-lingual Library for Medical Image Registration, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 574–582. doi:10.1109/CVPRW.2016.78.

5. Modat, M., Ridgway, G.R., Taylor, Z.A., Lehmann, M., Barnes, J., Hawkes, D.J., Fox, N.C., Ourselin, S., 2010. Fast free-form deformation using graphics processing units. Computer Methods and Programs in Biomedicine 98, 278–284. URL: https://www.sciencedirect.com/science/article/pii/S0169260709002533, doi:10.1016/j.cmpb.2009.09.002. hP-MICCAI 2008.

6. Modat, M., Cash, D., Daga, P., Winston, G., Duncan, J., Ourselin, S., 2014. Global image registration using a symmetric block-matching approach. Journal of medical imaging (Bellingham, Wash.) 1, 024003. doi:10.1117/1.JMI.1.2.024003

7. Hering, A., van Ginneken, B., Heldmann, S., 2019. mlVIRNET: Multilevel variational image registration network, in: Lecture Notes in Computer Science. Springer International Publishing, pp. 257–265. URL: https://doi.org/10.1007/978-3-030-32226-7_29, doi:10.1007/978-3-030-32226-7_29.

8. Sokooti, H., de Vos, B., Berendsen, F., Ghafoorian, M., Yousefi, S., Lelieveldt, B.P.F., Išgum, I., Staring, M., 2019. 3D convolutional neural networks image registration based on efficient supervised learning from artificial deformations. arXiv:1908.10235

9. de Vos, B.D., Berendsen, F.F., Viergever, M.A., Sokooti, H., Staring, M., Išgum, I., 2019. A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis 52, 128–143. doi:10.1016/j.media.2018.11.010.

10. Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation, in: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer International Publishing, Cham. pp. 234–241.

11. Soler, L., Hostettler, A., Agnus, V., Charnoz, A., Fasquel, J., Moreau, J., Osswald, A., Bouhadjar, M., Marescaux, J., 2010. 3D image reconstruction for comparison of algorithm database: A patient specific anatomical and medical image database. IRCAD, Strasbourg, France, Tech. Rep.

12. Kavur, A.E., Selver, M.A., Dicle, O., Barış, M., Gezer, N.S., 2019. CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge Data. URL: https://doi.org/10.5281/zenodo.3362844, doi:10.5281/zenodo.3362844

13. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K., 2015. Spatial Transformer Networks, in: Advances in Neural Information Processing Systems, pp. 2017–2025.

14. Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J.V., Dalca, A.V., 2018. An unsupervised learning model for deformable medical image registration. CoRR abs/1802.02604. URL: http://arxiv.org/abs/1802.02604, doi:10.1109/CVPR.2018.00964, arXiv:1802.02604.

15. Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V., 2019. VoxelMorph: A Learning Framework for Deformable Medical Image Registration. IEEE Transactions on Medical Imaging 38, 1788–1800. doi:10.1109/tmi.2019.2897538.

Figures

Table 1: Dataset statistics. Two separate segmentation networks (three-dimensional U-Nets¹⁰) were used to segment the liver in CT and MR scans. The initial training was done with the 3D-IRCADb-01 dataset¹¹ for CT and the CHAOS dataset¹² for MR images. Finetuning was performed with 20 manually created segmentations from the M²OLIE dataset for both modalities.

Figure 1: Architecture of the multistep network variants A, B, C and the benchmark. Subnetworks are connected in series, each receiving as input the fixed and moving images (first subnetwork) or the fixed image and the moving image transformed by the deformation field of the preceding subnetworks. The transformation is applied to the moving image by a spatial transformer¹³. The subnetworks process the images at increasingly higher resolution.

Table 2: Hyperparameter setting for the training of the multistep networks and the benchmark. Hyperparameter tuning was performed using a grid search. We selected the parameter configuration that yielded the highest Mutual Information on the validation dataset.

Table 3: The Dice coefficient, Jacobian determinant (|J|) ≤ 0 and registration time results (mean ± SD) of the baseline, benchmark and multistep networks. Dice coefficient: higher value is better (maximum is 100%), |J| ≤ 0: lower value is better (minimum is 0%).

Figure 2: Example results of the registration of an MR volume to a CT volume from our dataset. The images show central slices in the axial plane. The fixed, moving and result (moved) images are overlaid with the liver segmentations (columns 1–3). The resulting composites for image and segmentation show the fixed data in blue and the moving/moved data in red.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)


DOI: https://doi.org/10.58530/2024/1980