We evaluated the influence of normalization (setting the mean and standard deviation, histogram matching, and percentile-based scaling) on the segmentation of rectal cancer in multimodal images from multicenter data as part of a radiomics pipeline. We used two different networks for segmentation. When training and evaluating on all data or on data from a single center, normalization did not play a significant role. In contrast, when training on one center and evaluating on all others, it played a major role: the best results were obtained with percentile-based normalization, whereas fixing the mean and standard deviation did not work well.
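To make the compared strategies concrete, the following is a minimal Python sketch of two of them: percentile-based scaling and fixing the mean and standard deviation. It is an illustration under stated assumptions, not the pipeline used in the study. The function names, the 5th and 95th percentiles, the target range of 0 to 1, and the representation of an image as a flat list of voxel intensities are all hypothetical choices.

```python
import statistics


def percentile(values, q):
    """Return the linearly interpolated q-th percentile (0 <= q <= 100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def normalize_percentile(values, low_q=5.0, high_q=95.0):
    """Map the low_q percentile to 0 and the high_q percentile to 1.

    The percentile choices are assumptions for illustration only.
    Values outside the percentile range are not clipped.
    """
    lo = percentile(values, low_q)
    hi = percentile(values, high_q)
    if hi == lo:
        raise ValueError("degenerate intensity range")
    return [(v - lo) / (hi - lo) for v in values]


def normalize_mean_std(values):
    """Shift and scale intensities to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("constant image, standard deviation is zero")
    return [(v - mean) / std for v in values]
```

In practice each modality of each image would be normalized separately with one of these functions before being passed to the network.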
The authors gratefully acknowledge the data storage service SDS@hd supported by the Ministry of Science, Research and the Arts Baden-Württemberg (MWK) and the German Research Foundation (DFG) through grants INST 35/1314-1 FUGG and INST 35/1503-1 FUGG. This work is supported through DFG grant 428149221.
Figure 1: Distribution of acquisition parameters: Even though an imaging protocol was specified for the study, the acquisition parameters vary widely. As an example, the in-plane resolution (A) was supposed to be 0.8 mm but varies between 0.27 mm and 1.64 mm. The echo time (B), which was supposed to be 110 ms, varies similarly. In general, the data are very heterogeneous, with different parameters used within one center and greater differences between centers.
Figure 2: Segmentation Results: In (A), we trained both networks on all images and evaluated them using cross-validation. The results are very similar, with Dice scores between 0.71 and 0.74. In (B), we trained on a single center and evaluated on the same center. This reduces performance, probably because of the smaller number of training examples. The reduction is even larger when evaluating on images from other centers, shown in (C). There, the Dice scores differ significantly depending on the normalization method, ranging from 0.41 (M-Std, UNet) to 0.57 (Perc, UNet).
Figure 3: Example Segmentation: Panels A-D show the segmentations obtained when evaluating an image from Center 1 with a network trained on images from Center 3, using the normalizations Perc (A), Perc-HM (B), HM (C) and M-Std (D). The ground truth is shown in E.
We trained the networks on multiple graphics cards using the Adam optimizer with a learning rate of 0.001. For sampling, a ratio sampler was used to reduce the class imbalance, so that 50 % of the slices are centered on a tumor voxel.
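The idea behind the ratio sampler can be sketched in a few lines of Python. This is a hypothetical illustration, not the sampler used in the study: the function name, the 2D list-of-lists mask, and the choice to draw the remaining centers from background voxels only are assumptions.

```python
import random


def sample_patch_centers(mask, n_samples, tumor_ratio=0.5, rng=None):
    """Draw patch centers so that about tumor_ratio of them lie on tumor voxels.

    mask is a 2D list of 0/1 labels (1 = tumor). The remaining centers are
    drawn from background voxels, which is an assumption of this sketch.
    """
    if rng is None:
        rng = random.Random()
    tumor = [(r, c) for r, row in enumerate(mask)
             for c, v in enumerate(row) if v]
    background = [(r, c) for r, row in enumerate(mask)
                  for c, v in enumerate(row) if not v]
    if not tumor and not background:
        raise ValueError("mask is empty")
    centers = []
    for _ in range(n_samples):
        # Fall back to the other class if one of the two pools is empty.
        use_tumor = bool(tumor) and (not background or rng.random() < tumor_ratio)
        centers.append(rng.choice(tumor if use_tumor else background))
    return centers
```

Because tumor voxels usually make up a small fraction of each image, sampling centers uniformly would rarely show the network any tumor tissue. Fixing the ratio counteracts this imbalance.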
Table 1: Segmentation Results: Mean Dice scores for the different networks and normalization strategies. Column “all” gives the Dice scores when training and testing on all available images, column “internal” the scores when evaluating on the same center used for training, and column “external” the scores when evaluating on all other centers. For the external evaluation, the UNet with Perc normalization performs best and is significantly better than all other methods (p<0.001) except Perc-HM. For “all” and “internal”, no method was significantly better than the others.
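For reference, the Dice score reported above measures the overlap between a predicted mask A and the ground truth B as 2|A∩B| / (|A| + |B|). A minimal Python sketch follows. It assumes binary masks flattened to sequences of 0/1 and returns 1.0 when both masks are empty, which is a convention chosen here, not one stated in the study.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks are empty; treated as perfect agreement (assumption).
        return 1.0
    return 2.0 * intersection / total
```

For example, `dice_score([1, 1, 0, 0], [1, 0, 1, 0])` gives 0.5, since one of the two voxels in each mask overlaps.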