
Prostate Cancer Detection on T2-weighted MR images with Generative Adversarial Networks
Alexandros Patsanis1, Mohammed R. S. Sunoqrot1, Elise Sandsmark2, Sverre Langørgen2, Helena Bertilsson3,4, Kirsten M. Selnæs1,2, Hao Wang5, Tone F. Bathen1,2, and Mattijs Elschot1,2
1Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway, 2Department of Radiology and Nuclear Medicine, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway, 3Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology - NTNU, Trondheim, Norway, 4Department of Urology, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway, 5Department of Computer Science, Norwegian University of Science and Technology - NTNU, Gjøvik, Norway

Synopsis

Generative Adversarial Networks (GANs) were evaluated for the detection and visualization of prostate cancer in a proposed automated end-to-end pipeline. Two GANs were trained and tested with T2-weighted images from an in-house dataset of 646 patients. The weakly-supervised GAN (AUC = 0.785) performed better than the unsupervised GAN (AUC = 0.462), and the performance of both depended on the pre-processing parameters. The PROSTATEx dataset (N = 204) was used for external validation, giving an AUC of 0.642. The weakly-supervised GAN showed promise for detecting and localizing prostate cancer on T2W MRI, but further research is necessary to improve model performance and generalizability.

INTRODUCTION

Generative Adversarial Networks (GANs) have been used successfully to detect abnormal data patterns in non-medical applications such as credit card fraud detection.1 Recent work also indicates their potential for medical imaging problems, such as image denoising, reconstruction, segmentation, simulation, detection, and classification.2 Here, we propose and evaluate the use of GANs to detect and localize prostate cancer (PCa) on T2-weighted (T2W) MR images.

METHODS

Datasets:
We used an in-house dataset (n = 646, Table 1) consisting of multi-parametric MRI (mpMRI) of men with suspected PCa, acquired at St. Olavs Hospital between January 2013 and January 2019. The T2W series were retrieved and used for training (n = 546) and testing (n = 105) of the GANs. In addition, we used T2W images from the PROSTATEx challenge3 dataset (n = 204, Table 1) as an external test set. For both datasets, manual segmentations of the whole prostate and the lesions were available. Lesions with Gleason Grade Group (GGG) ≥ 1 were considered positive for cancer.
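As a minimal sketch of the labeling and splitting logic (our illustration under stated assumptions, not the authors' code), crops are labeled positive when they stem from a lesion with GGG ≥ 1, and the split is made at the patient level so that no patient contributes images to both sets:

```python
import random

def split_patients(patient_ids, n_test=105, seed=42):
    """Split at the patient level so that no patient contributes
    images to both the training and the test set."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    return ids[n_test:], ids[:n_test]  # train, test

def is_positive(gleason_grade_group):
    """Lesions with Gleason Grade Group >= 1 count as cancer-positive."""
    return gleason_grade_group >= 1
```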

Proposed Pipeline:
We investigated the effect of various pre-processing steps on the performance of two different GANs. T2W images were corrected for intensity non-uniformity using N4 bias-field correction.4 The impact of normalizing the 3D images with an automated normalization method (AutoRef5) was investigated. The 3D prostate volume was localized using two automated prostate segmentation algorithms (V-Net6, nn-U-Net7), of which the one with the best quality control score (computed with an in-house developed method8) was chosen for further processing. Cropped 2D images of various sizes (64x64, 128x128, 256x256 pixels) and spatial resolutions (0.2x0.2, 0.3x0.3, 0.4x0.4, or 0.5x0.5 mm²) were used as input to the networks. Depending on size and resolution (Table 2), the cropped images were sampled from the prostate masks randomly (training set), with structured strides from the top-left to the bottom-right corner (test set), or with the prostate centered in the image. For patients with biopsy-proven PCa (positive cases), only cropped images containing tumor were considered for training and testing. Figure 1 shows the pipeline of our proposed framework.
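The first two pre-processing steps can be sketched as follows with SimpleITK, which implements N4ITK4 (a minimal illustration, not the authors' code; AutoRef normalization, segmentation, and cropping are omitted, and the file name is hypothetical):

```python
import SimpleITK as sitk

def n4_correct(image: sitk.Image) -> sitk.Image:
    """Correct T2W intensity non-uniformity with N4 bias-field correction."""
    image = sitk.Cast(image, sitk.sitkFloat32)
    mask = sitk.OtsuThreshold(image, 0, 1, 200)  # rough foreground mask
    return sitk.N4BiasFieldCorrectionImageFilter().Execute(image, mask)

def resample_inplane(image: sitk.Image, spacing_xy: float) -> sitk.Image:
    """Resample to the target in-plane spacing (e.g., 0.5 mm), keeping
    the original slice thickness."""
    old_spacing, old_size = image.GetSpacing(), image.GetSize()
    new_spacing = (spacing_xy, spacing_xy, old_spacing[2])
    new_size = [int(round(old_size[i] * old_spacing[i] / new_spacing[i]))
                for i in range(3)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), new_spacing, image.GetDirection(),
                         0.0, image.GetPixelID())

t2w = sitk.ReadImage("patient_t2w.nii.gz")  # hypothetical file name
t2w = resample_inplane(n4_correct(t2w), 0.5)
```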

Both unsupervised and weakly-supervised GANs were evaluated in this work. For the unsupervised approach, f-AnoGAN9 was trained to generate synthetic images of healthy prostate tissue, after which unseen images were mapped to the GAN's latent space.9 For the weakly-supervised approach, Fixed-Point GAN10 was used for domain translation to healthy images, where the network attempts to "virtually heal" cancer images. The model is trained with a conditional identity loss that supervises same-domain translation, using revised adversarial, domain classification, and cycle-consistency losses to regulate cross-domain translation. The performance of the GANs for classifying healthy versus cancer images was evaluated using the area under the receiver operating characteristic curve (AUC), with the anomaly score proposed by f-AnoGAN9 and the maximum value across all pixels of the difference between the input and the translated image, as proposed for Fixed-Point GAN.10
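The image-level scoring can be sketched as follows (our reading of the two scores, not the reference implementations; the published f-AnoGAN score additionally includes a discriminator-feature residual term):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fixed_point_gan_score(image: np.ndarray, healed: np.ndarray) -> float:
    """Maximum absolute pixel difference between the input image and its
    'virtually healed' translation."""
    return float(np.max(np.abs(image - healed)))

def f_anogan_score(image: np.ndarray, reconstruction: np.ndarray) -> float:
    """Residual between the input image and its reconstruction from the
    latent-space mapping (image-space term of the anomaly score)."""
    return float(np.mean((image - reconstruction) ** 2))

# labels: 1 for crops containing biopsy-proven cancer, 0 for healthy crops
def image_level_auc(scores, labels) -> float:
    return roc_auc_score(labels, scores)
```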

RESULTS

Images synthesized by f-AnoGAN were visually appealing (Figure 3a). However, the latent-space mapping was suboptimal, as Figure 3b shows: query (top) and mapped (bottom) images are not identical, leading to low performance (highest AUC = 0.462, Table 2). Figure 3c shows a positive case in which detection fails, as represented by the difference image (left). Fixed-Point GAN performed better: Table 2 and Figure 2 present the AUCs for all pre-processing settings (highest AUC = 0.785). Furthermore, normalization improved the overall AUC; the best AUC of 0.785 was found with a 128x128 pixel size and 0.5x0.5 mm² pixel spacing using our proposed random sampling technique. Figure 3d shows examples of a negative and a positive case that were successfully assessed, where the second column (difference) of the positive case shows the abnormal area that was successfully detected and localized. However, external validation of the best-performing model (AUC = 0.785) on the PROSTATEx test set resulted in a considerably lower AUC of 0.642.

DISCUSSION

In this work, we propose and evaluate a flexible and fully automated processing pipeline, including image normalization and quality-controlled prostate segmentation, compatible with different GANs for image classification and lesion localization. To the best of our knowledge, this is the first GAN developed explicitly to support PCa diagnosis with a large-scale dataset, and it serves as a cornerstone for future work. We found that Fixed-Point GAN performed better than f-AnoGAN. However, when the best-performing model was assessed in external validation, the AUC was considerably lower. This was partly expected because no PROSTATEx data were included in training, but it shows that the model does not yet generalize well to data from other institutes. We aim to improve the performance and generalizability of the model by using a more diverse image cohort for training. Further work will also include evaluating performance at the patient level instead of the image level (for example, by aggregating per-crop scores as sketched below) and evaluating other GANs, such as StarGAN v2.11
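Such a patient-level evaluation could look as follows (an assumption on our part, not the published method; aggregation by the maximum crop score is one plausible choice):

```python
from collections import defaultdict
from sklearn.metrics import roc_auc_score

def patient_level_auc(crop_scores, crop_patient_ids, patient_labels):
    """Aggregate per-crop anomaly scores to one score per patient
    (here: the maximum) and compute the patient-level AUC.
    patient_labels maps patient ID -> 0/1 ground truth."""
    per_patient = defaultdict(list)
    for score, pid in zip(crop_scores, crop_patient_ids):
        per_patient[pid].append(score)
    pids = sorted(per_patient)
    scores = [max(per_patient[p]) for p in pids]
    labels = [patient_labels[p] for p in pids]
    return roc_auc_score(labels, scores)
```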

CONCLUSION

Fixed-Point GAN shows promise for detecting and localizing prostate cancer on T2W MRI, given the correct pre-processing steps. However, further research is necessary to improve model performance and generalizability.

Acknowledgements

We wish to express our gratitude to the organizers of the PROSTATEx and PROMISE12 challenges at Radboud University Nijmegen for making their datasets available. We also want to thank Prof. Radka Stoyanova and her team at the Miller School of Medicine (Miami, FL, USA) for providing us with the prostate delineations for the PROSTATEx dataset.

References

  1. Chen, J., Shen, Y., & Ali, R. (2018, November). Credit Card Fraud Detection Using Sparse Autoencoder and Generative Adversarial Network. In 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) (pp. 1054-1059). IEEE.
  2. Kazeminia, S., Baur, C., Kuijper, A., van Ginneken, B., Navab, N., Albarqouni, S., & Mukhopadhyay, A. (2020). GANs for medical image analysis. Artificial Intelligence in Medicine, 101938.
  3. Armato, S. G., Huisman, H., Drukker, K., Hadjiiski, L., Kirby, J. S., Petrick, N., ... & Kalpathy-Cramer, J. (2018). PROSTATEx Challenges for computerized classification of prostate lesions from multiparametric magnetic resonance images. Journal of Medical Imaging, 5(4), 044501.
  4. Tustison, N. J., Avants, B. B., Cook, P. A., Zheng, Y., Egan, A., Yushkevich, P. A., & Gee, J. C. (2010). N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging, 29(6), 1310-1320.
  5. Sunoqrot, M. R., Nketiah, G. A., Selnæs, K. M., Bathen, T. F., & Elschot, M. (2020). Automated reference tissue normalization of T2-weighted MR images of the prostate using object recognition. Magnetic Resonance Materials in Physics, Biology and Medicine, 1-13.
  6. Milletari, F., Navab, N., & Ahmadi, S. A. (2016, October). V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV) (pp. 565-571). IEEE.
  7. Isensee, F., Jäger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2019). Automated design of deep learning methods for biomedical image segmentation. arXiv preprint arXiv:1904.08128.
  8. Sunoqrot, M. R., Selnæs, K. M., Sandsmark, E., Nketiah, G. A., Zavala-Romero, O., Stoyanova, R., Bathen, T. F., & Elschot, M. (2020). A quality control system for automated prostate segmentation on T2-weighted MRI. Diagnostics, 10(9), 714.
  9. Schlegl, T., Seeböck, P., Waldstein, S. M., Langs, G., & Schmidt-Erfurth, U. (2019). f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis, 54, 30-44.
  10. Siddiquee, M. M. R., Zhou, Z., Tajbakhsh, N., Feng, R., Gotway, M. B., Bengio, Y., & Liang, J. (2019). Learning fixed points in generative adversarial networks: From image-to-image translation to disease detection and localization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 191-200).
  11. Choi, Y., Uh, Y., Yoo, J., & Ha, J. W. (2020). StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8188-8197).

Figures

Table 1: Dataset information for the in-house and external publicly available datasets (left), and scan settings for the in-house dataset (right). Use of the in-house data was approved by the regional ethics committee. For the in-house dataset, 1 patient was scanned with a Prisma, 4 patients with a Biograph mMR, and the rest with a Skyra scanner. (*105 manually segmented positive cases, including manually delineated lesions for biopsy-proven prostate cancer.)

Table 2: AUCs of the Fixed-Point GAN (2a) and f-AnoGAN (2b), trained and tested on images from the in-house dataset pre-processed with different settings. Within each subsection, the altered parameters and the best resulting AUC are marked in bold. (*Cropped to 128x128 and then resized to 256x256; 1 automated normalization method; 2 resized from 128x128 to 64x64 pixels; 3 the top AUC was calculated with the Fixed-Point GAN method, the bottom one with the f-AnoGAN method.)

Figure 1: The proposed end-to-end pipeline includes automated intensity normalization using AutoRef5, automated prostate segmentation using V-Net6 and nn-U-Net7 followed by an automated quality control step, and the sampling of cropped images with different techniques and settings. The cropped images were then used to train the weakly-supervised (Fixed-Point GAN) and unsupervised (f-AnoGAN) models.

Figure 2: Receiver operating characteristic curves and AUCs for the Fixed-Point GAN with different pre-processing settings, using the in-house dataset for training and testing. In addition, the best-performing model (best AUC of 0.785, found after 200k iterations on normalized images, image size 128x128, pixel spacing 0.5x0.5 mm²) was tested on the PROSTATEx dataset for external validation.

Figure 3: f-AnoGAN and Fixed-Point GAN. 3a) f-AnoGAN: linear latent-space interpolation between random endpoints shows that the trained model does not focus on only one part of the training dataset. 3b) f-AnoGAN: mapping from image space (query) back to the GAN's latent space should yield closely resembling images; here, the mapped images are similar but not identical. 3c) f-AnoGAN: a positive test case in which detection fails. 3d) Fixed-Point GAN: negative and positive test cases; there are no differences for the negative case, whereas the lesion in the positive case is detected and localized (difference).

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021) 0816