Alexandros Patsanis1, Mohammed R. S. Sunoqrot 1, Elise Sandsmark 2, Sverre Langørgen 2, Helena Bertilsson 3,4, Kirsten M. Selnæs 1,2, Hao Wang5, Tone F. Bathen 1,2, and Mattijs Elschot 1,2
1Department of Circulation and Medical Imaging, Norwegian University of Science and Technology - NTNU, Trondheim, Norway, 2Department of Radiology and Nuclear Medicine, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway, 3Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology - NTNU, Trondheim, Norway, 4Department of Urology, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway, 5Department of Computer Science, Norwegian University of Science and Technology - NTNU, Gjøvik, Norway
Synopsis
Generative Adversarial Networks (GANs) were evaluated for the detection and visualization of prostate cancer in a proposed automated end-to-end pipeline. Two GANs were trained and tested with T2-weighted images from an in-house dataset of 646 patients. The weakly-supervised GAN performed better (AUC = 0.785) than the unsupervised GAN (AUC = 0.462). The performance of the GANs depended on the pre-processing parameters. The PROSTATEx dataset (N = 204) was used for external validation, giving an AUC of 0.642. The weakly-supervised GAN showed promise for detecting and localizing prostate cancer on T2W MRI, but further research is necessary to improve model performance and generalizability.
INTRODUCTION
Generative Adversarial Networks (GANs)
have been used successfully to detect abnormal data patterns in non-medical application
areas like credit card fraud.1 Recent work also indicates their
potential for medical imaging problems, such as image denoising,
reconstruction, segmentation, simulation, detection, and classification.2
Here, we propose and evaluate the use of GANs to detect and localize prostate cancer (PCa) on T2-weighted (T2W) MR images.
METHODS
Datasets:
We used an in-house collected dataset (n = 646, Table 1) consisting of multi-parametric MRI (mpMRI) of men with suspected PCa, obtained at St. Olavs Hospital between January 2013 and January 2019. T2W datasets were retrieved and used for training (n = 546) and testing (n = 105) of the GANs. In addition, we used T2W images from the PROSTATEx challenge3 dataset (n = 204, Table 1) as an external test set. For both datasets, manual segmentations of the whole prostate and the lesions were available. Lesions with Gleason Grade Group (GGG) ≥ 1 were considered positive for cancer.
Proposed Pipeline:
We investigated the effect of various pre-processing
steps on the performance of two different GANs. T2W images were corrected for
intensity non-uniformity using N4 bias-field correction.4 The impact
of normalizing the 3D images with an automated normalization method (AutoRef5)
was investigated. The 3D prostate volume was localized using two different automated
prostate segmentation algorithms (V-Net6, nn-U-Net7), of
which the one with the best Quality Control score (using an in-house developed
method8) was chosen for further processing. Cropped 2D images with
various sizes (64x64, 128x128, 256x256 pixels) and spatial resolutions (0.2x0.2,
0.3x0.3, 0.4x0.4, or 0.5x0.5 mm2) were used as input to the networks.
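To make these steps concrete, the following is a minimal sketch of the bias-field correction and resampling stage using SimpleITK, whose N4 filter implements the N4ITK algorithm4; the Otsu foreground mask and the helper name preprocess_t2w are illustrative assumptions, not the exact study code.

```python
import SimpleITK as sitk

def preprocess_t2w(path, out_spacing=(0.5, 0.5)):
    """Illustrative pre-processing: N4 bias-field correction followed by
    in-plane resampling to a target pixel spacing (slice spacing kept)."""
    image = sitk.Cast(sitk.ReadImage(path), sitk.sitkFloat32)

    # N4 bias-field correction; a coarse Otsu mask restricts the fit
    # to foreground voxels (an assumption, not necessarily the study setup).
    mask = sitk.OtsuThreshold(image, 0, 1, 200)
    corrected = sitk.N4BiasFieldCorrection(image, mask)

    # Resample in-plane to the requested spacing, e.g. 0.5x0.5 mm2.
    old_spacing, old_size = corrected.GetSpacing(), corrected.GetSize()
    new_spacing = (out_spacing[0], out_spacing[1], old_spacing[2])
    new_size = [int(round(sz * sp / nsp)) for sz, sp, nsp
                in zip(old_size, old_spacing, new_spacing)]
    return sitk.Resample(corrected, new_size, sitk.Transform(),
                         sitk.sitkLinear, corrected.GetOrigin(), new_spacing,
                         corrected.GetDirection(), 0.0, corrected.GetPixelID())
```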
The cropped images were sampled from the prostate masks either randomly (training set), with structured strides from the top left to the bottom right (test set), or with the prostate centered in the image, depending on size and resolution (Table 2); a sketch of the sampling strategies follows below. For patients with biopsy-proven PCa (positive cases), only cropped images containing tumor were considered for training and testing. Figure 1 shows the pipeline of our proposed framework.
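The two main sampling strategies could look as follows; this is an illustrative sketch with assumed names and parameters, not the study implementation.

```python
import numpy as np

def random_patches(image, prostate_mask, size=128, n=16, rng=None):
    """Training sampling (sketch): crops centered on random
    locations inside the prostate mask."""
    rng = rng or np.random.default_rng()
    rows, cols = np.nonzero(prostate_mask)
    half = size // 2
    patches = []
    for i in rng.integers(len(rows), size=n):
        # Clip the center so the crop stays inside the image.
        r = int(np.clip(rows[i], half, image.shape[0] - half))
        c = int(np.clip(cols[i], half, image.shape[1] - half))
        patches.append(image[r - half:r + half, c - half:c + half])
    return patches

def strided_patches(image, size=128, stride=64):
    """Test sampling (sketch): structured strides from the
    top left to the bottom right of the image."""
    return [image[r:r + size, c:c + size]
            for r in range(0, image.shape[0] - size + 1, stride)
            for c in range(0, image.shape[1] - size + 1, stride)]
```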
Both unsupervised and weakly-supervised GANs
were evaluated in this work. For the unsupervised approach, f-AnoGAN9 was trained to generate synthetic healthy prostate images, followed by mapping unseen images to the GAN's latent space.9 For the weakly-supervised approach, the Fixed-Point GAN10 was
used for domain translation to healthy images, where the network
attempts to "virtually heal" cancer images. The model is trained by a
conditional identity loss that supervises the same-domain translation, using
revised adversarial, domain classification, and cycle consistency loss to
regulate cross-domain translation. The performance of the GANs for classification of healthy versus cancer images was evaluated using the area under the receiver operating characteristic curve (AUC), based on the anomaly score proposed for f-AnoGAN9 and, for Fixed-Point GAN,10 the maximum value across all pixels of the difference between the input and translated images.
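As a minimal sketch of this readout for the Fixed-Point GAN case (translate_to_healthy is a hypothetical wrapper around the trained generator; scikit-learn provides the AUC computation):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def anomaly_score(image, translated):
    """Image-level score (sketch): maximum absolute pixel difference
    between the input image and its 'virtually healed' translation."""
    return float(np.abs(image - translated).max())

def evaluate(images, labels, translate_to_healthy):
    """AUC for healthy-versus-cancer classification from image-level
    anomaly scores; labels are 1 for cancer and 0 for healthy."""
    scores = [anomaly_score(x, translate_to_healthy(x)) for x in images]
    return roc_auc_score(labels, scores)
```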
RESULTS
Images synthesized with f-AnoGAN were visually realistic (Figure 3.a). However, the mapping to latent space was not optimal, as Figure 3.b shows: the query (top) and mapped (bottom) images are not identical, leading to low performance (highest AUC = 0.462, Table 2). Figure 3.c shows a failed detection on a positive case, as represented by the difference image (left). Fixed-Point GAN performed better: Table 2 and Figure 2 present the AUCs for all pre-processing settings (highest AUC = 0.785).
Normalization improved the overall AUC; the best AUC (0.785) was obtained with a 128x128 pixel size and 0.5x0.5 mm2 pixel spacing, using the proposed random sampling technique. Figure 3.d shows examples of a negative and a positive case that were successfully assessed; for the positive case, the second column (difference) shows the abnormal area that was successfully detected and localized. However, external validation of the best-performing model (AUC = 0.785) on the PROSTATEx test set resulted in a considerably lower AUC of 0.642.
DISCUSSION
In this work, we propose and evaluate a flexible and fully automated processing pipeline, including image normalization and quality-controlled prostate segmentation, compatible with different GANs for image classification and lesion localization. To the best of our knowledge, this is the first GAN developed explicitly to support PCa diagnosis with a large-scale dataset, and it serves as a cornerstone for future work. We found that Fixed-Point GAN performed better than f-AnoGAN. However, when the best-performing model was assessed in external validation, the AUC was considerably lower. This was partly expected because no PROSTATEx data was included for training, but it shows that the model does not yet generalize well to new data from other institutes. We aim to improve the performance and generalizability of the model by using a more diverse image cohort for training. Further work will also include evaluating the performance at the patient level instead of the image level, for example by aggregating image-level scores as sketched below, and evaluating other GANs, such as StarGAN v2.11
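For illustration, such patient-level evaluation could aggregate image-level anomaly scores per patient before computing the AUC; the sketch below assumes the per-patient maximum as the aggregate, and all names are hypothetical.

```python
from collections import defaultdict
from sklearn.metrics import roc_auc_score

def patient_level_auc(patient_ids, image_scores, patient_labels):
    """Aggregate image-level anomaly scores to one score per patient
    (here: the maximum) and compute a patient-level AUC."""
    per_patient = defaultdict(list)
    for pid, score in zip(patient_ids, image_scores):
        per_patient[pid].append(score)
    pids = sorted(per_patient)
    scores = [max(per_patient[pid]) for pid in pids]
    labels = [patient_labels[pid] for pid in pids]
    return roc_auc_score(labels, scores)
```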
CONCLUSION
Fixed-Point GAN shows promise for detecting and localizing prostate
cancer on T2W MRI, given the correct pre-processing steps. However, further
research is necessary to improve model performance and generalizability.
Acknowledgements
We wish to express our gratitude to
the organizers of the PROSTATEx and PROMISE12 challenges at Radboud University Nijmegen for
making their datasets available. We also want to thank Prof. Radka Stoyanova
and her team at Miller School of Medicine (Miami, FL, USA) for providing us
with the prostate delineations for the PROSTATEx dataset.
References
1. Chen, J., Shen, Y., & Ali, R. (2018, November). Credit card fraud detection using sparse autoencoder and generative adversarial network. In 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) (pp. 1054-1059). IEEE.
2. Kazeminia, S., Baur, C., Kuijper, A., van Ginneken, B., Navab, N., Albarqouni, S., & Mukhopadhyay, A. (2020). GANs for medical image analysis. Artificial Intelligence in Medicine, 101938.
3. Armato, S. G., Huisman, H., Drukker, K., Hadjiiski, L., Kirby, J. S., Petrick, N., ... & Kalpathy-Cramer, J. (2018). PROSTATEx Challenges for computerized classification of prostate lesions from multiparametric magnetic resonance images. Journal of Medical Imaging, 5(4), 044501.
4. Tustison, N. J., Avants, B. B., Cook, P. A., Zheng, Y., Egan, A., Yushkevich, P. A., & Gee, J. C. (2010). N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging, 29(6), 1310-1320.
5. Sunoqrot, M. R., Nketiah, G. A., Selnæs, K. M., Bathen, T. F., & Elschot, M. (2020). Automated reference tissue normalization of T2-weighted MR images of the prostate using object recognition. Magnetic Resonance Materials in Physics, Biology and Medicine, 1-13.
6. Milletari, F., Navab, N., & Ahmadi, S. A. (2016, October). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV) (pp. 565-571). IEEE.
7. Isensee, F., Jäger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2019). Automated design of deep learning methods for biomedical image segmentation. arXiv preprint arXiv:1904.08128.
8. Sunoqrot, M. R., Selnæs, K. M., Sandsmark, E., Nketiah, G. A., Zavala-Romero, O., Stoyanova, R., Bathen, T. F., & Elschot, M. (2020). A quality control system for automated prostate segmentation on T2-weighted MRI. Diagnostics, 10, 714.
9. Schlegl, T., Seeböck, P., Waldstein, S. M., Langs, G., & Schmidt-Erfurth, U. (2019). f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis, 54, 30-44.
10. Siddiquee, M. M. R., Zhou, Z., Tajbakhsh, N., Feng, R., Gotway, M. B., Bengio, Y., & Liang, J. (2019). Learning fixed points in generative adversarial networks: From image-to-image translation to disease detection and localization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 191-200).
11. Choi, Y., Uh, Y., Yoo, J., & Ha, J. W. (2020). StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8188-8197).