Annika Liebgott1,2, Louisa Fay1, Viet Chau Vu2, Bin Yang1, and Sergios Gatidis2
1Institute of Signal Processing and System Theory, University of Stuttgart, Stuttgart, Germany, 2Department of Radiology, University Hospital of Tuebingen, Tuebingen, Germany
Synopsis
Immunotherapy is a promising approach to treating advanced stages of
malignant melanoma. However, the
treatment can cause serious side effects and not every patient
responds to it, which means crucial time may be wasted on an
ineffective treatment. Predicting therapy response in advance is
hence an important research question. This
study investigates the potential of medical
imaging and machine learning to solve this task. To this end, we
trained and compared different deep learning models on multi-modal
PET/MR images to differentiate non-responsive from responsive
patients.
Introduction
Malignant
melanoma has shown increasing worldwide incidence over the last
decades [1]. Although the prognosis is very good when the disease is caught early,
melanoma is a very aggressive type of cancer that spreads quickly once it
has advanced beyond the skin barrier, leading to low survival rates. In recent years,
therapy with immune checkpoint inhibitors has led to significantly
improved patient outcomes. The treatment has shown the potential to slow
down, stop or completely reverse the disease's progress [2].
While the positive effects are promising, there are also issues that often keep
immunotherapy from being the first choice of treatment. For
instance, the stimulation of the immune system can cause severe side effects. The main concern,
however, is that only some patients respond to the treatment
while the disease continues to progress in others, in the worst
case wasting crucial time on an ineffective therapy.
Hence,
a major question across clinical research disciplines is
what differentiates responsive from non-responsive patients, and
how to predict an individual patient's therapy response.
Our research focuses on using PET/MR imaging combined with machine
learning (ML) to predict therapy response. In this study, we implemented a deep learning
(DL) system trained to distinguish responsive from
non-responsive patients based on multi-modal images of segmented
organs related to the immune system.
In
recent years, several related studies have
proposed ML approaches to this problem. To the best of our knowledge, none of
these studies used an approach similar to ours. They either used other
imaging modalities [3,4], combined imaging with other prior
knowledge (e.g. RNA sequencing [3]), or did not use radiological
images but other clinical examinations (e.g. H&E staining [5],
genetic analyses [6,7] or blood tests [8]).
Methods
Our
data set consists of PET/MR images (Figure 1) from 24 patients, acquired at
three time points over
the course of treatment. As
our cohort of patients is relatively small and we did not want
individual physiological traits to influence our results, we
only used the liver, spleen and spine (Figure 2). The organs were
segmented by trained physicians
on the MR images;
the resulting volumes of interest (VOIs) were then transferred to the corresponding
PET images and apparent diffusion coefficient (ADC) maps.
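The transfer of an organ VOI from the MR image to the other modalities can be pictured as reusing one binary mask on several volumes. The following is only a minimal sketch, assuming that all volumes have already been resampled onto the same voxel grid (the abstract does not describe this step). The function name `apply_voi`, the array sizes and the random data are illustrative assumptions, not the pipeline used in this study.

```python
import numpy as np


def apply_voi(volume, mask):
    """Keep voxels inside the organ mask and set everything else to zero."""
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must share the same voxel grid")
    return np.where(mask.astype(bool), volume, 0.0)


# Toy 4x4x4 volumes standing in for co-registered MR, PET and ADC data.
rng = np.random.default_rng(0)
mr = rng.random((4, 4, 4))
pet = rng.random((4, 4, 4))
adc = rng.random((4, 4, 4))

# Hypothetical organ mask drawn on the MR image (a 2x2x2 cube here).
organ_mask = np.zeros((4, 4, 4), dtype=bool)
organ_mask[1:3, 1:3, 1:3] = True

# The same mask is reused for every modality.
voi = {
    name: apply_voi(vol, organ_mask)
    for name, vol in {"mr": mr, "pet": pet, "adc": adc}.items()
}
```

In practice the PET and ADC data usually have a different resolution than the anatomical MR image, so a resampling step would precede the masking shown here.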
The
general structure of our DL system is
depicted in Figure 3. All network architectures we used follow an encoder-decoder structure.
Figure 4 shows the investigated models.
In
some experiments, we employed transfer learning
(TL) [10], a strategy to boost the performance of a model (especially for small data sets) by re-using a pre-trained model to initialize the training process. The hypothesis is that a model trained to classify images learns general features relevant to arbitrary image
classification tasks, meaning a new model only needs to learn
the relationship between those features and the desired outputs. We hence re-used the encoder of a model pre-trained for medical image segmentation (data set: Medical Segmentation Decathlon
[11]) and only adapted the decoder layers to our task.
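The idea of re-using a pre-trained encoder can be illustrated without reference to a particular DL framework. The sketch below treats a model as a dictionary of named weight lists, copies every encoder entry from the pre-trained model and marks only the remaining (decoder) entries as trainable. All parameter names, the `encoder.` prefix and the choice to keep the encoder fixed are assumptions made for illustration; the abstract does not state whether the encoder was frozen or fine-tuned.

```python
import random


def init_with_pretrained_encoder(pretrained, fresh, encoder_prefix="encoder."):
    """Copy encoder weights from a pre-trained model into a fresh model.

    Both arguments map parameter names to lists of floats, a stand-in for
    real weight tensors. Returns the merged parameters and the set of
    parameter names that remain trainable.
    """
    params = {}
    trainable = set()
    for name, value in fresh.items():
        if name.startswith(encoder_prefix) and name in pretrained:
            # Re-use the pre-trained encoder weights and keep them fixed.
            params[name] = list(pretrained[name])
        else:
            # Task-specific layers keep their fresh initialization.
            params[name] = list(value)
            trainable.add(name)
    return params, trainable


def random_weights(n):
    """Small random initialization for a layer with n weights."""
    return [random.gauss(0.0, 0.1) for _ in range(n)]


random.seed(0)

# Hypothetical pre-trained segmentation model.
pretrained = {
    "encoder.conv1": random_weights(4),
    "encoder.conv2": random_weights(4),
    "decoder.segmentation_head": random_weights(4),
}

# Hypothetical new model for response classification.
fresh = {
    "encoder.conv1": random_weights(4),
    "encoder.conv2": random_weights(4),
    "decoder.classifier": random_weights(2),
}

params, trainable = init_with_pretrained_encoder(pretrained, fresh)
```

With these toy inputs, only `decoder.classifier` is left trainable, while the segmentation head of the pre-trained model is discarded.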
Our
experiments were conducted considering two questions:
- How useful are the chosen organs for our task?
- Do we need all three examinations?
Results
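Performance in this section is reported as F1 score, the harmonic mean of precision and recall. For reference, a minimal sketch of the computation for a binary responder/non-responder problem is given here. Which class is treated as positive and whether scores are averaged over classes are not stated in this abstract, so the `positive` argument and the toy labels are assumptions for illustration only.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score of the positive class: 2 * P * R / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration (1 = responder, 0 = non-responder).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

In this toy example there are two true positives, one false positive and one false negative, so precision and recall are both 2/3 and the F1 score is 2/3 as well.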
The performance of the best models in terms of F1 score, sorted by organ and number of examinations used, is presented in Figure 5. Table a) shows the best results without TL, Table b) those obtained with a pre-trained model.
Discussion
In early
experiments, we found that training on all organs combined did
not work well, so we investigated the organs individually.
While liver and spleen reached F1 scores of about 0.8, our best
result for the spine was only 0.67. This indicates that the
information contained in this organ is less useful for our
task and possibly caused the poor performance when
using all organs combined. Further experiments combining liver and spleen will be conducted in the
near future.
In general, F1 scores were higher if all three examinations were used, which was
expected because the model can then relate the
response label to image differences between acquisitions.
However, our best overall model, with an F1 score of 0.82, resulted from
using only the first examination of the spleen and employing TL. This indicates that the spleen may contain valuable
information about the responsiveness of a patient even before the
start of treatment, which needs to be explored further. Overall, TL proved useful mainly for models trained on the first examination only and yielded no benefit when using all three examinations.
Although our results look promising, classifier performance needs to be increased significantly. Based on our experiments, we are confident that a larger training set could achieve this goal. Nevertheless, these findings are only to be
viewed as a proof of concept and need to be validated on a larger, more
diverse data set before more general conclusions can be drawn.
Conclusion
The results presented in this proof-of-concept study indicate that predicting therapy response based on
radiological imaging using DL should be feasible. Further investigation could help to find a non-invasive method for early prediction of patients' individual therapy response.
Acknowledgements
This research was conducted with the support of Vector Stiftung.
References
1. Cancer Research UK,
https://www.cancerresearchuk.org/health-professional/cancer-statistics/statistics-by-cancer-type/melanoma-skin-cancer
2. I. Lugowska, P. Teterycz and P. Rutkowski: Immunotherapy of
melanoma. Contemporary oncology (Poznan, Poland), 2018,
22(1A), 61–67. doi:10.5114
3. R. Sun, E. J. Limkin, M. Vakalopoulou, L. Dercle, S. Champiat, S. R.
Han et al.: A radiomics approach to assess tumour-infiltrating CD8
cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an
imaging biomarker, retrospective multicohort study, The Lancet
Oncology, Volume 19, Issue 9, 2018, pp 1180-1191.
4. S. Trebeschi, S. G. Drago, N. J. Birkbak, I. Kurilova, A.M. Calin, A.
Delli Pizzi et al.: Predicting Response to Cancer Immunotherapy using
Non-invasive Radiomic Biomarkers, Annals of Oncology, mdz108, March
2019.
5. Z. Dawood, N. Coudray, R. H. Kim, S. Nomikou, U. Moran, J. S. Weber
et al.: Prediction of response and toxicity to immune checkpoint
inhibitor therapies (ICI) in melanoma using deep neural networks
machine learning, Journal of Clinical Oncology, 2018, pp 9529-9529
6. S. Gandhi, S. Pabla, M. Nesline, M. Pandey, M. S. Ernstoff, G. K. Dy
et al.: Algorithmic prediction of response to checkpoint inhibitors:
Hyperprogressors versus responders, Journal of Clinical Oncology, 2017,
pp 11565-11565.
7. C. Morrison, S. Pabla, J. M. Conroy et al.: Predicting response to
checkpoint inhibitors in melanoma beyond PD-L1 and mutational burden,
J. ImmunoTherapy of Cancer, 2018, pp 6 – 32.
8. C. Krieg, M. Nowicka, S. Guglietta, S. Schindler, F. J.
Hartmann, L. M. Weber et al.: Biomarker prediction
to anti-PD-1 immunotherapy by using high dimensional single cell
analysis, The Journal of Immunology May 1,
2018, 200 (1 Supplement) 174.26.
9. E. Castro, J. S. Cardoso and J. C. Pereira: “Elastic deformations
for data augmentation in breast cancer mass detection,” in 2018 IEEE EMBS
International Conference on Biomedical & Health Informatics (BHI), 2018, pp. 230–234.
10. M. Raghu, C. Zhang, J. M. Kleinberg and S. Bengio, “Transfusion:
Understanding transfer learning with applications to medical imaging,” CoRR,
vol. abs/1902.07208, 2019.
[Online]. Available: http://arxiv.org/abs/1902.07208
11. A. L. Simpson, M. Antonelli, S. Bakas, M. Bilello, K. Farahani, B.
van Ginneken, A. Kopp-Schneider, B. A. Landman, G. Litjens, B. H. Menze, O.
Ronneberger, R. M. Summers, P. Bilic, P. F. Christ, R. K. G. Do, M. Gollub,
J. Golia-Pernicka, S. Heckers, W. R. Jarnagin, M. McHugo, S. Napel,
E. Vorontsov, L. Maier-Hein and M. Cardoso, “A large annotated medical image
dataset for the development and evaluation of segmentation algorithms,” CoRR,
vol. abs/1902.09063, 2019.
[Online]. Available: http://arxiv.org/abs/1902.09063