0602

Classification of Pediatric Posterior Fossa Tumors using Convolutional Neural Network and Tabular Data
Moran Artzi1,2,3, Erez Redmard3, Oron Tzemach3, Jonathan Zeltser3, Omri Gropper4, Jonathan Roth2,5,6, Ben Shofty2,5,7, Danil A. Kozyrev5,7, Shlomi Constantini2,5,7, and Liat Ben-Sira2,8
1Sagol Brain Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel, 2Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel, 3Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel, 4The Iby and Aladar Fleischman Faculty of Engineering, Tel Aviv University, Tel Aviv, Israel, 5Department of Pediatric Neurosurgery, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel, 6The Gilbert Israeli Neurofibromatosis Center, Tel Aviv University, Tel Aviv, Israel, 7The Gilbert Israeli Neurofibromatosis Center, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel, 8Division of Radiology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel

Synopsis

A fused architecture composed of two neural networks, a pre-trained ResNet-50 CNN and a tabular-data network, is proposed for the classification of posterior fossa tumor (PFT) types. The study included 158 MRI scans of healthy controls and pediatric patients with PFT. The input data were T1WI+C, FLAIR, and diffusion MRI, together with tabular data (subject's age). The best classification results were obtained by the fused CNN + tabular-data architecture using diffusion images, achieving cross-validation accuracy of 0.88±0.04 (validation) and 0.87±0.02 (test). Overall, the proposed architecture achieved a ~16% improvement in test accuracy compared with the CNN-only method on this dataset.

Introduction

Posterior Fossa Tumors (PFT) are the most common solid tumors that develop in children 1. Patient age at diagnosis and MRI characteristics can help differentiate PFT types (Pilocytic Astrocytoma, Medulloblastoma, and Ependymoma) 2,3. Currently, however, a definitive diagnosis can be obtained only by histopathological examination. The aim of this study was to classify PFT types using a deep-learning architecture that integrates MRI data and patient age.

Methods

Dataset: Included 158 MRI scans (22 from healthy subjects and 136 from patients with PFT). MRI data included post-contrast T1-weighted (T1WI+c), FLAIR, and diffusion-weighted (Trace) images (Figure 1).
Data Preprocessing: Included image coregistration, bias field correction, background removal and image cropping (brain region delineation), image resizing to 256×256, and extraction of the subject's age from the DICOM header.
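Extracting and normalizing the age from the DICOM header can be sketched as follows. The DICOM Age String attribute (0010,1010) encodes age as a value plus a unit letter (e.g. "012Y"); the function names and the normalization range below are illustrative assumptions, not taken from the paper.

```python
def parse_dicom_age(age_str: str) -> float:
    """Convert a DICOM Age String (0010,1010), e.g. '012Y' or '006M',
    to age in years."""
    units = {"D": 1 / 365.25, "W": 7 / 365.25, "M": 1 / 12, "Y": 1.0}
    value, unit = int(age_str[:-1]), age_str[-1].upper()
    return value * units[unit]

def normalize_age(age_years: float, max_age: float = 18.0) -> float:
    """Scale age to [0, 1] for network input.
    The 18-year ceiling is an assumption for a pediatric cohort."""
    return min(age_years / max_age, 1.0)
```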
Data annotation and splitting: The central tumor slice (in patients), or the slice at the center of the posterior fossa (in healthy participants), was manually tagged, and a total of five slices per case were automatically extracted (the tagged slice and the two adjacent slices on each side). The entire dataset was split into 85% training-and-validation and 15% testing datasets. The training-and-validation dataset was further split into 80% training and 20% validation in a five-fold cross-validation manner.
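The splitting scheme above can be sketched as follows. Splitting at the subject level (so slices from one patient never appear in both training and validation) is an assumption consistent with, but not explicitly stated in, the text.

```python
import random

def split_dataset(subject_ids, test_frac=0.15, n_folds=5, seed=0):
    """Split subjects into a held-out test set (15%) and five
    train/validation folds (80/20 within the remaining 85%).
    Returns (test_ids, [(train_ids, val_ids), ...])."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_frac)
    test_ids, train_val_ids = ids[:n_test], ids[n_test:]
    folds = []
    for k in range(n_folds):
        # Every k-th subject forms a disjoint ~20% validation fold.
        val_set = set(train_val_ids[k::n_folds])
        train_ids = [i for i in train_val_ids if i not in val_set]
        folds.append((train_ids, sorted(val_set)))
    return test_ids, folds
```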
Data augmentation: Included right-left flipping, rotation, zooming and random brightness changes. For the tabular data, the augmented age was sampled from a distribution similar to the distribution of each class in the original dataset.
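The age augmentation described above can be sketched as sampling from a distribution fitted to the ages of the example's class. The choice of a (clamped) normal distribution is an assumption; the text only states that the augmented age is drawn from a distribution similar to that of the class.

```python
import random
import statistics

def sample_augmented_age(class_ages, rng=None):
    """Sample a plausible age for an augmented example by drawing
    from a normal distribution fitted to the class's observed ages,
    clamped to the observed range so values stay realistic."""
    rng = rng or random.Random()
    mu = statistics.mean(class_ages)
    sigma = statistics.stdev(class_ages)
    lo, hi = min(class_ages), max(class_ages)
    return min(max(rng.gauss(mu, sigma), lo), hi)
```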
Network training: Input data included T1W+c, FLAIR and diffusion Trace images, and the normalized subject's age. Classification performance was evaluated for three neural network architectures: (1) a tabular architecture, based on 4, with one input node for age, followed by batch normalization, four nodes in the hidden layer, and a cross-entropy loss function; (2) a CNN architecture based on 2D ResNet50 5; and (3) a fused CNN + tabular architecture, which receives the tabular information in the input layer along with the images (Figure 2). The fused architecture comprises two neural networks: the CNN and the tabular network. The CNN is identical to ResNet50 up to the final linear layer, and the tabular network consists of a batch-normalization embedding layer followed by a four-node linear layer.
Post processing: Was performed at the subject level and for the test data only. For a given set of 2D slice predictions belonging to a patient, adjacent slices were clustered and the classification was determined by majority voting among the cluster's slices.
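The subject-level post-processing can be sketched as follows. The adjacency criterion (`max_gap`) and tie-breaking rule are assumptions; the text specifies only clustering of adjacent slices followed by majority voting.

```python
from collections import Counter

def cluster_and_vote(slice_predictions, max_gap=1):
    """Group per-slice predictions into clusters of adjacent slices
    (slice indices within `max_gap` of each other) and assign each
    cluster its majority-vote label.
    `slice_predictions` is a list of (slice_index, predicted_label)."""
    preds = sorted(slice_predictions)
    clusters, current = [], [preds[0]]
    for item in preds[1:]:
        if item[0] - current[-1][0] <= max_gap:
            current.append(item)
        else:
            clusters.append(current)
            current = [item]
    clusters.append(current)
    # Majority vote within each cluster (ties resolved by first-seen label).
    return [Counter(lbl for _, lbl in c).most_common(1)[0][0]
            for c in clusters]
```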
Evaluation of classification results: Was performed on the validation and test datasets, and for each of the five folds, using accuracy, precision, recall and F1 score. Classification results were evaluated for the following experiments: (1) the tabular architecture; (2) the CNN architecture for each of the four input datasets (T1W+c, FLAIR, diffusion, and a concatenated image of the three contrasts); and (3) the fused CNN + tabular architecture for the best input imaging configuration. For visual explanations, Gradient-weighted Class Activation Mapping (Grad-CAM) 6 was used.
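The per-class metrics above follow the standard definitions; a self-contained sketch (illustrative helper, not the authors' evaluation code):

```python
def per_class_metrics(y_true, y_pred, labels):
    """Compute precision, recall and F1 per class, plus overall
    accuracy, from two equal-length label sequences."""
    metrics = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        metrics[c] = {"precision": prec, "recall": rec, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, metrics
```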

Results

Classification results
1) Tabular architecture: Classification based on age alone achieved a poor result (mean accuracy = 0.42±0.08); however, the relatively high recall obtained for Ependymoma (0.70±0.27) compared with the other groups (0.15-0.63) indicates its potential contribution to group classification.
2) CNN architecture: The best classification results were obtained using diffusion (Trace) images only, with an overall accuracy of 0.86±0.02 for the validation and 0.75±0.04 for the test data.
3) Fused CNN + tabular architecture: The proposed architecture achieved the highest performance, with an overall accuracy of 0.88±0.04 for the validation and 0.87±0.02 for the test datasets, a 16% improvement in test accuracy compared with the CNN architecture on this dataset. The similar performance on the validation and test datasets demonstrates the generalizability of the model. The proposed architecture also substantially improved the Ependymoma precision score compared with the CNN alone (validation: 0.90±0.15 vs. 0.52±0.19; test: 0.70±0.19 vs. 0.16±0.15).
Visual model explanation
Figure 3 presents an interpretation of the ResNet50 model based on Grad-CAM. All classes are activated in the posterior fossa region. For the three patient groups, the peak activation, i.e., the pixels most important for the model's classification decision, is located in the tumoral area, while for the healthy (non-tumor) class a stronger and more diffuse activation is shown, centered in the posterior fossa. This demonstrates that although the network received a 2D cross-sectional image of the entire brain, it learned to identify the region of interest and to base its classification on pathology-related features.

Conclusion

In this study we proposed the use of a fused CNN and tabular-data architecture for the classification of pediatric PFT based on MRI data. The proposed architecture outperformed the CNN alone. With the current availability of medical imaging alongside patient metadata, such as lab tests and other signal-recording modalities, such an approach can be used to personalize clinical decision-making and improve patient care.

Acknowledgements

No acknowledgement found.

References

1. K. S. V. Prasad, D. Ravi, V. Pallikonda, and B. V. S. Raman, "Clinicopathological study of pediatric posterior fossa tumors," Journal of pediatric neurosciences, vol. 12, p. 245, 2017.

2. T. A. Huisman, "Posterior Fossa tumors in children: differential diagnosis and advanced imaging techniques," The neuroradiology journal, vol. 20, pp. 449-460, 2007.

3. T. E. Merchant, I. F. Pollack, and J. S. Loeffler, "Brain tumors across the age spectrum: biology, therapy, and late effects," in Seminars in radiation oncology, 2010, pp. 58-66.

4. W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," The bulletin of mathematical biophysics, vol. 5, pp. 115-133, 1943.

5. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.

6. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-cam: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 618-626.

Figures

Figure 1: Input data for the network training: a. imaging data (T1W+c, FLAIR, diffusion and concatenated image of the three contrasts). b. metadata (subjects' age)

Figure 2: Illustration of the fused CNN and tabular data architecture

Figure 3: Model interpretation based on Gradient-weighted Class Activation Mapping (Grad-CAM).

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)