2835

Deep Learning with Synthetic Diffusion-Weighted Images for Acute Ischemic Stroke Detection
CHRISTIAN FEDERAU1,2, Javier A. Montoya-Zegarra1, Soren Christensen3, Julian Maclaren3, Johanna Ospel2, Victor Schulze-Zachau2, Maarten Lansberg3, and Sebastian Kozerke1

1Institute for Biomedical Engineering, University and ETH Zurich, Zürich, Switzerland, 2Department of Radiology, University Hospital Basel, Basel, Switzerland, 3Neurology, Stanford University, Palo Alto, CA, United States

Synopsis

We studied the feasibility and accuracy of a deep learning algorithm trained on one million realistic synthetic acute stroke lesion images to detect and segment stroke lesions on clinical diffusion-weighted (DW) MR images. We compared this method with a more conventional approach, in which the same deep learning algorithm was trained on 10,000 human-labelled images.

PURPOSE

Brain stroke is the second most common cause of death and the leading cause of disability worldwide. Detecting stroke on images can be challenging, and diagnostic errors can delay therapy, with potentially harmful consequences for the patient. Deep learning algorithms might provide support for this task, but to be trained appropriately they typically require a large number of labelled images.1 In addition, labelling medical images requires a high level of expertise and is tedious and error-prone. We abstracted the general features of a stroke on DW images (such as size, shape, location, signal, and signal-to-noise ratio) from a real database and produced one million realistic synthetic stroke images with known ground truth. We trained a U-Net2 on this synthetic stroke database and compared the results with the same network trained on a human-labelled database of real images.

METHODS

Labelled normalized database of stroke diffusion weighted images

Our Institutional Review Board approved this study. Cases of patients imaged with DW-MRI for suspected stroke were downloaded from our RIS-PACS system, anonymized, and coregistered with ANTs3 to the Montreal Neurological Institute standard space, rotated into the anterior commissure–posterior commissure plane. Cases with other pathologies such as tumor or hemorrhage were excluded, while cases with chronic strokes and cases with susceptibility artefacts (for example from metallic scalp clips after craniectomy, external ventricular drains, or aneurysm clips) were kept in the database. Stroke lesions were manually segmented by a neuroradiologist and saved as binary DICOM images (stroke vs. non-stroke). Images were cropped to the supratentorial brain and split into a training dataset (75%) and a test dataset (25%).

Synthetic stroke DW image database

One million synthetic DW stroke images were produced from the training dataset as follows: the voxel-wise relative signal increase inside the human-outlined binary mask was extracted and thresholded at a relative signal increase of at least 8%. This gold-standard region of interest was then fused with a normal brain DW image by multiplying, voxel-wise, the local normal parenchymal signal by the relative signal increase (Fig 1). For testing, a further 10,000 synthetic DW stroke images were generated from the test dataset following the same procedure.
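The fusion step above can be sketched as follows. This is a minimal illustration, not the authors' implementation; in particular, estimating the baseline parenchymal signal as the median of the non-lesion voxels is an assumption of this sketch.

```python
import numpy as np

def make_synthetic_stroke(normal_dwi, lesion_dwi, lesion_mask, threshold=0.08):
    """Fuse a real lesion's relative signal increase into a normal DW image.

    normal_dwi:  2D array, normal brain DW slice
    lesion_dwi:  2D array, DW slice containing a real stroke lesion
    lesion_mask: boolean array, human-outlined lesion mask on lesion_dwi
    threshold:   minimum relative signal increase kept (8% in the abstract)
    """
    # Estimate the baseline (non-lesion) parenchymal signal (assumption: median
    # of the non-masked, non-zero voxels of the lesion slice).
    baseline = np.median(lesion_dwi[~lesion_mask & (lesion_dwi > 0)])

    # Voxel-wise relative signal increase inside the human-outlined mask.
    rel_increase = np.zeros_like(lesion_dwi, dtype=float)
    rel_increase[lesion_mask] = lesion_dwi[lesion_mask] / baseline - 1.0

    # Keep only voxels with at least `threshold` relative increase
    # (the gold-standard region of interest).
    roi = rel_increase >= threshold

    # Multiply the local normal parenchymal signal, voxel-wise,
    # by (1 + relative signal increase).
    synthetic = normal_dwi.astype(float).copy()
    synthetic[roi] *= 1.0 + rel_increase[roi]
    return synthetic, roi
```

Because the ground-truth mask `roi` is produced together with the image, no human labelling of the synthetic cases is needed.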

U-Net training and comparison

We trained a standard U-Net (Fig 2) with a weighted cross-entropy loss function, once on the human-labelled training dataset and once on the synthetic dataset. First, the hyperparameters were optimized on the human-labelled training dataset by cross-validation (learning rate: 0.002; batch size: 8; stroke/background weights set to 0.02/0.98 in the weighted cross-entropy loss function). During training, we used data augmentation: random horizontal and vertical shifts of 1%, a random shear of 0.1 radians, a random zoom of 0.3, and random horizontal flipping. The resulting models were then applied to the two test datasets: the human-labelled real-image test dataset and the synthetic-image test dataset. The final performance of both algorithms was compared using the Dice score, precision, and recall.
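The weighted cross-entropy loss with the stated 0.02/0.98 class weights can be illustrated as below. How exactly the weights enter the loss is not specified in the text, so the per-class weighting in this sketch is an assumption.

```python
import numpy as np

def weighted_cross_entropy(p_stroke, truth, w_stroke=0.02, w_background=0.98):
    """Pixel-wise weighted binary cross-entropy.

    p_stroke: (H, W) predicted stroke probability (stroke channel of the softmax)
    truth:    (H, W) binary ground-truth mask (1 = stroke)
    The 0.02/0.98 stroke/background weights follow the abstract.
    """
    eps = 1e-7
    # Clip probabilities to avoid log(0).
    p = np.clip(p_stroke, eps, 1.0 - eps)
    # Weight the log-likelihood term of each class separately.
    loss = -(w_stroke * truth * np.log(p)
             + w_background * (1.0 - truth) * np.log(1.0 - p))
    return loss.mean()
```

In a real training loop the equivalent weighted loss would be supplied to the deep learning framework's optimizer rather than computed in NumPy.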


RESULTS

A total of 1068 cases of patients with suspected stroke were downloaded (537 normal, 481 strokes, 50 cases excluded), resulting in a real, human-labelled training dataset of 10,016 images and a test dataset of 3200 images. Compared with the model trained on real, human-labelled images, the model trained on the synthetic dataset had a significantly higher Dice coefficient when tested both on the human-labelled test dataset (0.81 vs. 0.65, p<0.05) and on the synthetic test dataset (0.99 vs. 0.69, p<0.05) (Figs 3, 4, and 5).
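For reference, the Dice coefficient, together with precision and recall, can be computed from binary prediction and ground-truth masks as in this minimal sketch:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Dice coefficient, precision, and recall for binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)    # true positives
    fp = np.sum(pred & ~truth)   # false positives
    fn = np.sum(~pred & truth)   # false negatives
    # Convention: score 1.0 when both masks are empty (nothing to detect).
    dice = 2.0 * tp / (2.0 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, precision, recall
```

The Dice coefficient is the harmonic mean of precision and recall, so it summarizes both over- and under-segmentation in a single number.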

DISCUSSION

Synthetic lesion generation that combines real lesion patterns with normal images can produce a large number of cases with realistic lesions across a large variety of brains. The results of this study in acute stroke suggest that training a deep learning model exclusively on realistic synthetic images can improve automatic lesion segmentation. We speculate that this is due both to the quality of the labels fed to the deep learning algorithm, since the ground truth is known exactly, and to the large number of synthetic training images that can be generated.

Acknowledgements

Christian Federau is supported by the Swiss National Science Foundation. The Titan Xp used for this research was donated by the NVIDIA Corporation.

References

1. LeCun Y, et al. Deep learning. Nature. 2015;521(7553):436-444. https://doi.org/10.1038/nature14539.

2. Ronneberger O, et al. U-Net: Convolutional networks for biomedical image segmentation. arXiv:1505.04597, 2015. http://arxiv.org/abs/1505.04597.

3. Advanced Normalization Tools (ANTs). http://picsl.upenn.edu/software/ants/

Figures

Fig. 1. Schematic of the production of synthetic stroke lesions. Feature extraction consists of computing the voxel-wise relative signal increase inside the human-outlined binary mask (from the training dataset), thresholded at a relative signal increase of at least 8%. The lesion is incorporated into the normal brain DW image by multiplying, voxel-wise, the local normal parenchymal signal by the relative signal increase of the feature-extraction map.

Fig. 2. The U-Net used in this study. The input image has dimensions 64x64 pixels. At each level of the network, the number of filters is doubled. The output is a probability map of 64x64x2 pixels, where each channel contains the class likelihood, i.e. stroke vs. non-stroke. Conv = convolution. BN = batch normalization.

Fig. 3. Boxplots of the Dice coefficients of the stroke and background segmentation predictions, tested on the dataset of real, human-labelled stroke images and on the synthetic stroke image database. Note the significantly better performance of the network trained on the synthetic images on both the synthetic and the real test datasets.

Fig. 4. Data are mean ±standard deviation [median].

Fig. 5. Examples of stroke segmentation on a real image and on a synthetic stroke image, by the models trained on the real and on the synthetic stroke databases. Top row: note the better delineation of the lesion by the model trained on the synthetic stroke database, compared with the model trained on real images and even with the human delineation. Bottom row: note the near-perfect delineation of the synthetic lesions by the model trained on the synthetic images.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)