We studied the feasibility and accuracy of a deep learning algorithm trained on one million realistic synthetic acute stroke lesion images for detecting and segmenting stroke lesions on clinical diffusion-weighted (DW) MR images. We compared this method with a more conventional approach in which a deep learning algorithm was trained on 10,000 human-labelled images.
PURPOSE
Stroke is the second most common cause of death and the leading cause of disability worldwide. Detecting stroke on images can be challenging, and diagnostic mistakes can delay therapy, with potentially harmful consequences for the patient. Deep learning algorithms might provide support for this task, but to be trained appropriately they typically require a large number of labelled images [1]. In addition, the labelling of medical images requires a high level of expertise and is tedious and error-prone. We abstracted the general features of a stroke on DW images (such as size, form, location, signal and signal-to-noise ratio) obtained from a real database and produced one million realistic synthetic stroke images with known ground truth. We trained a U-Net [2] on this synthetic stroke database and compared the results with the same network trained on a human-labelled database of real images.
METHODS
Labelled normalized database of stroke diffusion weighted images
Our Institutional Review Board approved this study. Cases of patients with suspected stroke imaged with DW-MRI were downloaded from our RIS-PACS system, anonymized, and coregistered to the Montreal Neurological Institute (MNI) standard space, rotated into the anterior commissure–posterior commissure plane, using ANTs [3]. Cases with other pathologies such as tumor or bleeding were excluded, while cases with chronic strokes and cases with susceptibility artefacts (for example from metallic scalp clips after craniectomy, external ventricular drains or aneurysm clips) were kept in the database. Stroke lesions were manually segmented by a neuroradiologist and saved as a binary DICOM image (stroke vs. non-stroke). Images were cropped to the supratentorial brain and separated into a training dataset (75%) and a test dataset (25%).
Synthetic stroke DW image database
One million synthetic DW stroke images were produced from the training dataset as follows: the voxel-wise relative signal increase inside the human-outlined binary mask was extracted and thresholded to a signal increase of at least 8%. This gold-standard region of interest was then fused with a normal brain DW image by multiplying, voxel by voxel, the relative signal increase by the local normal signal of the parenchyma (Fig 1). For testing, a further 10,000 synthetic DW stroke images were generated from the test dataset following the same procedure.
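The fusion step can be illustrated with a minimal NumPy sketch. It is not the study's implementation: the function names `relative_increase` and `fuse` are hypothetical, and the sketch assumes that the relative increase is measured against the median signal of the unmasked, non-zero voxels of the same image (the reference signal is not specified here) and that "fusing" means adding the relative increase times the local normal signal to the normal image.

```python
import numpy as np


def relative_increase(stroke_img, mask, min_increase=0.08):
    """Relative signal increase inside the mask, thresholded at min_increase.

    Assumption: the reference signal is the median of the unmasked,
    non-zero voxels of the same image. An image without such voxels
    is not handled by this sketch.
    """
    stroke_img = np.asarray(stroke_img, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    reference = np.median(stroke_img[~mask & (stroke_img > 0)])
    rel = np.zeros_like(stroke_img)
    rel[mask] = stroke_img[mask] / reference - 1.0
    # Keep only voxels whose signal increase reaches the threshold (8%).
    rel[rel < min_increase] = 0.0
    return rel


def fuse(normal_img, rel):
    """Insert a lesion into a normal image and return (image, ground truth).

    Assumption: the added signal is rel * local normal signal, so the
    synthetic voxel value is normal * (1 + rel).
    """
    normal_img = np.asarray(normal_img, dtype=float)
    synthetic = normal_img * (1.0 + rel)
    ground_truth = rel > 0  # known by construction
    return synthetic, ground_truth
```

Because the ground-truth mask is derived from the same array that modifies the image, the label is exact by construction, which is the property the synthetic database relies on.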
U-Net training and comparison
We trained a standard U-Net (Fig 2) using a weighted cross-entropy loss function on the human-labelled training dataset, as well as on the synthetic dataset. First, the hyperparameters were optimized on the human-labelled training dataset by cross-validation (learning rate: 0.002; batch size: 8; stroke/background weights set to 0.02/0.98 in the weighted cross-entropy loss function). During training, we used data augmentation by randomly shifting the image horizontally and vertically by 1%, a random shear of 0.1 radians, a random zoom of 0.3, and random horizontal flipping. The resulting models were then applied to the two test datasets: the human-labelled real image test dataset and the synthetic image test dataset. The final performance of both models was then compared using the Dice score, the precision and the recall.
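For readers unfamiliar with the loss and the evaluation metrics, the following NumPy sketch shows one common way to compute them for a binary segmentation. It is an illustration under stated assumptions, not the code used in this study: the function names are hypothetical, the loss is averaged over voxels (the normalization is an assumption), and the class weights are passed in explicitly.

```python
import numpy as np


def weighted_cross_entropy(prob, target, w_stroke, w_background, eps=1e-7):
    """Mean weighted binary cross-entropy over all voxels.

    prob: predicted stroke probability per voxel, in [0, 1].
    target: binary ground truth (1 = stroke, 0 = background).
    Assumption: per-voxel losses are averaged; other normalizations exist.
    """
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    per_voxel = -(w_stroke * target * np.log(prob)
                  + w_background * (1.0 - target) * np.log(1.0 - prob))
    return float(per_voxel.mean())


def dice_precision_recall(pred, truth):
    """Dice score, precision and recall for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # stroke voxels correctly predicted
    fp = int(np.sum(pred & ~truth))   # background predicted as stroke
    fn = int(np.sum(~pred & truth))   # stroke voxels missed
    # Convention (assumption): two empty masks agree perfectly.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, precision, recall
```

With the weights reported above, the loss would be called as `weighted_cross_entropy(prob, target, w_stroke=0.02, w_background=0.98)`. The Dice score is the harmonic mean of precision and recall, so a model can only score highly on it by both finding the lesion voxels and avoiding false positives.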
RESULTS
A total of 1,068 cases of patients with suspected stroke were downloaded (537 normal, 481 strokes, 50 excluded), resulting in a real, human-labelled training dataset of 10,016 images and a test dataset of 3,200 images. Compared to the model trained on real, human-labelled images, the model trained on the synthetic dataset had a significantly higher Dice coefficient when tested on the human-labelled test dataset (0.81 vs 0.65, p<0.05) and on the synthetic test dataset (0.99 vs 0.69, p<0.05) (Fig 3, 4, and 5).
DISCUSSION
Synthetic lesion generation based on combinations of real lesions and normal images can produce a large number of cases with realistic lesion patterns on a large variety of brains. The results of this study in acute strokes suggest that training a deep learning model exclusively on realistic synthetic images can improve automatic lesion segmentation. We speculate that this is due both to the quality of the labelling of the data fed to the deep learning algorithm, since the ground truth is known, and to the large number of synthetic training images that can be generated.
REFERENCES
1. LeCun Y et al. “Deep Learning.” Nature 521, no. 7553 (May 2015): 436–44. https://doi.org/10.1038/nature14539.
2. Ronneberger O et al. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” arXiv:1505.04597 [cs], May 18, 2015. http://arxiv.org/abs/1505.04597.
3. ANTs (Advanced Normalization Tools). http://picsl.upenn.edu/software/ants/