4873

Generated data can boost the recognition performance for intervertebral disc herniation

Fei Gao¹, Shui Liu², Xiaodong Zhang², Jue Zhang¹,³, and Xiaoying Wang²,³

¹College of Engineering, Peking University, Beijing, China, ²Department of Radiology, Peking University First Hospital, Beijing, China, ³Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China

Synopsis

Although deep convolutional neural networks have shown encouraging performance in lesion classification, their use is limited by the large amount of labeled data they require. In this study, we used generated data to improve recognition performance under limited labeled data for lumbar intervertebral disc herniation classification.

Introduction

Intervertebral disc (IVD) herniation is a prevalent lumbar degenerative disease in which the disc often presses on spinal nerves and produces pain that may be severe. Among medical imaging modalities, MRI produces detailed, accurate pictures of bones, soft tissues, nerves and other internal body structures, and it is preferred for IVD disease because of the clear visibility of soft tissues. Many studies have proposed automated analysis systems for IVD herniation based on hand-crafted features¹,². In recent years, convolutional neural networks (CNNs) have proved to be a more powerful tool for image analysis, but often only when a sufficient amount of labeled data for the target anatomy is available. In this study, we propose to boost recognition performance by adding realistic IVD images generated by a conditional generative adversarial network (CGAN) trained on limited labeled data.

Method

In this retrospective study, the labeled data were collected in routine clinical practice and consist of T2-weighted MRI scans of 208 patients with various lumbar conditions, such as degeneration, herniation and scoliosis. The data comprise 1040 individual IVDs, each with a corresponding radiological label of herniation. The presence or absence of herniation was annotated by an expert spinal radiologist. The full dataset was split into training (70%), validation (10%) and testing (20%) sets.
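As an illustration of the 70/10/20 split, the following minimal sketch shuffles the indices of the 1040 IVDs and cuts them into three parts. The abstract does not state whether the split was made per disc or per patient, nor how it was randomized; the disc-level shuffle, the seed, and the function name `split_indices` are assumptions made only for this example.

```python
import random

def split_indices(n_items, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train/val/test parts.

    Disc-level splitting and the fixed seed are assumptions of this sketch.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_items)
    n_val = round(fractions[1] * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1040)
print(len(train_idx), len(val_idx), len(test_idx))
```

With 1040 discs this yields 728 training, 104 validation and 208 testing samples. Grouping the indices by patient before shuffling would be a natural variation that keeps all discs of one patient in the same set.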

Generative adversarial networks (GANs) are a framework that can learn discriminative features from images and generate realistic samples. To generate high-quality samples together with their class labels, we employ the Auxiliary Classifier GAN (AC-GAN)³. AC-GAN follows the same idea as the original conditional GAN, in which both the generator and the discriminator are supplied with class labels. In addition, the AC-GAN discriminator contains an auxiliary decoder that is tasked with reconstructing the class labels. In the AC-GAN, the class label $c$ is an input of the generator in addition to the noise $z$ , and $G$ uses both to generate images $X_{fake} = G(c, z)$ . Every generated sample therefore has a corresponding class label, $c \sim p_{c}$ , which is essential for using the generated samples for classification. The discriminator gives both a probability distribution over sources and a probability distribution over the class labels, $P(S | X), P(C | X) = D(X)$ . The objective function has two parts: the log-likelihood of the correct source, $L_{S}$ , and the log-likelihood of the correct class, $L_{C}$ .

$L_{S} = E[\log P(S = \text{real} \mid X_{real})] + E[\log P(S = \text{fake} \mid X_{fake})]$

$L_{C} = E[\log P(C = c \mid X_{real})] + E[\log P(C = c \mid X_{fake})]$

$D$ is trained to maximize $L_{S}+L_{C}$ while $G$ is trained to maximize $L_{C}-L_{S}$ . AC-GANs learn a representation for $z$ that is independent of the class label. This modification to the standard GAN formulation produces excellent results and appears to stabilize training³.
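To make the two objectives concrete, the sketch below estimates $L_{S}$ and $L_{C}$ from a handful of discriminator output probabilities and forms the quantities each network maximizes, following the AC-GAN formulation³ in which $D$ maximizes $L_{S}+L_{C}$ and $G$ maximizes $L_{C}-L_{S}$. The function name `acgan_objectives` and the probability values are hypothetical and are not taken from the study.

```python
import math

def acgan_objectives(p_real_src, p_fake_src, p_real_cls, p_fake_cls):
    """Sample estimates of the AC-GAN source and class log-likelihoods.

    p_real_src: P(S=real | X_real) for each real sample
    p_fake_src: P(S=fake | X_fake) for each generated sample
    p_real_cls: P(C=c | X_real), probability given to the true class
    p_fake_cls: P(C=c | X_fake), probability given to the conditioning class
    """
    def mean_log(probs):
        return sum(math.log(p) for p in probs) / len(probs)

    l_s = mean_log(p_real_src) + mean_log(p_fake_src)
    l_c = mean_log(p_real_cls) + mean_log(p_fake_cls)
    return l_s, l_c

# Illustrative probabilities only (not results from the abstract).
l_s, l_c = acgan_objectives([0.9, 0.8], [0.7, 0.6], [0.95, 0.85], [0.9, 0.8])
d_objective = l_s + l_c  # quantity the discriminator maximizes
g_objective = l_c - l_s  # quantity the generator maximizes
```

Both networks push $L_{C}$ up, so generated images stay recognizable as their conditioning class, while they pull $L_{S}$ in opposite directions.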

In the experiments, we adopted a generator composed of three residual connection blocks with deconvolutional layers and batch normalization, with a tanh activation function applied to the last layer. The dimension of the noise $z$ is 128. The discriminator includes three residual connection blocks with ReLU and a fully connected layer with two linear outputs: one for real versus fake and one for herniated versus not herniated. The AC-GAN architecture is shown in Figure 1. The model was trained on the training set (70% of the labeled IVD data).

After training the AC-GAN, 2000 normal IVD images and 2000 herniated IVD images were generated randomly by the trained model. To investigate whether the generated samples could improve performance, we performed three classification experiments with a classic 18-layer residual network⁴. For comparison, the network was trained first using only real data, then using only generated data, and finally using real and generated data together.
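The data flow of these experiments can be sketched as follows: sample a class label and a 128-dimensional noise vector for each image to be generated, then assemble the three training configurations. The trained generator itself is not reproduced here. The Gaussian noise distribution, the label encoding (0 = normal, 1 = herniated), the placeholder sample identifiers, and the use of 728 real training samples (70% of 1040) for the classifier are assumptions of this sketch.

```python
import random

NOISE_DIM = 128      # dimension of z stated in the text
N_PER_CLASS = 2000   # generated images per class stated in the text

def sample_generator_inputs(n_per_class, noise_dim, seed=0):
    """Return (label, z) pairs to feed a trained generator G(c, z).

    Label encoding and standard-normal noise are assumptions.
    """
    rng = random.Random(seed)
    inputs = []
    for label in (0, 1):  # 0 = normal, 1 = herniated
        for _ in range(n_per_class):
            z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
            inputs.append((label, z))
    return inputs

def build_training_sets(real, generated):
    """Assemble the three training configurations compared in the text."""
    return {
        "real_only": list(real),
        "generated_only": list(generated),
        "real_plus_generated": list(real) + list(generated),
    }

gen_inputs = sample_generator_inputs(N_PER_CLASS, NOISE_DIM)
# Placeholders standing in for images: (source, index) tuples.
real_samples = [("real", i) for i in range(728)]
generated_samples = [("generated", i) for i in range(len(gen_inputs))]
training_sets = build_training_sets(real_samples, generated_samples)
```

Because each generated image inherits the label it was conditioned on, the generated set can be used for supervised training without further annotation.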

Results & Discussion

For visual evaluation, Fig. 2 shows some of our generated IVD images alongside real images. The generated images are globally consistent, and the IVD, vertebral body, spinal canal and nerve appear correctly located. More importantly, our model also learned to generate IVDs with different degrees of degeneration. Likewise, the herniation characteristics were captured correctly.

The resulting ROC curves, obtained by varying the decision threshold, are displayed in Fig. 3. The model trained with real images obtained an accuracy of 0.872, while the model trained with only generated images obtained an accuracy of 0.831. Encouragingly, the accuracy improved to 0.901 when the real and generated images were used together. To our knowledge, this is the first time that realistic IVD samples have been generated with a GAN architecture and used to boost the performance of a classification model.
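For readers unfamiliar with how an ROC curve is built from varying the decision threshold, the sketch below computes ROC points, the trapezoidal AUC and the accuracy at a fixed threshold. The labels and scores are toy values invented for the example, not data from this study, and the 0.5 accuracy threshold is an assumption.

```python
def roc_points(labels, scores):
    """ROC points (FPR, TPR) obtained by sweeping the decision threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step admits more predicted positives.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold."""
    return sum(int(s >= threshold) == y
               for y, s in zip(labels, scores)) / len(labels)

# Toy example: 1 = herniated, 0 = normal; scores are predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(roc_points(labels, scores))
toy_acc = accuracy(labels, scores)
```

On this toy data the AUC is 8/9 and the accuracy at a threshold of 0.5 is 5/6. Accuracy depends on the chosen threshold, whereas the AUC summarizes performance across all thresholds.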

Conclusion

In conclusion, the generation-based strategy effectively improved model performance in the IVD herniation classification task. It is a promising way to reduce the amount of annotation required and could be applied to other tasks with limited training data.

Acknowledgements

No acknowledgement found.

References

[1] S. Ghosh, R. S. Alomari, V. Chaudhary, and G. Dhillon, "Computer-aided diagnosis for lumbar MRI using heterogeneous classifiers," in 2011 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011, pp. 1179-1182.

[2] S. Ghosh, R. S. Alomari, V. Chaudhary, and G. Dhillon, "Composite features for automatic diagnosis of intervertebral disc herniation from lumbar MRI," in 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2011, pp. 5068-5071.

[3] A. Odena, C. Olah, and J. Shlens, "Conditional image synthesis with auxiliary classifier GANs," arXiv preprint arXiv:1610.09585, 2016.

[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.

Figures

Fig.1 The AC-GAN architecture used for generating IVD samples. The generator consists of 4 residual blocks and generates a 64×128 image sample. The discriminator takes an image sample and outputs two probability scores, which indicate whether the image is real or generated and whether it is herniated or not.

Fig.2 Real and generated samples of IVD using AC-GAN. The generated images are globally consistent. The IVD, vertebral body, spinal canal and nerve appear correctly located. More importantly, the model also learns to generate IVDs with different degrees of degeneration. Likewise, the herniation characteristics are captured correctly.

Fig.3 ROC curves for models trained using only generated images, only real images, and real and generated images together. The results show that the generated data can boost the model performance effectively (AUC improved from 0.939 to 0.968).

Table I. Comparison of classification performance using real and generated data

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)
