Although deep convolutional neural networks have shown encouraging performance in lesion classification, their use is limited by the need for large amounts of labeled data. In this study, we aimed to improve classification of lumbar intervertebral disc (IVD) herniation under limited labeled data by adding generated data to the training set.
In this retrospective study, the labeled data, collected during routine clinical practice, consist of T2-weighted MRI scans of 208 patients with various lumbar conditions, such as degeneration, herniation and scoliosis. The scans contain 1040 individual IVDs, each with a corresponding radiological label for herniation. The presence or absence of herniation was assessed by an expert spinal radiologist. The full dataset was split into training (70%), validation (10%) and testing (20%) sets.
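The 70/10/20 split can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the text does not say whether the split was made per disc or per patient, so the sketch simply shuffles disc indices, and the function name `split_indices` and the seed are hypothetical.

```python
import random

def split_indices(n_items, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle item indices and cut them into train/validation/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seed value is an arbitrary assumption
    n_train = round(n_items * fractions[0])
    n_val = round(n_items * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder becomes the test set
    return train, val, test

# 1040 IVDs, split at the disc level (an assumption, see the note above).
train_ids, val_ids, test_ids = split_indices(1040)
print(len(train_ids), len(val_ids), len(test_ids))
```

With 1040 discs this yields 728, 104 and 208 items. Grouping the indices by patient before splitting would keep discs from the same patient out of both the training and the testing sets.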
Many studies have demonstrated that Generative Adversarial Networks (GANs) can learn discriminative features from images and generate realistic samples. To generate high-quality samples together with their class labels, we employ the Auxiliary Classifier GAN (AC-GAN) [3]. AC-GAN builds on the idea behind the original conditional GAN, namely that a GAN can be augmented by supplying class labels. Building on this strategy, AC-GAN conditions the generator on the class label and adds an auxiliary decoder to the discriminator that is tasked with reconstructing the class label. In AC-GAN, the generator $$$G$$$ takes a class label $$$c \sim p_{c}$$$ in addition to the noise $$$z$$$ and uses both to generate images $$$X_{fake} = G(c, z)$$$. Every generated sample therefore has a corresponding class label, which is essential for using the generated samples to train a classifier. The discriminator $$$D$$$ outputs both a probability distribution over sources and a probability distribution over class labels, $$$P(S | X), P(C | X) = D(X)$$$. The objective function has two parts: the log-likelihood of the correct source, $$$L_{S}$$$, and the log-likelihood of the correct class, $$$L_{C}$$$.
$$L_{S}=E[\log P(S=real | X_{real})]+E[\log P(S=fake | X_{fake})]$$
$$L_{C}=E[\log P(C=c | X_{real})]+E[\log P(C=c | X_{fake})]$$
$$$D$$$ is trained to maximize $$$L_{S}+L_{C}$$$, while $$$G$$$ is trained to maximize $$$L_{C}-L_{S}$$$ [3]. AC-GANs learn a representation for $$$z$$$ that is independent of the class label. This modification to the standard GAN formulation has been reported to produce excellent results and appears to stabilize training [3].
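To make the two objectives concrete, the sketch below evaluates $$$L_{S}$$$ and $$$L_{C}$$$ from discriminator output probabilities, with expectations replaced by batch means. It follows the sign convention of Odena et al. [3], in which $$$D$$$ maximizes $$$L_{S}+L_{C}$$$ and $$$G$$$ maximizes $$$L_{C}-L_{S}$$$. The function and argument names are hypothetical, and the probabilities in the example are made up for illustration.

```python
import math

def mean_log(probs):
    """Batch estimate of E[log p]."""
    return sum(math.log(p) for p in probs) / len(probs)

def acgan_objectives(p_src_real, p_src_fake, p_cls_real, p_cls_fake):
    """Compute the AC-GAN source and class log-likelihoods.

    p_src_real: P(S=real | X_real) for each real image
    p_src_fake: P(S=fake | X_fake) for each generated image
    p_cls_real: P(C=c | X_real), probability of the true class
    p_cls_fake: P(C=c | X_fake), probability of the conditioning class
    """
    l_s = mean_log(p_src_real) + mean_log(p_src_fake)
    l_c = mean_log(p_cls_real) + mean_log(p_cls_fake)
    return {
        "L_S": l_s,
        "L_C": l_c,
        "D_objective": l_s + l_c,  # maximized by the discriminator
        "G_objective": l_c - l_s,  # maximized by the generator
    }

# Made-up probabilities: D is undecided about the source (0.5)
# but fairly confident about the class.
objectives = acgan_objectives(
    p_src_real=[0.5, 0.5],
    p_src_fake=[0.5, 0.5],
    p_cls_real=[0.9, 0.8],
    p_cls_fake=[0.7, 0.9],
)
print(objectives)
```

Both networks gain from a higher $$$L_{C}$$$, so the class term is cooperative, while the source term $$$L_{S}$$$ is the adversarial part.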
In our experiments, the generator comprises three residual blocks with deconvolutional layers and batch normalization, with a tanh activation applied to the last layer. The dimension of the noise $$$z$$$ is 128. The discriminator comprises three residual blocks with ReLU activations, followed by a fully connected layer with two linear outputs: one for real versus fake and one for herniated versus not herniated. The AC-GAN architecture is shown in Fig. 1. The model was trained on the training set (70% of the labeled IVD data).
After training the AC-GAN, 2000 normal and 2000 herniated IVD images were randomly generated with the trained model. To investigate whether the generated samples could improve performance, we performed three classification experiments with a standard 18-layer residual network [4]. The network was trained first on real data only, then on generated data only, and finally on real and generated data together, for performance comparison.
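The three training configurations can be laid out as in the sketch below. It only assembles the sample lists and does not train anything. The record format, the helper name `build_training_sets` and the dummy labels for the real data are assumptions made for illustration. The 728 real training samples follow from 70% of 1040 IVDs.

```python
def build_training_sets(real_train, generated):
    """Return the three training configurations compared in the text."""
    return {
        "real_only": list(real_train),
        "generated_only": list(generated),
        "real_plus_generated": list(real_train) + list(generated),
    }

# Placeholder records (source, index, label); label 1 = herniated, 0 = normal.
# The real labels are dummy values, not the label distribution of the study.
real_train = [("real", i, i % 2) for i in range(728)]
# 2000 generated images per class, as stated in the text.
generated = [("gen", i, label) for label in (0, 1) for i in range(2000)]

experiments = build_training_sets(real_train, generated)
for name, samples in experiments.items():
    print(name, len(samples))
```

Each configuration would then be used to train the same classifier and be evaluated on the same held-out real test set, so that only the training data differs between experiments.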
For visual evaluation, Fig. 2 shows some of our generated IVD images alongside real images. The generated images are globally consistent, and the IVD, vertebral body, spinal canal and nerves appear correctly located. More importantly, our model also learned to generate IVDs with different degrees of degeneration. Likewise, the characteristics of herniation were captured correctly.
The resulting ROC curves, obtained by varying the decision threshold, are displayed in Fig. 3. The model trained on real images achieved an accuracy of 0.872, whereas the model trained only on generated images achieved 0.831. Encouragingly, the accuracy improved to 0.901 when real and generated images were used together. To the best of our knowledge, this is the first time that realistic IVD samples have been generated with a GAN architecture and used to boost the performance of a classification model.
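For readers unfamiliar with how an ROC curve is built by varying the decision threshold, the sketch below computes the curve, its area, and the accuracy at a fixed threshold. The scores and labels are toy values invented for illustration, not data from this study, and the function names and the 0.5 threshold are assumptions.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Each distinct score, from high to low, is used as a threshold.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)

# Toy classifier outputs (probability of herniation) and true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
print(curve)
print(auc(curve), accuracy(scores, labels))
```

Accuracy depends on the chosen threshold, whereas the ROC curve summarizes the behaviour of the classifier over all thresholds.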
[1] S. Ghosh, R. S. Alomari, V. Chaudhary, and G. Dhillon, "Computer-aided diagnosis for lumbar MRI using heterogeneous classifiers," in 2011 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011, pp. 1179-1182.
[2] S. Ghosh, R. S. Alomari, V. Chaudhary, and G. Dhillon, "Composite features for automatic diagnosis of intervertebral disc herniation from lumbar MRI," in 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2011, pp. 5068-5071.
[3] A. Odena, C. Olah, and J. Shlens, "Conditional image synthesis with auxiliary classifier GANs," arXiv preprint arXiv:1610.09585, 2016.
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.