In this study, we developed a fully automated grading system for lumbar intervertebral disc degeneration based on a two-stage network. The quantitative results suggest that the system is effective and could serve as a useful tool for clinical diagnosis.
Introduction
Lumbar disc degeneration (LDD) is the leading cause of lower back pain and carries a risk of disability worldwide. Clinically, grading LDD is a necessary step in choosing a suitable treatment plan. Most existing methods use hand-crafted features or shallow networks for intervertebral disc (IVD) classification and cannot handle the interclass overlap between IVD grades or the intraclass variation within a grade. To achieve accurate grading, we developed a fully automated grading system that first applies a two-pathway fully convolutional network (FCN), referred to as TwoPathFCN, for accurate IVD segmentation, and then a deep convolutional network for IVD grading.
Method
In this retrospective study, the data were collected from routine clinical practice and consist of T2-weighted MRI scans of 208 patients with various lumbar conditions, such as degeneration, herniation and scoliosis. The dataset contains 1040 individual IVDs with their corresponding radiological Pfirrmann grades, acquired on different scanners.
To estimate the Pfirrmann grade of each lumbar IVD, we used an image patch around each disc, covering the IVD, as the input to the CNN. In this study, extracting the image patch around each disc is called IVD extraction and is performed with the TwoPathFCN. Based on the extraction results, a deep convolutional network distinguishes IVDs of different grades. The flowchart of this framework is shown in Fig. 1.
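The abstract does not state how a patch is derived from the segmentation output, so the following is only a minimal sketch of one plausible approach: take the bounding box of a single disc's binary mask, enlarge it by a margin, and crop the image. The function names, the `margin` parameter, and the assumption that each mask contains exactly one disc are all hypothetical.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels.

    `mask` is a 2D list of 0/1 values; bottom and right are exclusive.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no foreground pixels")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop_patch(image, mask, margin=2):
    """Crop `image` to the mask's bounding box, enlarged by `margin` pixels
    on every side and clipped to the image borders (assumed behaviour)."""
    top, left, bottom, right = bounding_box(mask)
    height, width = len(image), len(image[0])
    top = max(top - margin, 0)
    left = max(left - margin, 0)
    bottom = min(bottom + margin, height)
    right = min(right + margin, width)
    return [row[left:right] for row in image[top:bottom]]


# Toy example: a 6x8 "image" and a mask covering rows 2-3, columns 3-5.
image = [[r * 10 + c for c in range(8)] for r in range(6)]
mask = [[1 if 2 <= r <= 3 and 3 <= c <= 5 else 0 for c in range(8)]
        for r in range(6)]
patch = crop_patch(image, mask, margin=1)  # 4 rows x 5 columns
```

In practice the segmentation of a full scan would contain several discs, so a step that separates the mask into one region per disc would be needed before a crop of this kind.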
TwoPathFCN for IVD extraction
The encoder-decoder is a widely used architecture in semantic segmentation that balances local and global information. However, local information remains limited even when encoder features are concatenated to the decoder path, as in U-net [1], mainly because of the pooling layers. Therefore, starting from U-net, we constructed an additional computational path, termed the “local pathway”, that captures more local information using a pure CNN without pooling layers. We constructed the TwoPathFCN by integrating the local pathway with the backbone pathway. The full architecture and its details are illustrated in Fig. 1. In our implementation, the local pathway is a fully convolutional architecture built from the dense blocks of DenseNet [2]. The backbone pathway is a typical U-net architecture [1], comprising 19 convolutional layers, 4 pooling layers, 4 upsampling layers and 4 skip concatenations.
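The effect of pooling on spatial resolution can be illustrated with a short calculation. This sketch assumes 2×2 pooling with stride 2 and size-preserving convolutions, as is common in U-net implementations; the 256-pixel input size is a hypothetical value, not one reported in this work.

```python
def encoder_sizes(input_size, num_pools=4, pool_stride=2):
    """Spatial size of the feature map at the input and after each pooling
    stage, assuming convolutions do not change the spatial size."""
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // pool_stride)
    return sizes


# Backbone pathway: 4 pooling layers shrink each side by a factor of 16.
backbone = encoder_sizes(256, num_pools=4)  # [256, 128, 64, 32, 16]

# Local pathway: no pooling, so the full resolution is kept throughout.
local = encoder_sizes(256, num_pools=0)     # [256]
```

Under these assumptions the deepest backbone features describe 16×16-pixel regions of the input, whereas the pooling-free pathway keeps one feature location per input pixel.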
Convolutional network for LDD grading
To achieve automated grading, we constructed a deep convolutional network with residual blocks that classifies IVDs based on the preceding segmentation results. Fig. 1 shows the architecture of this network. It is composed of 16 residual blocks as proposed by He et al. [3] (Fig. 1). Each residual block consists of two 1×1 convolutional layers, one 3×3 convolutional layer, three batch normalization layers and three ReLU layers. In addition, one 7×7 convolutional layer and one 3×3 max pooling layer precede these residual blocks.
For segmentation, sensitivity (SE), specificity (SP), accuracy (AC), Intersection over Union (IoU) and Dice coefficient (DI) were used as evaluation metrics, computed per pixel. For classification, three measures are reported: exact-match accuracy (Top-1 AC), accuracy within ±1 grade (±1 accuracy), and binary accuracy. The ±1 accuracy is reported because our ground-truth labelling is not perfect: the intra-observer grading agreement in our database is 0.84, based on grading the 208 patients twice, while agreement within ±1 grade is 0.98. The binary accuracy treats Pfirrmann grades I-III as non-degenerated and grades IV-V as degenerated. The binary classification is additionally evaluated with sensitivity (SE), specificity (SP) and average precision (AP).
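The metrics above follow standard definitions and can be sketched as below. This is an illustrative implementation rather than the evaluation code used in this work; the function names are hypothetical, grades are encoded as integers 1 to 5 for Pfirrmann I to V, and the toy inputs are invented for demonstration only.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for two flat sequences of 0/1 labels.

    Assumes both classes occur, so that no denominator is zero.
    """
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    return {
        "SE": tp / (tp + fn),               # sensitivity
        "SP": tn / (tn + fp),               # specificity
        "AC": (tp + tn) / len(pairs),       # accuracy
        "IoU": tp / (tp + fp + fn),         # Intersection over Union
        "DI": 2 * tp / (2 * tp + fp + fn),  # Dice coefficient
    }


def grading_accuracies(pred, truth, tolerance=1, degenerated_from=4):
    """Return (Top-1 accuracy, +/-tolerance accuracy, binary accuracy).

    Grades at or above `degenerated_from` (IV-V) count as degenerated.
    """
    pairs = list(zip(pred, truth))
    n = len(pairs)
    top1 = sum(1 for p, t in pairs if p == t) / n
    within = sum(1 for p, t in pairs if abs(p - t) <= tolerance) / n
    binary = sum(1 for p, t in pairs
                 if (p >= degenerated_from) == (t >= degenerated_from)) / n
    return top1, within, binary


# Toy data, invented for illustration.
seg = segmentation_metrics(pred=[1, 1, 0, 0, 1, 0], truth=[1, 0, 0, 0, 1, 1])
top1, within1, binary = grading_accuracies(pred=[1, 3, 4, 4, 3],
                                           truth=[1, 2, 3, 4, 5])
```

On the toy grades, two of five predictions match exactly, four are within one grade, and three fall on the correct side of the III/IV boundary, which shows how the three measures can diverge for the same predictions.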
For comparison, we trained three models on the same dataset: U-net (BackbonePathFCN), the local pathway alone (LocalPathFCN) and TwoPathFCN. The quantitative results of these variants are listed in Table I. As expected, TwoPathFCN ranks first, with the highest scores on almost all metrics.
The grading performance is listed in Table II for VGG-M [4], VGG16 [5], GoogLeNet [6] and our deep convolutional network with 34 layers, all using the IVD image patches produced by the TwoPathFCN. The 34-layer deep convolutional network outperforms the shallower models on almost all evaluation criteria.
[1] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.
[2] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700-4708.
[3] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in European Conference on Computer Vision, Springer, 2016, pp. 630-645.
[4] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, "Return of the devil in the details: Delving deep into convolutional nets," arXiv preprint arXiv:1405.3531, 2014.
[5] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[6] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.