We developed a two-pathway fully convolutional network for refined intervertebral disc segmentation. The proposed pooling-free sub-branch captures more local fine-grained features. The quantitative results indicate its superiority for disc segmentation.
In this retrospective study, the data were collected from routine clinical practice and consist of T2-weighted MRI scans of 208 patients with various lumbar diseases, such as degeneration, herniation and scoliosis, acquired on different scanners. The dataset was randomly divided into training (70%), validation (10%) and test (20%) sets.
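The random 70/10/20 split described above can be sketched as follows. This is only an illustration: the patient-level granularity, the fixed seed, the rounding of the subset sizes and the function name `split_patients` are assumptions, not details reported in the study.

```python
import random


def split_patients(patient_ids, train_frac=0.7, val_frac=0.1, seed=0):
    """Randomly split patient IDs into training, validation and test lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # assumed fixed seed, for a reproducible split
    rng.shuffle(ids)
    n_train = round(train_frac * len(ids))
    n_val = round(val_frac * len(ids))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remainder, roughly 20%
    return train, val, test


# 208 placeholder patient IDs (hypothetical; the real identifiers are not given).
train_ids, val_ids, test_ids = split_patients(range(208))
```

Splitting by patient rather than by slice keeps all images of one subject in the same subset, which avoids leakage between training and test data.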
The encoder-decoder is a widely used architecture in semantic segmentation that balances local and global information. However, local information remains limited even when encoder features are concatenated to the decoder path, as in U-net [2], mainly because of the pooling layers. Therefore, building on U-net, we constructed an additional computational path, termed the “local pathway”, that captures more local information using a pure CNN without pooling layers. We constructed the TwoPathFCN by integrating the local pathway with the backbone pathway. The full architecture and its details are illustrated in Fig.1. In our implementation, the local pathway is a fully convolutional architecture built from the dense blocks of DenseNet [3]. Rather than adding more down-sampling layers (max-pooling or strided convolution) at the cost of losing low-level spatial information, we use dilated convolutions [4] to enlarge the receptive field of the local pathway. The backbone pathway is a typical U-net architecture [2], comprising 19 convolutional layers, four pooling layers, four upsampling layers and four skip concatenations. To combine the hidden representations of both pathways, the feature maps of their respective last layers are concatenated and then fed to the output layer.
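To illustrate why dilated convolutions can enlarge the receptive field of a pooling-free pathway, the sketch below computes the receptive field of a stack of stride-1 convolutions along one axis. The number of layers, the 3×3 kernels and the dilation rates (1, 2, 4, 8) are hypothetical values chosen for the example; the abstract does not report the actual configuration.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (pixels along one axis) of stacked stride-1 convolutions.

    Each layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


# Four 3x3 layers without dilation versus with exponentially growing dilation.
plain = receptive_field([3, 3, 3, 3], [1, 1, 1, 1])
dilated = receptive_field([3, 3, 3, 3], [1, 2, 4, 8])
print(plain, dilated)
```

Under these assumed settings the dilated stack covers a much wider context with the same number of layers and no loss of spatial resolution, which is the property the local pathway relies on.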
Considering the imbalance between foreground and background pixels, we trained the proposed TwoPathFCN with a weighted pixel-wise cross-entropy loss. Specifically, $$$w_{foreground}=0.9$$$ and $$$w_{background}=0.1$$$ are adopted during training to bias the model towards the foreground pixels.
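A minimal sketch of a weighted pixel-wise cross entropy with the weights given above is shown below. It assumes a binary (foreground versus background) formulation with one predicted foreground probability per pixel and averages the loss over all pixels; the exact formulation and normalisation used in the study are not stated, and the function name is illustrative.

```python
import math

W_FOREGROUND = 0.9  # weight from the text
W_BACKGROUND = 0.1  # weight from the text


def weighted_cross_entropy(probs, labels, eps=1e-7):
    """Mean weighted cross entropy over pixels.

    probs  : predicted foreground probabilities, one per pixel
    labels : ground-truth labels (1 = foreground, 0 = background)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -W_FOREGROUND * math.log(p)
        else:
            total += -W_BACKGROUND * math.log(1.0 - p)
    return total / len(labels)  # assumed normalisation: mean over pixels


# A missed foreground pixel is penalised more than a false alarm of equal confidence.
missed_disc = weighted_cross_entropy([0.1], [1])
false_alarm = weighted_cross_entropy([0.9], [0])
```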
Segmentation performance was evaluated pixel-wise using sensitivity (SE), specificity (SP), accuracy (AC), Intersection over Union (IoU) and the Dice coefficient (DI).
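The five metrics follow the standard definitions based on pixel-wise true/false positives and negatives. The sketch below computes them for flattened binary masks; the function name, the toy masks and the convention of returning 0.0 for an empty denominator are assumptions made for illustration.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise SE, SP, AC, IoU and DI for flattened binary masks (1 = disc)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is empty.
        return num / den if den else 0.0

    return {
        "SE": ratio(tp, tp + fn),             # sensitivity
        "SP": ratio(tn, tn + fp),             # specificity
        "AC": ratio(tp + tn, tp + tn + fp + fn),  # accuracy
        "IoU": ratio(tp, tp + fp + fn),       # intersection over union
        "DI": ratio(2 * tp, 2 * tp + fp + fn),  # Dice coefficient
    }


# Toy example with eight pixels (hypothetical masks).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
scores = segmentation_metrics(pred, truth)
```

Because background pixels dominate disc images, SP and AC tend to be high for any reasonable model, so IoU and DI are usually the more discriminative indicators.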
For comparison, we trained three models on the same dataset: U-net (BackbonePathFCN), the pure local pathway (LocalPathFCN) and TwoPathFCN. The quantitative results of these variants are listed in Table I. As expected, TwoPathFCN ranks first, with the highest scores on almost all metrics. The segmentation results produced by the different variants on four typical subjects from our test set are illustrated in Fig.2. The yellow arrows mark error-prone intervertebral discs (IVDs) that are still segmented effectively by the proposed TwoPathFCN. By harnessing two pathways, the global high-level features and the local fine-grained features can be captured simultaneously. As a result, the dual-pathway architecture produces better segmentation results than either single-pathway architecture.
[1] C. W. A. Pfirrmann, A. Metzdorf, M. Zanetti, J. Hodler, and N. Boos, "Magnetic resonance classification of lumbar intervertebral disc degeneration," Spine, vol. 26, no. 17, pp. 1873-1878, 2001.
[2] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical image computing and computer-assisted intervention, 2015, pp. 234-241: Springer.
[3] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, vol. 1, no. 2, p. 3.
[4] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.