In neuroimaging studies, it is of great importance to accurately parcellate cortical surfaces (with a spherical topology) into meaningful regions. In this work, we propose a novel end-to-end deep learning method by formulating surface parcellation as a semantic segmentation task on the spherical space. To extend convolutional neural networks (CNNs) from the Euclidean space to the spherical space, we design an efficient convolution filter directly on the spherical surface.
Introduction
In neuroimaging studies, it is of great importance to accurately parcellate the convoluted cerebral cortex into anatomically and functionally meaningful regions. Conventional parcellation methods1-3 suffer from the need for cortical surface registration and handcrafted features. To address these issues, a patch-wise classification method4 using a CNN architecture was proposed. However, this method has two drawbacks: 1) it treats each patch independently, leading to redundant computation due to patch overlap; 2) there is a trade-off between localization accuracy and the amount of spatial context. We therefore formulate cortical surface parcellation as a vertex-wise labeling task and are thus motivated to leverage the classic U-Net5 architecture for infant cortical surface parcellation. To extend the conventional CNN from the Euclidean space to the spherical space, we design a new convolution filter on the spherical surface that is more efficient than the previous rectangular patch (RePa) filter6.
Methods
We used the infant brain MRI dataset4 with 90 term-born neonates. All images were processed using a standard infant-specific pipeline7. Each vertex on the cortical surface was coded with 3 shape attributes, i.e., mean curvature, sulcal depth, and average convexity. Each cortical surface was mapped onto a standard sphere8 and resampled to 10,242 vertices. Since the standard sphere for cortical representation is hierarchically generated from an icosahedron by adding a new vertex at the center of each edge, the number of vertices on the surface increases from 12 to 42, 162, 642, 2562, 10242, and so on. We therefore design a surface convolution operation as shown in Fig. 1A. The convolution filter, termed the Direct Neighbor (DiNe) filter, orders each neighbor by the angle between the vector from the center vertex to that neighbor and the x-axis of the tangent plane. The surface pooling operation is performed in the reverse order of the icosahedron expansion, as shown in Fig. 1B. We develop three upsampling methods (Linear-Interpolation, Max-pooling Indices, and Transposed Convolution) by analogy with conventional image upsampling methods (Fig. 2). Accordingly, we can extend the U-Net from the Euclidean image domain to the spherical surface domain. Our spherical U-Net architecture (Fig. 3) has an encoder path and a decoder path, each with five resolution steps. Different from the standard U-Net, we replace all 3×3 convolutions with our DiNe convolution, 2×2 up-convolutions with our surface transposed convolution, and 2×2 max pooling with our surface mean pooling. As RePa convolution is too memory-intensive for a full U-Net experiment, a smaller variant, U-Net18-RePa, is created using RePa convolution; it consists of only three pooling and three transposed convolution layers, thus including only 18 convolution layers, and the number of feature channels is halved at each layer. Meanwhile, a U-Net18-DiNe is created by replacing all RePa convolutions with DiNe convolutions in U-Net18-RePa.
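To make the surface operations above concrete, the following minimal NumPy sketch illustrates the icosphere vertex counts, a DiNe-style convolution, and Linear-Interpolation upsampling. It assumes a precomputed (N, 6) table of 1-ring neighbors ordered by tangent-plane angle and a precomputed edge-endpoint table from the icosphere hierarchy; all function and variable names are our own, not from the paper's implementation.

```python
import numpy as np

def ico_vertex_count(level):
    """Vertex count of the icosphere after `level` subdivisions.

    Each subdivision adds one vertex per edge, giving
    12, 42, 162, 642, 2562, 10242, ... i.e. 10 * 4**level + 2.
    """
    return 10 * 4 ** level + 2

def dine_conv(feats, neighbor_idx, weight, bias):
    """DiNe convolution sketch.

    feats        : (N, C_in) per-vertex features
    neighbor_idx : (N, 6) 1-ring neighbor indices, ordered by the angle
                   between the center-to-neighbor vector and the x-axis
                   of the tangent plane (assumed precomputed)
    weight       : (7 * C_in, C_out) filter over the center + 6 neighbors
    bias         : (C_out,)
    """
    n, c_in = feats.shape
    # Stack each vertex with its six ordered neighbors: (N, 7, C_in)
    gathered = np.concatenate([feats[:, None, :], feats[neighbor_idx]], axis=1)
    # One matrix product applies the filter at every vertex at once
    return gathered.reshape(n, 7 * c_in) @ weight + bias

def linear_interp_upsample(feats_coarse, edge_endpoints):
    """Parameter-free Linear-Interpolation upsampling sketch.

    Coarse vertices keep their features; every new vertex bisects a
    coarse edge and receives the mean of its two endpoint features.
    edge_endpoints : (N_fine - N_coarse, 2) endpoint indices per new vertex
    """
    new_feats = feats_coarse[edge_endpoints].mean(axis=1)
    return np.concatenate([feats_coarse, new_feats], axis=0)
```

For the 12 original icosahedral vertices, which have only five neighbors, one neighbor can be repeated to keep the (N, 6) table rectangular; surface mean pooling can be sketched analogously by averaging each retained coarse vertex with its direct neighbors on the finer sphere.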
A baseline architecture, Naive-DiNe, is created with 16 DiNe convolution blocks and no pooling or upsampling layers. Moreover, we study upsampling using Max-pooling Indices (SegNet-Basic) and Linear-Interpolation (SegNet-Inter); both require no learning for upsampling and are thus created in the SegNet9 style.
Results and Discussion
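As a reference for the evaluation metric used in this section, the Dice ratio of a parcellation can be computed per label and averaged as in this minimal NumPy sketch; the function name and the toy labels in the test are our own, not from the paper.

```python
import numpy as np

def mean_dice(pred, gt, num_labels):
    """Mean Dice ratio across parcellation labels.

    pred, gt : (N,) integer region label per vertex
    """
    dices = []
    for lbl in range(num_labels):
        p, g = pred == lbl, gt == lbl
        denom = p.sum() + g.sum()
        if denom == 0:          # label absent from both surfaces: skip it
            continue
        dices.append(2.0 * np.logical_and(p, g).sum() / denom)
    return float(np.mean(dices))
```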
We trained all variants using mini-batch stochastic gradient descent with an initial learning rate of 0.1 and momentum of 0.99, a self-adaptive strategy for updating the learning rate, and the cross-entropy loss as the training objective. In Table 1, we report the means and standard deviations of Dice ratios based on 3-fold cross-validation, as well as the number of parameters, memory storage, and time for one inference on an NVIDIA GeForce GTX 1060 GPU. Our spherical U-Net architectures consistently achieve better results than the other methods, with the highest Dice ratio of 88.87±0.16%. RePa convolution is clearly more time-consuming and memory-intensive: our DiNe convolution is 7 times faster, 5 times smaller in memory storage, and 3 times lighter in model size. Fig. 4 provides a visual comparison of parcellation results from different models; the results of our spherical U-Net show high consistency with the manual parcellations, without isolated noisy labels.
Conclusion
We transformed conventional CNNs into spherical CNNs by developing corresponding methods for surface convolution, pooling, and upsampling. Specifically, we developed a spherical U-Net architecture for infant cortical surface parcellation using DiNe convolution and surface transposed convolution. Comparisons with several architecture variants validated the accuracy and speed of the proposed method. As our spherical U-Net architecture is generic, we will extend it to other cortical surface tasks.