In neuroimaging studies, it is of great importance to accurately parcellate cortical surfaces (with a spherical topology) into meaningful regions. In this work, we propose a novel end-to-end deep learning method by formulating surface parcellation as a semantic segmentation task on the spherical space. To extend convolutional neural networks (CNNs) from the Euclidean space to the spherical space, we design an efficient convolution filter directly on the spherical surface.
Introduction
In neuroimaging studies, it is of great importance to accurately parcellate the convoluted cerebral cortex into anatomically and functionally meaningful regions. Conventional parcellation methods1-3 suffer from the need for cortical surface registration and handcrafted features. To address these issues, a patch-wise classification method4 using a CNN architecture was proposed. However, this method has two drawbacks: 1) it treats each patch independently, leading to redundant computation due to patch overlap; 2) there is a trade-off between localization accuracy and the amount of spatial context. We therefore formulate cortical surface parcellation as a vertex-wise labeling task and are thus motivated to leverage the classic U-Net5 architecture for infant cortical surface parcellation. To extend the conventional CNN from the Euclidean space to the spherical space, we design a new convolution filter on the spherical surface that is more efficient than the previous rectangular patch (RePa) filter6.
Methods
We used the infant brain MRI dataset4 with 90 term-born neonates. All images were processed using a standard infant-specific pipeline7. Each vertex on the cortical surface was coded with 3 shape attributes, i.e., mean curvature, sulcal depth, and average convexity. Each cortical surface was mapped onto a standard sphere8 and resampled to 10,242 vertices. Since the standard sphere for cortical representation is hierarchically generated from an icosahedron by adding a new vertex at the center of each edge, the number of vertices on the surface increases from 12 to 42, 162, 642, 2562, 10242, and so on. We therefore design a surface convolution operation as shown in Fig. 1A. The convolution filter, termed the Direct Neighbor (DiNe) filter, orders each neighbor by the angle between the vector from the center vertex to that neighbor and the x-axis of the tangent plane. The surface pooling operation is performed in the reverse order of the icosahedron expansion, as shown in Fig. 1B. We develop three upsampling methods (Linear-Interpolation, Max-pooling Indices, and Transposed Convolution) by analogy with conventional image upsampling methods (Fig. 2). Accordingly, we can extend the U-Net from the Euclidean image domain to the spherical surface domain. Our spherical U-Net architecture (Fig. 3) has an encoder path and a decoder path, each with five resolution steps. Different from the standard U-Net, we replace all 3×3 convolutions with our DiNe convolution, 2×2 up-convolutions with our surface transposed convolution, and 2×2 max pooling with our surface mean pooling. As RePa convolution is too memory-intensive for a full U-Net experiment, a smaller variant, U-Net18-RePa, is created using RePa convolution; it consists of only three pooling and three transposed convolution layers, thus including only 18 convolution layers, and the number of feature channels is halved at each layer. Meanwhile, a U-Net18-DiNe is created by replacing all RePa convolutions with DiNe convolutions in U-Net18-RePa.
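To make the surface operations above concrete, the following minimal NumPy sketch illustrates the icosphere vertex counts, a DiNe-style convolution, and Linear-Interpolation upsampling. It assumes a precomputed (N, 6) table of 1-ring neighbors ordered by tangent-plane angle and a precomputed edge-endpoint table from the icosphere hierarchy; all function and variable names are our own, not from the paper's implementation.

```python
import numpy as np

def ico_vertex_count(level):
    """Vertex count of the icosphere after `level` subdivisions.

    Each subdivision adds one vertex per edge, giving
    12, 42, 162, 642, 2562, 10242, ... i.e. 10 * 4**level + 2.
    """
    return 10 * 4 ** level + 2

def dine_conv(feats, neighbor_idx, weight, bias):
    """DiNe convolution sketch.

    feats        : (N, C_in) per-vertex features
    neighbor_idx : (N, 6) 1-ring neighbor indices, ordered by the angle
                   between the center-to-neighbor vector and the x-axis
                   of the tangent plane (assumed precomputed)
    weight       : (7 * C_in, C_out) filter over the center + 6 neighbors
    bias         : (C_out,)
    """
    n, c_in = feats.shape
    # Stack each vertex with its six ordered neighbors: (N, 7, C_in)
    gathered = np.concatenate([feats[:, None, :], feats[neighbor_idx]], axis=1)
    # One matrix product applies the filter at every vertex at once
    return gathered.reshape(n, 7 * c_in) @ weight + bias

def linear_interp_upsample(feats_coarse, edge_endpoints):
    """Parameter-free Linear-Interpolation upsampling sketch.

    Coarse vertices keep their features; every new vertex bisects a
    coarse edge and receives the mean of its two endpoint features.
    edge_endpoints : (N_fine - N_coarse, 2) endpoint indices per new vertex
    """
    new_feats = feats_coarse[edge_endpoints].mean(axis=1)
    return np.concatenate([feats_coarse, new_feats], axis=0)
```

For the 12 original icosahedral vertices, which have only five neighbors, one neighbor can be repeated to keep the (N, 6) table rectangular; surface mean pooling can be sketched analogously by averaging each retained coarse vertex with its direct neighbors on the finer sphere.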
A baseline architecture, Naive-DiNe, is created with 16 DiNe convolution blocks and no pooling or upsampling layers. Moreover, we study upsampling using Max-pooling Indices (SegNet-Basic) and Linear-Interpolation (SegNet-Inter); both require no learning for upsampling and are thus created in the SegNet9 style.
Results and Discussion
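As a reference for the evaluation metric used in this section, the Dice ratio of a parcellation can be computed per label and averaged as in this minimal NumPy sketch; the function name and the toy labels in the test are our own, not from the paper.

```python
import numpy as np

def mean_dice(pred, gt, num_labels):
    """Mean Dice ratio across parcellation labels.

    pred, gt : (N,) integer region label per vertex
    """
    dices = []
    for lbl in range(num_labels):
        p, g = pred == lbl, gt == lbl
        denom = p.sum() + g.sum()
        if denom == 0:          # label absent from both surfaces: skip it
            continue
        dices.append(2.0 * np.logical_and(p, g).sum() / denom)
    return float(np.mean(dices))
```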
We trained all variants using mini-batch stochastic gradient descent with an initial learning rate of 0.1 and momentum of 0.99, a self-adaptive strategy for updating the learning rate, and the cross-entropy loss as the training objective. In Table 1, we report the means and standard deviations of Dice ratios based on 3-fold cross-validation, as well as the number of parameters, memory storage, and time for one inference on an NVIDIA GeForce GTX 1060 GPU. Our spherical U-Net architectures consistently achieve better results than the other methods, with the highest Dice ratio of 88.87±0.16%. RePa convolution is clearly more time-consuming and memory-intensive: our DiNe convolution is 7 times faster, 5 times smaller in memory storage, and 3 times lighter in model size. Fig. 4 provides a visual comparison of parcellation results from different models; the results of our spherical U-Net show high consistency with the manual parcellations, without isolated noisy labels.
Conclusion
We transformed conventional CNNs into spherical CNNs by developing corresponding methods for surface convolution, pooling, and upsampling. Specifically, we developed a spherical U-Net architecture for infant cortical surface parcellation using DiNe convolution and surface transposed convolution. Comparisons with several architecture variants validated the accuracy and speed of the proposed method. As our spherical U-Net architecture is generic, we will extend it to other cortical surface tasks.