3553

Attention-Based Deep Kidney Segmentation Framework for GFR Prediction

Edgar Rios Piedra¹,², Morteza Mardani¹,², and Shreyas Vasanawala¹,²
¹Department of Radiology, Stanford University, Stanford, CA, United States, ²Department of Electrical Engineering, Stanford University, Stanford, CA, United States

Synopsis

Automated segmentation of kidneys and their sub-components is a challenging problem, particularly in pediatric patients and in the presence of anatomical deformations or pathology. We present an improved segmentation framework using a multi-channel U-Net with an added attention block that enables automated segmentation of multi-phase DCE-MRI of the kidneys as well as functional evaluation of the glomerular filtration rate. The framework achieves average Dice similarity coefficients of 0.912, 0.853, and 0.917 for the kidney cortex, medulla, and collecting system, respectively.

Introduction

Glomerular filtration rate (GFR) is the main biomarker for evaluating kidney function and is critical for clinical decision making [1]. GFR can be estimated from Dynamic Contrast-Enhanced MRI (DCE-MRI) using existing pharmacokinetic models, but these require segmentation of the kidney and its different components. This processing is often time-intensive and is made more challenging by motion, low SNR, and anatomic variability (heterogeneity) between cases, particularly in pediatric patients, all of which introduce variability in the final results [2]. In this work, we present an improved deep-learning-based framework that automates both the segmentation of the kidney and its sub-components [3] and the GFR calculation. The framework can also process studies with a variable number of dynamic temporal phases (time-invariance) by selecting the most relevant phases from the input test case. Results show an increase in segmentation performance (Dice coefficient) and an improvement in data processing efficiency, which is relevant when the data are heterogeneous or contain redundant information.

Methods

The improved multi-channel framework (Figure 1) is composed of a U-Net-based architecture (DeepKidney) [3,4] with an added soft-attention block at the beginning. This block determines which DCE phases have the largest weights so that they can subsequently be used in the segmentation process. The inputs are dynamic contrast-enhanced studies (e.g., with 50 or 18 temporal phases) and the outputs are segmentation maps for the different kidney components (cortex, medulla, collecting system), with an additional GFR calculation component at the end to provide the functional status of the kidneys. Imaging was performed using a multiphase 3D modified SPGR sequence with motion navigation and intermittent spectrally selective fat-inversion pulses; VDRad sampling patterns were used during the contrast injection. Acquisition parameters were: minimum echo time (TE) 1.2–1.6 msec, repetition time (TR) 3.0–3.7 msec, flip angle 15 degrees, bandwidth (BW) 100 kHz, slice thickness 0.9–1.2 mm, FOV 20–44 cm, spatial resolution 0.8 x 0.8–1.4 x 1.4 \(mm^2\), and a total acceleration factor of 7.8–8.0.
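As a rough illustration of the phase-selection idea described at the start of this section, the NumPy sketch below gates each temporal phase with a soft weight between zero and one and keeps the k highest-weighted phases. It is not the DeepKidney implementation: the function name `select_phases`, the per-phase `scores` (which a trained attention block would produce and which are simply passed in here), the sigmoid gate, and the fixed `k` are all assumptions made for illustration.

```python
import numpy as np

def select_phases(volume, scores, k):
    """Gate DCE phases with a soft mask and keep the k highest-weighted ones.

    volume: array of shape (P, H, W), one 2D slice per temporal phase.
    scores: array of shape (P,), hypothetical per-phase attention logits.
    """
    volume = np.asarray(volume, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Sigmoid maps each logit to a soft weight in (0, 1).
    mask = 1.0 / (1.0 + np.exp(-scores))
    # Broadcast the per-phase weight over the spatial dimensions.
    weighted = volume * mask[:, None, None]
    # Indices of the k largest weights, restored to temporal order.
    top = np.sort(np.argsort(mask)[::-1][:k])
    return weighted[top], top, mask

# Toy example: 6 phases of a 4x4 slice with made-up scores.
volume = np.ones((6, 4, 4))
scores = np.array([-2.0, 3.0, 0.0, 5.0, -1.0, 1.0])
selected, kept, mask = select_phases(volume, scores, k=3)
print(kept)            # indices of the retained phases
print(selected.shape)  # (3, 4, 4)
```

Because the retained phases are chosen by weight rather than by position, the same routine accepts series with different numbers of phases, which is the sense in which such a selection step can make the downstream input size independent of the acquisition length.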
Dataset: The input dataset was collected with IRB approval and consists of 45 high-resolution pediatric input cases \(I(x_i,y_i)\), of which 25 were used for training, 10 for validation, and 10 for testing. Several of these cases included abnormalities (e.g., hydronephrosis, congenital kidney anomalies, surgery) that introduce heterogeneity into our dataset. Manually delineated regions of interest (ROI) were generated for both kidneys by an expert technologist, with subsequent radiologist editing, to train the system and assess its performance.
Network architecture and training: The network begins with an attention block that allows it to select specific parts of the input vector (reducing the amount of data needed). This is accomplished by multiplying the feature maps with a soft mask of values ranging from zero to one [5]. On the contracting path, each block consists of three 3x3 convolution layers followed by a ReLU and a max-pooling layer. The upsampling path has a similar configuration, with a 2x2 up-convolution step and 3x3 convolution layers. The final layer is a 1x1 fully connected layer that produces pixel-wise scores for the ROIs of size \(x_i,y_i\) for the N slices, matching the input image volumes. Subsequently, the obtained segmentation masks are used to calculate the GFR maps and scores (Patlak-Rutland model) [6].
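The Patlak-Rutland step mentioned above can be sketched as a linear fit. Under that model, the ratio of kidney to arterial concentration is plotted against the running integral of the arterial concentration divided by its current value, and the slope of the fitted line is proportional to the filtration rate. The sketch below is a minimal illustration on synthetic curves, not the authors' implementation: the function names, the use of mean ROI concentration curves, and the `skip` parameter are assumptions, and the scaling needed to report GFR in ml/min (for example by kidney volume and hematocrit) is deliberately left out.

```python
import numpy as np

def cumulative_trapezoid(y, t):
    """Running trapezoidal integral of y over t, starting at zero."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(steps)))

def patlak_rutland_fit(t, c_aorta, c_kidney, skip=1):
    """Fit c_kidney/c_aorta = slope * (integral of c_aorta)/c_aorta + intercept.

    t, c_aorta, c_kidney: 1D arrays of equal length (time, arterial input,
    mean concentration inside a segmented kidney ROI).
    skip: number of initial samples to drop before fitting (assumed default).
    """
    t = np.asarray(t, dtype=float)
    c_aorta = np.asarray(c_aorta, dtype=float)
    c_kidney = np.asarray(c_kidney, dtype=float)
    integral = cumulative_trapezoid(c_aorta, t)
    x = integral[skip:] / c_aorta[skip:]
    y = c_kidney[skip:] / c_aorta[skip:]
    # First-degree polynomial fit returns (slope, intercept).
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# Synthetic curves that follow the model exactly (made-up values).
t = np.linspace(0.0, 120.0, 25)
c_aorta = 1.0 + 0.5 * np.exp(-t / 30.0)
true_slope, true_intercept = 0.02, 0.3
c_kidney = true_slope * cumulative_trapezoid(c_aorta, t) + true_intercept * c_aorta
slope, intercept = patlak_rutland_fit(t, c_aorta, c_kidney)
print(slope, intercept)
```

Applying the same fit voxel by voxel inside the segmentation mask, rather than to a mean ROI curve, is one way a parametric map could be produced; whether the original framework does this is not stated in the text.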
Performance assessment: To evaluate whether the attention module achieves better results than more standard methods for using data efficiently (data dimensionality reduction), we also evaluated the network performance when different numbers of principal components (PC) are used. To do this, the input is reshaped into a 2D matrix of shape \( P \times (x \cdot y \cdot z) \), where P is the number of phases (50 in this case) and x, y, and z are the 3D dimensions of each phase. Figure 2 shows the segmentation performance as the number of principal components varies.
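The reshaping and component ranking described above can be sketched with a singular value decomposition. This is a minimal sketch under stated assumptions rather than the pipeline used in the study: the function name `temporal_pca`, the centering convention (voxels treated as samples, phases as features), and the toy array sizes are all illustrative choices.

```python
import numpy as np

def temporal_pca(dce, n_components):
    """Project a DCE series onto its leading temporal principal components.

    dce: array of shape (P, X, Y, Z), one 3D volume per phase.
    Returns (components, explained): components has shape
    (n_components, X, Y, Z); explained holds their variance ratios.
    """
    dce = np.asarray(dce, dtype=float)
    n_phases = dce.shape[0]
    spatial_shape = dce.shape[1:]
    # Flatten each phase into one row: shape P x (x*y*z).
    matrix = dce.reshape(n_phases, -1)
    # Assumed convention: each phase (row) is centered by its mean over voxels.
    centered = matrix - matrix.mean(axis=1, keepdims=True)
    # Thin SVD; singular values are returned in descending order.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Component images: voxel time courses projected onto each PC.
    scores = s[:n_components, None] * vt[:n_components]
    components = scores.reshape((n_components,) + spatial_shape)
    return components, explained[:n_components]

# Toy example: 50 phases of an 8x8x4 volume filled with random values.
rng = np.random.default_rng(0)
dce = rng.normal(size=(50, 8, 8, 4))
components, explained = temporal_pca(dce, n_components=10)
print(components.shape)  # (10, 8, 8, 4)
```

The first `n_components` component images would then replace the full set of phases as the multi-channel network input, so the number of input channels is set by the chosen number of components.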

Results

The segmentation results were evaluated using the Dice coefficient. Table 1a shows the performance of the proposed framework for the different kidney components (cortex, medulla, and collecting system), and Table 1b compares the results obtained with the improved framework (network with the attention module) against the PC-based approach. In addition to time invariance, increased efficiency, and reduced data complexity, the proposed framework (DeepKidney+) improves on previous approaches and on the results obtained with PC-based phase selection. Lastly, Figure 3 shows examples of the obtained GFR maps and a comparison with the reference result (a mean similarity of 94% was obtained).
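For reference, the Dice coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch follows; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the study.

```python
import numpy as np

def dice_coefficient(pred, ref):
    """Dice similarity between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    total = pred.sum() + ref.sum()
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = np.logical_and(pred, ref).sum()
    return float(2.0 * overlap / total)

# Toy masks: one shared voxel, two voxels in each mask.
pred = np.array([[1, 1, 0], [0, 0, 0]])
ref = np.array([[1, 0, 0], [1, 0, 0]])
print(dice_coefficient(pred, ref))  # 0.5
```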

Discussion and Conclusion

We present an improved and more efficient deep learning approach for obtaining accurate segmentations of heterogeneous DCE-MRI data (low SNR, anatomic variability, motion). In this project, we developed a framework that accurately determines the important temporal components, segments the different kidney components, and produces an estimate of functional status in pediatric cases. The results improve on previously reported segmentation methods, which provide no sub-component or functional evaluation [2]. Ongoing work includes an extensive practical clinical evaluation and making the framework available for external use.

Acknowledgements

NIH R01EB009690, NIH R01EB026136, and GE Healthcare.

References

1.- Yoruk, Umit, et al. "High temporal resolution dynamic MRI and arterial input function for assessment of GFR in pediatric subjects." Magnetic resonance in medicine 75.3 (2016): 1301-1311.

2.- Khalifa, Fahmi, et al. "Models and methods for analyzing DCE‐MRI: A review." Medical physics 41.12 (2014).

3.- Rios Piedra, E., Mardani M., Nakarmi U., Cheng J., Vasanawala S. "DeepKidney: Deep segmentation of MR images for automated glomerular function quantification in heterogeneous pediatric patients." International Society for Magnetic Resonance in Medicine (ISMRM), 2019.

4.- Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.

5.- Wang, Fei, et al. "Residual attention network for image classification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

6.- Hackstein, Nils, Jan Heckrodt, and Wigbert S. Rau. "Measurement of single‐kidney glomerular filtration rate using a contrast‐enhanced dynamic gradient‐echo sequence and the Rutland‐Patlak plot technique." Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine 18.6 (2003): 714-725.

Figures

Figure 1: Overall multi-channel U-Net-based architecture with an added attention module. This initial block determines the relevant temporal components of the DCE-MRI to pass on to the subsequent layers, and the final post-processing block cleans the ROI by removing small unrelated areas. The output layer produces a kidney component probability map from which a binary mask is extracted. Case 1 shows the improved network, case 2 the result when using the principal components, and case 3 the reference method.

Figure 2: Performance observed on example test cases as the number n of ranked principal components (temporal dynamic phases) used for the segmentation varies. Maximum performance is reached with between 5 and 10 main components. The best-performing result (bolder color) was therefore used for the comparison with the improved segmentation network.

Figure 3: Examples of the calculated GFR maps for two different cases, comparing the manual reference ROIs with the segmentation produced by the U-Net. The predicted total GFR was 58.671 ml/min (compared to 61.720 ml/min, a difference of 5.1%) for left kidneys and 55.871 ml/min (compared to 51.947 ml/min, a difference of 7.5%).

Table 1: a) Average Dice similarity coefficients obtained for the left and right kidneys and their sub-components (test cases). CS = collecting system. b) Performance results for the train, validation, and test sets for whole kidney volume, using the proposed improved framework to select the relevant temporal components versus the approach using principal components (PC) for comparison (using the best-performing set of components, as shown in Figure 2).

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)
