3133

Multi-stage ensemble machine learning for predicting the pathology of thyroid micronodules on small-datasets high b-value thyroid DWI

ChengLong Deng¹,², BingChao Wu¹,², QingJun Wang³, QingLei Shi⁴, Bei Guan¹,², Dacheng Qu⁵, ChenXi Li¹,², DaoGuang Zan¹,², XiaoLin Chen¹,², and YongJi Wang¹,²
¹Collaborative Innovation Center, Institute of Software Chinese Academy of Sciences, Beijing, China, ²University of Chinese Academy of Sciences, Beijing, China, ³Department of Radiology, The 6th Medical Center of Chinese PLA General Hospital, Beijing, China, ⁴MR Scientific Marketing, Siemens Healthcare, Beijing, China, ⁵School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

Synopsis

In this work, a multi-stage ensemble learning model based on majority voting was designed to reconcile the conflict between the small number of thyroid MRI images available and the large training sets that deep learning models usually need to predict the pathology of thyroid micronodules accurately. Its clinical applicability was also assessed in terms of micronodule risk stratification and optimal regimen selection on high b-value (2000 s/mm²) diffusion-weighted images. Experimental results showed that the model can effectively distinguish benign from malignant micronodules on a small thyroid DWI dataset.

Background and Purpose

Hashimoto's thyroiditis (HT) and papillary thyroid microcarcinoma (PTMC) are recognized as the most difficult types to diagnose clinically. The two are usually differentiated by ultrasound. However, ultrasonography is susceptible to echo disturbances and speckle noise, and other factors, such as nodule size and acoustic characteristics, also contribute to high misdiagnosis rates when the risk of HT and PTMC is assessed on ultrasound images¹. Meanwhile, although high b-value (2000 s/mm²) DWI has been shown to discriminate thyroid micronodules well²,³, the complexity of high b-value diffusion-weighted imaging imposes a heavy workload on clinicians, so thyroid DWI datasets remain small. A small training cohort easily leads to poor classification performance and poor generalization. We therefore designed a multi-stage ensemble learning model to mitigate this problem and to improve prediction accuracy and clinical applicability.

Materials and Methods

Over 4300 thyroid DWI images of HT and PTMC were collected, all confirmed by pathology, and the lesions and labels were verified by experts. According to micronodule pathology, all DWI images were randomly divided into a training cohort and a test cohort at a ratio of 0.75:0.25. No significant differences were found between the two cohorts in terms of age, gender, micronodule location, and so forth. All DWI samples were acquired on the same 3.0T scanner (MAGNETOM Skyra, Siemens Healthcare, Erlangen, Germany) equipped with a dedicated 8-channel bilateral surface coil for the neck (Chenguang Medical Technologies Co., Ltd., Shanghai, China) and a 4-channel soft surface coil for the thoracic entrance (Siemens Healthcare Sector, Erlangen, Germany). The high b-value DWI was acquired with multi-shot readout segmentation of long variable echo-trains (RESOLVE)³ (Table 1).
The aim of the multi-stage ensemble learning model is to extract more category-sensitive features from the small dataset and thereby improve classification accuracy. The model derives three sets of classification results from the thyroid DWI and ADC map datasets, each through its own pipeline, and obtains a final assessment by voting on them (Figure 2). First, the thyroid micronodule DWI images and their classification labels were used to train gcForest⁴, which gives the preliminary classification results. The gcForest architecture was adapted to the morphological characteristics of micronodules; in particular, sliding windows of 3×3 and 5×5 were used in the Multi-Grained Scanning stage. Second, radiomics features were extracted separately from the thyroid micronodule DWI images and the ADC maps, and each feature set was used to train XGBoost, giving two further sets of assessment results. The extracted radiomics features included first-order statistics, gray level co-occurrence matrix, gray level size zone matrix, gray level run length matrix, neighbouring gray tone difference matrix, and gray level dependence matrix. Third, the three groups of results were integrated by majority voting to obtain the final diagnosis. Accuracy (Ac), sensitivity (Se), and specificity (Sp) were used as quantitative indicators of classification performance.
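The voting step can be illustrated with a minimal sketch. It assumes each of the three classifiers has already produced one hard label per lesion; the label coding (0 = HT, 1 = PTMC), the function name `majority_vote`, and the example vectors are hypothetical and not taken from the study.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(*predictions: Sequence[int]) -> List[int]:
    """Fuse per-sample labels from several classifiers by majority vote."""
    if not predictions:
        raise ValueError("need at least one prediction vector")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all prediction vectors must have the same length")

    fused = []
    for votes in zip(*predictions):
        # With three voters and two classes a strict majority always exists.
        label, _count = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused


# Hypothetical outputs of the three branches (0 = HT, 1 = PTMC, assumed coding).
gcforest_dwi = [1, 0, 1, 1, 0]       # gcForest on DWI images
xgb_dwi_radiomics = [1, 1, 0, 1, 0]  # XGBoost on DWI radiomics features
xgb_adc_radiomics = [0, 0, 1, 1, 1]  # XGBoost on ADC-map radiomics features

final_labels = majority_vote(gcforest_dwi, xgb_dwi_radiomics, xgb_adc_radiomics)
print(final_labels)  # [1, 0, 1, 1, 0]
```

Using an odd number of voters on a binary task avoids ties, which is one practical reason for combining exactly three branches.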

Results

In this experiment, the comparison shows that our DWI-based ensemble model outperforms the other methods on all metrics (Table 2). The results also show the benefit of voting on three groups of assessments. In the test cohort, the accuracy, sensitivity, and specificity of classification were 0.887, 0.899, and 0.887, respectively.
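For reference, the three metrics follow the usual confusion-matrix definitions. The sketch below assumes PTMC is treated as the positive class (coded 1), which the abstract does not state; the function name `ac_se_sp` and the toy labels are hypothetical and unrelated to the reported results.

```python
from typing import Dict, Sequence


def ac_se_sp(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = 1) -> Dict[str, float]:
    """Return accuracy (Ac), sensitivity (Se) and specificity (Sp)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly detected
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case flagged as positive
            else:
                tn += 1  # negative case correctly rejected

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return {
        "Ac": (tp + tn) / (tp + tn + fp + fn),
        "Se": tp / (tp + fn),  # true positive rate
        "Sp": tn / (tn + fp),  # true negative rate
    }


# Toy example (assumed coding: 1 = PTMC, 0 = HT).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = ac_se_sp(y_true, y_pred)
print(metrics)  # {'Ac': 0.75, 'Se': 0.75, 'Sp': 0.75}
```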

Discussions

Automatic diagnosis of the pathology of thyroid micronodules is a valuable application for preoperative risk stratification, diagnosis, and treatment in the clinic. However, because bone and air obscure the ultrasound beam, sonography is of limited value for deep anatomical details of the neck. Although DWI can provide information about the pathological and functional features of tumors, the complexity of MR diffusion images and the long imaging time not only impose a heavy diagnostic workload on clinicians but also limit the application of machine learning models. If too few DWI samples are used to train a model, it overfits easily and generalizes poorly. To address these problems, we proposed a multi-stage ensemble learning model to predict the risk of micronodules on limited thyroid DWI datasets. A data-driven classification model that does not rely on a large number of cases is used to obtain preliminary results. Radiomics features that express intra-tumor heterogeneity are then extracted and fed to the XGBoost algorithm to obtain another two groups of results. Finally, the diagnosis is obtained by voting on the three groups of results. In our verification, the model achieved state-of-the-art performance.

Conclusions

The study indicates that multi-stage ensemble learning can compensate for the negative effect of small sample sizes and shows promise for distinguishing benign from malignant thyroid micronodules.

Acknowledgements

This research was funded by the National Key R&D Program of China from the Ministry of Science and Technology (2017YFB1002303).

References

1. Haugen B R, Alexander E K, Bible K C, et al. 2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: the American Thyroid Association guidelines task force on thyroid nodules and differentiated thyroid cancer. Thyroid. 2016;26(1):1-133.

2. Wang Q, Guo Y, Zhang J, et al. Diagnostic value of high b-value (2000 s/mm²) DWI for thyroid micronodules. Medicine. 2019;98(10):e14298.

3. Schob S, Voigt P, Bure L, et al. Diffusion-weighted imaging using a readout-segmented, multishot EPI sequence at 3 T distinguishes between morphologically differentiated and undifferentiated subtypes of thyroid carcinoma: a preliminary study. Translational Oncology. 2016;9(5):403-410.

4. Zhou Z H, Feng J. Deep forest. arXiv preprint arXiv:1702.08835. 2017.

Figures

Figure 1: Samples of thyroid micronodules. (A) Hashimoto's thyroiditis and (B) papillary thyroid microcarcinoma (arrow-heads) in the right lobe (zoom in twice) on DW-MRIs and ADC maps with b=2000 s/mm².

Figure 2: Schematic diagram of an ensemble learning model for automated micronodule classification.

Table 1: Imaging protocol parameters of T1WI, T2WI and high b-value DWI.

Table 2: Ac = Accuracy, Se = Sensitivity, Sp = Specificity, Radiomics-based = radiomics features of RoIs were fused with deep features captured by fine-tuned ResNet-50 network.

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022)


DOI: https://doi.org/10.58530/2022/3133