3814

Blindly trusting MRI radiomics? A study on radiomic features repeatability and reproducibility with a dedicated phantom
Linda Bianchini1, João Santinha2, Nuno Loução3, Mário Figueiredo4, Francesca Botta5, Daniela Origgi5, Marta Cremonesi5, Nickolas Papanikolaou2, and Alessandro Lascialfari1
1Università degli Studi di Milano, Milan, Italy, 2Champalimaud Center for the Unknown, Lisbon, Portugal, 3Philips Healthcare, Lisbon, Portugal, 4Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal, 5Istituto Europeo di Oncologia (IEO) IRCCS, Milan, Italy

Synopsis

The stability of radiomic features across MR scanners of different vendors or field strengths is not known, yet it is needed to support clinical studies. The repeatability and reproducibility of radiomic features were studied with a pelvis phantom designed for radiomic purposes on three MR scanners: two of the same field strength but different vendors, and two of the same vendor but different field strengths. The key results include a substantial loss in the percentage of repeatable features after phantom repositioning, suggesting the need for feature selection in studies involving patient repositioning. The limited reproducibility demands attention in multicentre studies.

Introduction

Radiomic studies on MR (Magnetic Resonance) images of patients with a pelvic tumour have shown promising results.1,2 However, because published studies often include non-homogeneous data sets and the stability of radiomic features across different settings is not known, a comprehensive methodological study on the repeatability and reproducibility of the features is needed. The investigation was carried out at the European Institute of Oncology (IEO, Milan) and at the Champalimaud Center for the Unknown (CCU, Lisbon).

Methods

A home-made pelvis phantom designed for radiomic purposes was used for this study. The phantom, called PETER PHAN (PElvis TExtuRe PHANtom), consists of a pelvis-shaped container filled with a MnCl2 solution in which four inserts are embedded (Fig. 1a). The inserts were created by mixing polystyrene spheres of different diameters with agar gel to mimic the texture of a typical pelvic tumour. The solution in the main phantom compartment reproduces the T2 relaxation time of the muscle tissue surrounding the tumour in patients.
T2-weighted (T2-w) images of PETER PHAN were acquired on a 1.5 T GE scanner (A), a 1.5 T Philips scanner (B) and a 3 T Philips scanner (C), using a typical sequence for pelvis diagnostic imaging with the same parameters set on all the scanners. For each scanner, the acquisition was repeated twice without changing the setup. After phantom repositioning, the acquisition was repeated identically. Sixteen regions of interest (ROIs) of four sizes were drawn on the phantom inserts (Fig. 1b) using 3D Slicer (ver 4.10.1), considering three consecutive slices. PyRadiomics3 (ver 2.2.0) was used to normalise the images and to extract radiomic features in 2D (included categories: Shape, First Order, GLCM, GLRLM, GLSZM, NGTDM and GLDM) from each ROI, on both original and filtered images (Laplacian of Gaussian, Wavelet, Square, Square Root, Logarithm and Exponential).
The intraclass correlation coefficient (ICC)4 for absolute agreement was calculated pairwise (acquisition 1 and acquisition 2 on the same scanner) for each radiomic feature to test repeatability, with and without phantom repositioning. To test reproducibility across scanners of different vendors, the ICC was evaluated between acquisitions on scanners A and B, both for absolute agreement and for consistency. A similar procedure was used to test reproducibility across scanners of different field strengths (B vs C). The most stable features were identified by intersecting the sets of features showing both high repeatability and high reproducibility. Given the low repeatability values after phantom repositioning, two 3D acquisitions (T2-w images) were performed on scanner B (acquisition 1 followed by acquisition 2, after phantom repositioning), allowing feature extraction in 3D (in contrast to the previous experiments) and the investigation of the rotational (in)variance of the radiomic features.
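As an illustration of the pairwise ICC computation described above, the following is a minimal sketch in Python, not the code used in this study. It assumes the two-way, single-measurement ICC forms given in reference 4 (absolute agreement and consistency); the abstract does not state which ICC model was applied. The function name `icc_two_way` and the input layout (one row per ROI, one column per acquisition) are illustrative assumptions.

```python
from typing import List, Tuple


def icc_two_way(ratings: List[List[float]]) -> Tuple[float, float]:
    """Return (absolute-agreement ICC, consistency ICC) for single measurements.

    ratings[i][j] is the value of ONE radiomic feature for ROI i in
    acquisition j (assumed layout, e.g. 16 ROIs x 2 acquisitions).
    """
    n = len(ratings)       # number of ROIs
    k = len(ratings[0])    # number of acquisitions being compared
    if n < 2 or k < 2:
        raise ValueError("need at least 2 ROIs and 2 acquisitions")
    if any(len(row) != k for row in ratings):
        raise ValueError("every ROI needs one value per acquisition")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares of the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom_agr = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    denom_con = ms_rows + (k - 1) * ms_err
    if denom_agr == 0 or denom_con == 0:
        raise ValueError("ICC undefined: the feature has no variance")

    agreement = (ms_rows - ms_err) / denom_agr
    consistency = (ms_rows - ms_err) / denom_con
    return agreement, consistency


if __name__ == "__main__":
    # Toy values: acquisition 2 equals acquisition 1 plus a constant offset.
    agr, con = icc_two_way([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
    print(round(agr, 3), round(con, 3))
```

In the toy data a constant offset between the two acquisitions leaves the consistency ICC at 1 while the absolute-agreement ICC falls to about 0.67, which is why consistency is the less conservative of the two measures.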

Results

A total of 944 features were extracted for the repeatability and reproducibility experiment. Following Koo and Li,4 the features were classified into four categories: poor performance (ICC < 0.5), moderate performance (0.5 ≤ ICC < 0.75), good performance (0.75 ≤ ICC ≤ 0.9) and excellent performance (ICC > 0.9). The results are reported in Fig. 2 and Fig. 3.
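The four-level classification can be sketched as a small Python helper. This is an illustrative sketch only: the handling of the exact boundary values (0.5, 0.75 and 0.9) is an assumption, and the names `classify_icc` and `category_percentages` are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping

CATEGORIES = ("poor", "moderate", "good", "excellent")


def classify_icc(icc: float) -> str:
    """Map an ICC value to one of the four performance categories.

    Boundary handling is an assumption: 0.5 counts as moderate,
    0.75 and 0.9 count as good, and only values above 0.9 are excellent.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def category_percentages(icc_by_feature: Mapping[str, float]) -> Dict[str, float]:
    """Percentage of features falling in each category."""
    total = len(icc_by_feature)
    if total == 0:
        raise ValueError("no features supplied")
    counts = Counter(classify_icc(v) for v in icc_by_feature.values())
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}
```

Applied to a dictionary of per-feature ICC values for one experiment, `category_percentages` gives the kind of per-category breakdown reported in the figures.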
The features with excellent repeatability and reproducibility were 7% of the total for the scanner A vs scanner B experiment and 17% for the scanner B vs scanner C experiment. The stable features are shown in Fig. 4.
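The selection of stable features by intersection can be sketched as follows. This is a hedged illustration rather than the authors' code: which ICC tables enter the intersection (for example repeatability on each scanner plus reproducibility between the pair) is an assumption, and the function names, feature names and ICC values below are hypothetical.

```python
from typing import Mapping, Set

EXCELLENT_THRESHOLD = 0.9  # "excellent" means ICC > 0.9


def excellent_features(icc_by_feature: Mapping[str, float],
                       threshold: float = EXCELLENT_THRESHOLD) -> Set[str]:
    """Names of the features whose ICC exceeds the threshold."""
    return {name for name, icc in icc_by_feature.items() if icc > threshold}


def stable_features(*icc_tables: Mapping[str, float]) -> Set[str]:
    """Features that are excellent in every supplied ICC table."""
    sets = [excellent_features(table) for table in icc_tables]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Hypothetical ICC values, for illustration only.
    repeat_a = {"feat_1": 0.98, "feat_2": 0.95, "feat_3": 0.60}
    repeat_b = {"feat_1": 0.96, "feat_2": 0.93, "feat_3": 0.91}
    repro_ab = {"feat_1": 0.92, "feat_2": 0.40, "feat_3": 0.95}
    print(sorted(stable_features(repeat_a, repeat_b, repro_ab)))
```

The fraction of stable features is then the size of the returned set divided by the total number of extracted features.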
Regarding the rotational invariance experiment, out of 1316 features, 35% showed excellent, 28% good, 25% moderate and 12% poor rotational invariance. The distribution of the features showing excellent rotational invariance is summarized in Fig. 5.

Discussion

Regarding the repeatability evaluation without phantom repositioning, the majority of the features showed excellent repeatability on both 1.5 T scanners (97% for scanner A and 92% for scanner B). Our experiment suggests that repeatability decreases at the higher magnetic field strength (80% for scanner C at 3 T). The acquisitions performed on all the scanners showed that repeatability decreases drastically after repositioning of the phantom. Moving from 2D to 3D feature extraction did not improve feature repeatability. Concerning reproducibility, the number of features showing excellent reproducibility is limited, even under the less conservative consistency evaluation (instead of absolute agreement). Reproducibility is higher between the two scanners of different field strength than between the two scanners of the same field strength and different vendor (Fig. 3). Intersecting the features with both excellent repeatability and excellent reproducibility shows that the most stable features belong to the First Order class and were extracted from Wavelet-filtered images (Fig. 4).

Conclusions

The majority of the radiomic features extracted from MR images of the pelvis phantom showed excellent repeatability when the experiment was performed with no variations. However, the percentage of repeatable features drops after phantom repositioning, suggesting that a repeatability investigation must be performed to support clinical radiomic studies (e.g. follow-up studies including patient repositioning). Our results also show that a selection of the reproducible features should be carried out in multicentre studies, given the low feature reproducibility observed when comparing scanners of different vendors or field strengths.

Acknowledgements

No acknowledgement found.

References

1. Li, Z. et al. “MR-Based Radiomics Nomogram of Cervical Cancer in Prediction of the Lymph-Vascular Space Invasion preoperatively.” J. Magn. Reson. Imaging (2018): 1-7.

2. Zhou, X. et al. “Radiomics-Based Pretherapeutic Prediction of Non-response to Neoadjuvant Therapy in Locally Advanced Rectal Cancer.” Ann. Surg. Oncol. (2019).

3. Van Griethuysen, J. J. M. et al. “Computational radiomics system to decode the radiographic phenotype.” Cancer Res. 77.21 (2017): e104-e107.

4. Koo, T. K. and Li, M. Y. “A guideline of selecting and reporting intraclass correlation coefficients for reliability research.” J. Chiropr. Med. 15.2 (2016): 155-163.

Figures

Figure 1. PETER PHAN. (a) Frontal view. (b) Axial T2-weighted image with the selected ROIs (yellow d = 12 mm; blue d = 24 mm; green d = 36 mm; red d = 48 mm).

Figure 2. Repeatability of radiomic features. The results are reported for repeatability without phantom repositioning on scanners A, B and C and with phantom repositioning on scanners A (A-R), B (B-R) and C (C-R).

Figure 3. Reproducibility of radiomic features. The reproducibility was evaluated in terms of ICC between scanners A and B (same field, different vendor) and between scanners B and C (same vendor, different field), both for agreement (A vs B - agr / B vs C - agr) and for consistency (A vs B - con / B vs C - con).

Figure 4. Radiomic features showing excellent stability. Features with excellent (ICC > 0.9) repeatability and reproducibility by class (a) and image type (b).

Figure 5. Radiomic features showing rotational invariance. Features with excellent (ICC > 0.9) rotational invariance in each feature class.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020) 3814