The emerging field of
Materials: Our radiomics phantom consisted of 4 onions, 4 limes, 4 kiwifruits, and 4 apples (reflecting different tissue textures) placed on a box made out of Styrofoam (Figure 1).
Data acquisition: Phantom measurements were performed on a clinical 3T scanner (Ingenia, Philips Healthcare, Best, The Netherlands) with the built-in spine matrix coil as well as the standard body-matrix coil. Two different MRI sequences were acquired: 1) a fluid-attenuated inversion recovery (FLAIR) sequence (acquired voxel size 1.2x1.5x5.5[mm], reconstructed voxel size 0.45x0.45x5.5[mm], FOV 300x300x77, TE/TR=140/12000[ms], Flip angle 90°), and 2) a T2-weighted (T2w) sequence (acquired voxel size 0.8x1.0x5.5[mm], reconstructed voxel size 0.3x0.3x5.5[mm], FOV 300x300x77, TE/TR=80/2500[ms], Flip angle 90°). After repositioning of the phantom all sequences were repeated to acquire test/retest data.
Image segmentation: Image segmentation was performed semi-automatically using the 3D Slicer open-source platform (version 4.8; www.slicer.org)9. The most apical and most basal slices of each fruit/vegetable, as well as the border zones between fruit/vegetable and surrounding air, were manually excluded to account for partial volume artifacts.
Radiomic feature extraction: Radiomic features were extracted using the multiplatform freeware LIFEx (version 4.00; www.lifexsoft.org)10. Prior to feature extraction, 27 different image processing settings/resampling steps were applied (Table 1). A total of 45 radiomic features were extracted, corresponding to the following 6 matrices/feature classes: histogram matrix, shape matrix, grey-level co-occurrence matrix, grey level run length matrix (GLRLM), neighboring grey level dependence matrix, and grey level zone length matrix.
Statistical analysis: Statistical analysis was performed in R (version 3.4.0; R Foundation for Statistical Computing) with RStudio (version 1.0.136). Concordance correlation coefficients (CCCs)11 were calculated to analyze test-retest robustness. The dynamic range (DR) was calculated as previously described12,13, with values close to 1 implying that the feature has a large biological range with good reproducibility. To account for subtle intra-reader differences in image segmentation, CCCs were corrected as follows: CCCcorr = CCC + (1 - intra-observer ICC). Excellent robustness was then defined as both CCCcorr and DR ≥ 0.90, as previously described12. Systematic differences of individual features between resampling steps were assessed using one-way analysis of variance with Tukey-type post-hoc comparisons to adjust for multiple testing.
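The robustness criterion above can be sketched in a few lines of Python. This is only an illustration, not the R code used in the study: the CCC follows Lin's standard definition, whereas the DR formula (1 minus the mean absolute test-retest difference divided by the observed range) is an assumption about what "as previously described" refers to, and all function names are hypothetical.

```python
from statistics import fmean


def concordance_ccc(test, retest):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(test)
    if n != len(retest) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_t, mean_r = fmean(test), fmean(retest)
    # Population (1/n) moments, as in Lin's definition.
    var_t = sum((x - mean_t) ** 2 for x in test) / n
    var_r = sum((y - mean_r) ** 2 for y in retest) / n
    cov = sum((x - mean_t) * (y - mean_r) for x, y in zip(test, retest)) / n
    return 2 * cov / (var_t + var_r + (mean_t - mean_r) ** 2)


def dynamic_range(test, retest):
    """ASSUMED form: DR = 1 - mean(|test - retest|) / (max - min)."""
    values = list(test) + list(retest)
    span = max(values) - min(values)
    mean_abs_diff = fmean(abs(x - y) for x, y in zip(test, retest))
    return 1 - mean_abs_diff / span


def is_robust(test, retest, intra_observer_icc, cutoff=0.90):
    """Apply the criterion CCCcorr >= cutoff and DR >= cutoff."""
    # Correction stated in the text: CCCcorr = CCC + (1 - intra-observer ICC).
    ccc_corr = concordance_ccc(test, retest) + (1 - intra_observer_icc)
    return ccc_corr >= cutoff and dynamic_range(test, retest) >= cutoff
```

Here `test` and `retest` would hold one feature's values across the 16 phantom objects for the two scans. A feature that is constant across all objects makes both denominators zero and would need separate handling.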
For both MRI sequences, the number of robust radiomic features differed considerably depending on the chosen image processing parameters (Figure 2). In general, the percentage of robust features tended to be higher for FLAIR than for T2w. For FLAIR imaging, spatial resampling to 1.25x1.25x1.25[mm], intensity discretization with 32 grey-levels, and mean±3 standard deviations relative intensity rescaling yielded the highest percentage of robust features (n=34/45, 76%). For T2w imaging, the highest robustness (n=36/45, 80%) was achieved with spatial resampling to 1x1x1[mm], intensity discretization with 32 grey-levels, and min<>max relative intensity rescaling.
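To make the two intensity rescaling options concrete, the following minimal sketch maps the intensities of one region of interest to 32 grey-levels using either min<>max bounds or mean±3 standard deviation bounds. It is a generic fixed-bin-number discretization and not necessarily the exact formula implemented in LIFEx. The function names, the use of the population standard deviation, and the clipping of out-of-range values are assumptions.

```python
from statistics import fmean, pstdev


def rescale_bounds(values, mode):
    """Return (lower, upper) intensity bounds for relative rescaling."""
    if mode == "minmax":
        return min(values), max(values)
    if mode == "mean3sd":
        # ASSUMPTION: population standard deviation of the ROI intensities.
        mu, sd = fmean(values), pstdev(values)
        return mu - 3 * sd, mu + 3 * sd
    raise ValueError(f"unknown rescaling mode: {mode}")


def discretize(values, n_levels=32, mode="minmax"):
    """Map ROI intensities to integer grey-levels 0 .. n_levels - 1."""
    lo, hi = rescale_bounds(values, mode)
    if hi <= lo:
        raise ValueError("ROI intensities are constant; cannot rescale")
    levels = []
    for v in values:
        clipped = min(max(v, lo), hi)  # ASSUMPTION: clip outliers to the bounds
        level = int(n_levels * (clipped - lo) / (hi - lo))
        levels.append(min(level, n_levels - 1))  # upper bound goes to last bin
    return levels
```

With mean±3 standard deviation bounds, a few extreme voxels no longer stretch the bin width, which is one plausible reason why the choice of rescaling changes texture features so strongly.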
Regarding systematic differences introduced by varying the image processing parameters, histogram and shape features did not differ significantly between resampling steps, except for resampling steps that led to anisotropic voxels. In contrast, there were large and significant differences for nearly all other feature matrices, as illustrated for GLRLM features (Figure 3).
Figure 1 Representing different signal intensities, shapes, and tissue textures, a total of 4 onions, 4 limes, 4 kiwifruits, and 4 apples were scanned within a box made out of Styrofoam as our radiomics phantom. Shown are exemplary images of the phantom (upper left), acquired with a T2w sequence (upper right), a FLAIR sequence (lower left) and after segmentation (lower right).
Figure 2 Percentage of robust features for low-resolution FLAIR (above) and low-resolution T2w (below) for each resampling step. Robustness is shown for a cut-off of CCC and DR ≥ 0.90.
The percentage of robust features tended to be higher for FLAIR than for T2w. For FLAIR imaging, image processing with spatial resampling of 1.25x1.25x1.25[mm], intensity discretization with 32 grey-levels and mean±3 standard deviations relative intensity rescaling delivered the highest percentage of robust features (n=34/45, 76%), whereas for T2w imaging, highest robustness (n=36/45, 80%) was achieved by using a spatial resampling of 1x1x1[mm], intensity discretization with 32 grey-levels and min<>max relative intensity rescaling as image processing settings.
Figure 3 Reproducibility of grey level run length based texture features for all 2-27 resampling steps. Varying the image processing parameters leads to considerable differences in the individual features.
GLRLM–Grey level run length matrix, SRE–Short runs emphasis, LRE–Long runs emphasis, LGRE–Low grey level run emphasis, HGRE–High grey level run emphasis, SRLGE–Short run low grey level emphasis, SRHGE–Short run high grey level emphasis, LRLGE–Long run low grey level emphasis, LRHGE–Long run high grey level emphasis, GLNU–Grey level non-uniformity, RLNU–Run length non-uniformity, RP–Run percentage.
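As an illustration of how three of the run length features listed above are defined, the sketch below counts runs of equal grey-level along the rows of a small, already discretized 2D image and derives SRE, LRE, and RP from the standard run length definitions. It is a simplified example with hypothetical function names: it covers a single (horizontal) direction in 2D, whereas radiomics software typically combines several directions in 3D.

```python
from collections import Counter
from itertools import groupby


def run_length_counts(image):
    """Count horizontal runs: (grey_level, run_length) -> number of runs."""
    runs = Counter()
    for row in image:
        for level, group in groupby(row):
            runs[(level, sum(1 for _ in group))] += 1
    return runs


def glrlm_features(image):
    """Return SRE, LRE and RP for horizontal runs in a discretized 2D image."""
    runs = run_length_counts(image)
    n_runs = sum(runs.values())
    n_pixels = sum(len(row) for row in image)
    # SRE weights each run by 1 / length^2, LRE by length^2.
    sre = sum(c / length ** 2 for (_, length), c in runs.items()) / n_runs
    lre = sum(c * length ** 2 for (_, length), c in runs.items()) / n_runs
    return {"SRE": sre, "LRE": lre, "RP": n_runs / n_pixels}


features = glrlm_features([[1, 1, 2], [3, 3, 3]])
```

Because runs are counted on discretized grey-levels and on the resampled voxel grid, both the number of grey-levels and the voxel size directly change the run counts, which is consistent with the sensitivity of GLRLM features to the image processing settings.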