Large multi-center studies are needed to establish quantitative MRI (qMRI) as a biomarker for cervical cancer. In this study, we created a framework for a multi-center imaging biomarker study that maximizes the consistency of quantitative results across a large variety of MRI systems. This way, large deviations in qMRI values can be detected and corrected before patients are enrolled in a study. Furthermore, the results can be used to determine the statistical power of the study.
Measurements were performed at ten institutes (four 1.5T systems and six 3T systems, from three vendors) prior to the inclusion of the first patient. For the trial sequences, requirements were defined (Table 1) and acceleration techniques were allowed. The parameters of the benchmark sequences were specified in detail so that they were similar on all systems. Both benchmark and trial sequences were repeated to assess repeatability.
For diffusion-weighted imaging (DWI) we used the Diffusion Phantom Model 128 (High Precision Device, Inc, Boulder, Colorado, USA). As a benchmark sequence we used the sequence specified in the phantom’s manual [4]. T2 mapping was assessed with the Eurospin II TO5 phantom (Diagnostic Sonar LTD, Livingston, Scotland). As a benchmark sequence we used a single-slice, non-accelerated, multi-echo spin-echo sequence. For pharmacokinetic modelling of dynamic contrast-enhanced (DCE-) MRI data, the Quantitative Imaging Biomarker Alliance (QIBA) recommends assessing the signal stability and linearity of the DCE sequence and the accuracy of baseline T1 mapping [5]. For the first two aspects, we created a phantom consisting of ten samples with gadolinium concentrations from 0 to 10 mM. For evaluation of T1 mapping, the Eurospin II TO5 phantom was used. A single-slice, non-accelerated inversion recovery series was applied as a benchmark sequence.
Bland-Altman analysis was used to calculate the bias and 95% confidence intervals (CI) for the measured qMRI parameters. Short-term repeatability was expressed as the within-subject coefficient of variation (wCV) for repeated measurements. For signal stability of the DCE sequence we calculated the coefficient of variation (CV) as the standard deviation of the signal intensities of all dynamic scans divided by their mean. For signal linearity we converted the measured signal intensities to estimated concentration values [6] and compared these to the true values.
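The statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study. It assumes that the 95% interval is computed as bias ± 1.96 times the standard deviation of the differences (the conventional Bland-Altman limits), and that the wCV is computed from two repeats per sample; the abstract specifies neither. All function names are hypothetical.

```python
import math
import statistics


def bland_altman(measured, reference):
    """Return (bias, lower, upper), with bounds at bias +/- 1.96 * SD of the differences.

    Assumption: the interval is the conventional Bland-Altman interval.
    """
    diffs = [m - r for m, r in zip(measured, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def within_subject_cv(first, second):
    """wCV in percent for paired test/retest measurements.

    Assumption: two repeats per sample, so the within-sample variance
    is (a - b)^2 / 2; the wCV is the root mean of variance / mean^2.
    """
    terms = []
    for a, b in zip(first, second):
        mean = (a + b) / 2
        variance = (a - b) ** 2 / 2
        terms.append(variance / mean ** 2)
    return 100 * math.sqrt(statistics.mean(terms))


def signal_cv(signal):
    """CV in percent: SD of the dynamic signal intensities divided by their mean."""
    return 100 * statistics.stdev(signal) / statistics.mean(signal)
```

For example, `signal_cv` applied to the mean ROI signal of each dynamic scan gives the stability measure, while `within_subject_cv` takes the first and second acquisition of each phantom sample.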
Although variation in sequence choice was allowed, in general the same base sequence was used: an EPI sequence for ADC mapping; a multi-echo spin-echo sequence for T2 mapping, except at one institute, where a series of separate T2-weighted images with different TEs was acquired; a variable flip angle approach for T1 mapping; and a spoiled gradient-echo sequence (with or without Dixon) for DCE-MRI.
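To illustrate the variable flip angle approach, the sketch below estimates T1 from the linearised spoiled gradient-echo signal equation, S/sin(α) = E1 · S/tan(α) + M0 (1 − E1) with E1 = exp(−TR/T1). This is one common way to fit such data and is an assumption here; the fitting method used at each institute is not stated. Perfect spoiling and accurate flip angles (no B1 error) are assumed, and the function names are hypothetical.

```python
import math


def spgr_signal(m0, t1_ms, tr_ms, flip_deg):
    """Steady-state spoiled gradient-echo signal (T2* decay ignored)."""
    e1 = math.exp(-tr_ms / t1_ms)
    a = math.radians(flip_deg)
    return m0 * math.sin(a) * (1 - e1) / (1 - e1 * math.cos(a))


def vfa_t1(signals, flip_degs, tr_ms):
    """Estimate T1 (ms) by ordinary least squares on the linearised equation.

    y = S / sin(a) is regressed on x = S / tan(a); the slope equals E1.
    """
    xs = [s / math.tan(math.radians(f)) for s, f in zip(signals, flip_degs)]
    ys = [s / math.sin(math.radians(f)) for s, f in zip(signals, flip_degs)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # estimate of E1
    return -tr_ms / math.log(slope)
```

On noise-free synthetic signals generated with `spgr_signal`, `vfa_t1` recovers the input T1; with real data, the estimate is sensitive to flip angle errors, which is one plausible source of between-system differences.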
The bias in ADC measurements of both benchmark and trial sequences was within the limit of ≤ 40 × 10⁻⁶ mm²/s of the QIBA profile [4] for all institutes (Fig. 1). The median short-term repeatability of the trial sequence was 0.3% (range 0.0 – 0.7%).
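As a sketch of how such a bias check could be carried out for a single phantom vial, the code below computes a two-point mono-exponential ADC and compares it with a reference value. The two-point estimate is an assumption made for brevity (the b-values and fitting method of the actual protocols are not given here), and the function names are hypothetical.

```python
import math

QIBA_BIAS_LIMIT = 40e-6  # mm^2/s, the limit quoted in the text


def adc_two_point(s0, sb, b):
    """ADC in mm^2/s from the signals at b = 0 and at b (in s/mm^2).

    Assumes mono-exponential decay: S(b) = S(0) * exp(-b * ADC).
    """
    return math.log(s0 / sb) / b


def within_limit(measured_adc, reference_adc, limit=QIBA_BIAS_LIMIT):
    """True if the absolute ADC bias does not exceed the limit."""
    return abs(measured_adc - reference_adc) <= limit
```

In practice the reference value would be the known ADC of the phantom vial at the measured temperature.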
The variation in T2 values between the institutes was the same for the benchmark and trial sequence (Fig. 2), illustrating that the protocol variations between the centers did not affect the T2 values. Median short-term repeatability of the trial sequence was 0.4% (range 0.3 – 1.2%).
The benchmark sequence for T1 mapping gave consistent results across all institutes: mean bias = 9 ms (CI = -74 to 92 ms) (Fig. 3). The differences with the trial sequence were larger: mean bias = 52 ms (CI = -562 to 666 ms). The results at one institute deviated from the others; this was corrected after another iteration of sequence optimization. Median short-term repeatability was 0.6% (range 0.5 – 1.5%).
The CV for the signal stability for the DCE sequence was 0.4% (range 0.0 – 3.5%). The measured concentrations were linear up to 0.5 mM in all institutes (Fig. 4).
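A linearity check of this kind requires converting signal intensity to contrast agent concentration. The sketch below shows one way this could be done, by inverting the spoiled gradient-echo signal equation to obtain R1 and then applying R1 = R10 + r1 · C. It is an illustrative assumption, not necessarily the conversion used in the study: M0, the baseline relaxation rate R10 and the relaxivity r1 are taken as known, and T2* decay is ignored, which in practice limits linearity at higher concentrations. All names are hypothetical.

```python
import math


def spgr(m0, r1_per_s, tr_s, flip_deg):
    """Spoiled gradient-echo signal for relaxation rate R1 (T2* decay ignored)."""
    e1 = math.exp(-tr_s * r1_per_s)
    a = math.radians(flip_deg)
    return m0 * math.sin(a) * (1 - e1) / (1 - e1 * math.cos(a))


def signal_to_concentration(signal, m0, r10_per_s, tr_s, flip_deg, relaxivity):
    """Estimate concentration (mM) from signal intensity.

    Inverts the SPGR equation for E1, converts E1 to R1 (1/s), and
    applies C = (R1 - R10) / r1 with r1 in 1/(mM s).
    """
    a = math.radians(flip_deg)
    q = signal / (m0 * math.sin(a))
    e1 = (1 - q) / (1 - q * math.cos(a))
    r1 = -math.log(e1) / tr_s
    return (r1 - r10_per_s) / relaxivity
```

Plotting the estimated against the true concentration of each phantom sample then shows the range over which the relation remains linear.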