Michael Goetz1, Christian Weber1, and Klaus H. Maier-Hein1
1Medical Image Computing, German Cancer Research Center (DKFZ) Heidelberg, Heidelberg, Germany
Synopsis
Regions of interest (ROIs) are often used to compare the information content of different MR sequences. One way to avoid observer dependency is to use ROIs from different observers, but this raises the question of how to fuse them. We propose a new method that combines the information obtained from multiple ROIs according to their similarity, which makes the method less sensitive to outliers. We evaluate our method by comparing its results with those of the traditional merging method. The results indicate that our method can be a valuable extension to ROI-based, multi-observer studies.
Motivation
The diagnostic value of different MR sequences and other imaging modalities is usually assessed by region-of-interest (ROI) based comparisons. To avoid dependency on a single observer and to improve the reliability of the results, ROIs from multiple observers are often used. This allows reporting a mean result that is not based on a single observer, together with an inter-rater reliability, as done for example by Davenport et al. [1, 2].
If multiple ROIs within a single image return different results, the remaining question is which ROI is the best one. A well-established solution is to report the mean value of all ROIs within a single image. However, this weights all ROIs equally, even though some may have been placed by more experienced observers than others and some may be considered outliers. The mean might therefore not reflect the best solution, as shown for example for manual segmentations [3]. There are some modality-specific solutions that avoid observer dependency, such as TBSS and Partial Volume Analysis [4, 5].
We propose a new method of combining the ROIs of multiple observers to simulate an optimal common ROI. Our method weights the result of each observer based on the agreement of all data, thereby reducing the effect of outliers.
Method
Our method takes $$$n$$$ different ROIs $$$R_i$$$ and estimates a single target ROI $$$R_t=\sum_{i=1}^{n} w_i\cdot R_i$$$. The influence of each ROI is controlled by weighting each of its voxels with an ROI-specific weight $$$w_i$$$. The target ROI is initialized as the mean ROI by setting all weights to $$$\frac{1}{n}$$$. Starting from this condition, the weights are updated iteratively. For this, a density function $$$D_i$$$ is estimated from each ROI and a density function $$$D_t$$$ from the target ROI, and the distance $$$d_i$$$ of each $$$R_i$$$ from $$$R_t$$$ is calculated as
$$d_i = \sum_x \left| D_i(x) - D_t(x) \right|\enspace .$$
The density functions are evaluated at a fixed set of points $$$x$$$ that are equally spaced over the complete observation range, and the sum runs over all of these points. After calculating all $$$d_i$$$, the weights are updated according to
$$ w_i = \frac{\frac{1}{d_i}}{\sum_{j=1}^{n} \frac{1}{d_j}}\enspace.$$
This gives more weight to ROIs that are more similar to the target distribution and increases the overall similarity, because the influence of outliers is reduced. The reweighting is repeated until the change in every weight is below a given threshold. Figure 1 visualizes the process of the algorithm.
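The reweighting loop described above can be sketched in a few lines. The following Python code is only an illustration under stated assumptions, not the original implementation (which was written in R with sm.density()): the function names `gaussian_kde` and `fuse_rois`, the fixed-bandwidth Gaussian kernel, the bandwidth rule, the `eps` guard against division by zero, and the iteration cap are all hypothetical choices. In particular, the sketch assumes that the target density $$$D_t$$$ can be modeled as the weighted sum of the per-ROI densities.

```python
import math


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a fixed-bandwidth Gaussian kernel density estimate on `grid`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


def fuse_rois(rois, n_points=200, bandwidth=None, tol=1e-3, max_iter=100, eps=1e-12):
    """Reweight ROIs by the inverse L1 distance of their density to the target.

    `rois` is a list of lists of voxel values, one list per ROI.
    Returns (weights, grid, target_density, iterations_used).
    """
    lo = min(min(r) for r in rois)
    hi = max(max(r) for r in rois)
    if bandwidth is None:
        # Assumption: a simple range-based bandwidth (fallback 1.0 if hi == lo).
        bandwidth = (hi - lo) / 20.0 or 1.0
    # Equally spaced evaluation points over the complete observation range.
    grid = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    densities = [gaussian_kde(r, grid, bandwidth) for r in rois]

    def mix(weights):
        # Assumption: D_t is the weighted sum of the per-ROI densities D_i.
        return [sum(w * d[k] for w, d in zip(weights, densities))
                for k in range(n_points)]

    n = len(rois)
    weights = [1.0 / n] * n  # start from the mean ROI
    for iteration in range(1, max_iter + 1):
        target = mix(weights)
        # d_i = sum_x |D_i(x) - D_t(x)|
        dists = [sum(abs(d[k] - target[k]) for k in range(n_points))
                 for d in densities]
        inv = [1.0 / max(d, eps) for d in dists]  # eps avoids division by zero
        total = sum(inv)
        new_weights = [v / total for v in inv]    # w_i = (1/d_i) / sum_j (1/d_j)
        change = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if change < tol:  # stop once every weight changed by less than tol
            break
    return weights, grid, mix(weights), iteration


# Synthetic example: two similar ROIs and one shifted (outlier) ROI.
roi_a = [1.00 + 0.01 * k for k in range(50)]
roi_b = [1.02 + 0.01 * k for k in range(50)]
roi_c = [1.60 + 0.01 * k for k in range(50)]
weights, grid, fused, n_iter = fuse_rois([roi_a, roi_b, roi_c])
print(weights, n_iter)
```

In this synthetic example the third ROI is shifted away from the other two, so its weight shrinks with each iteration while the two agreeing ROIs share most of the weight. The example data are made up for illustration and are unrelated to the patient data used in the evaluation.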
We evaluated our method on a group of 18 patients with high-grade glioma. Seven ROIs were placed within edema on T2-weighted MR images by three different observers, and the mean diffusivity was calculated from a co-registered diffusion-weighted MR image.
We implemented our method in the R programming language. The density estimation is done using the parameter-free function sm.density().
Results
Figure 2 illustrates the benefit of our method using simulated data.
Figure 3 shows the qualitative result of our approach: the distribution of the grey values marked by the ROIs of three observers, the resulting mean distribution, and the result of the proposed approach.
Running our algorithm on the images of all 18 patients took less than half a second per patient (mean duration $$$0.05\,s\pm0.003\,s$$$). The algorithm needed $$$11.7 \pm 4.9$$$ iterations on average to converge (the convergence threshold was set to 0.001). The distribution of the calculated weights is shown in Fig. 4. The difference in mean between the observer-generated ROIs and the proposed method was lower than that between the observer-generated ROIs and the mean ROI in 84 cases, and higher in 42 cases. Figure 5 depicts the distribution of the mean grey values for all observers and the two fusing approaches.
The differences between the mean of all mean values for all observers and the mean values of the mean ROI and the proposed approach are $$$2.03 \cdot 10^{-5}$$$ and $$$-9.23 \cdot 10^{-6}$$$, respectively. The same differences for the median are $$$4.32 \cdot 10^{-5}$$$ and $$$8.30 \cdot 10^{-6}$$$, respectively.
Discussion
We proposed a new method that unifies ROIs by weighting the observations from different raters. We think that this will enable more representative results that are less influenced by outliers: our initial results indicate that the proposed method reflects the real data by reducing the influence of an unusual observation, leading to a result that is less sensitive to a single rater. While we do not think that using only the results of our method is sufficient, we think that it can reveal additional information when used in multi-rater studies.
Acknowledgements
This work was carried out with the support of the German Research Foundation DFG within projects I04 and R01, SFB/TRR 125 "Cognition-Guided Surgery".
References
[1] Hallgren KA. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial. Tutorials in Quantitative Methods for Psychology. 2012;8(1):23-34.
[2] Davenport MS, Heye T, Dale BM, Horvath JJ, Breault SR, Feuerlein S, Bashir MR, Boll DT, Merkle EM. Inter- and intra-rater reproducibility of quantitative dynamic contrast enhanced MRI using TWIST perfusion data in a uterine fibroid model. J Magn Reson Imaging. 2013;38:329-335. doi: 10.1002/jmri.23974
[3] Warfield SK, Zou KH, Wells WM. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging. 2004;23(7):903-921. doi: 10.1109/TMI.2004.828354
[4] Smith SM, Jenkinson M, Johansen-Berg H, Rueckert D, Nichols TE, Mackay CE, Watkins KE, Ciccarelli O, Cader MZ, Matthews PM, Behrens TEJ. Tract-based spatial statistics: Voxelwise analysis of multi-subject diffusion data. NeuroImage. 2006;31(4):1487-1505.
[5] Stieltjes B, Schlüter M, Didinger B, Weber MA, Hahn HK, Parzer P, Rexilius J, Konrad-Verse O, Peitgen HO, Essig M. Diffusion tensor imaging in primary brain tumors: reproducible quantitative analysis of corpus callosum infiltration and contralateral involvement using a probabilistic mixture model. NeuroImage. 2006;31(2):531-542. Epub 2006 Feb 14. PMID: 16478665