2433

Automatic extraction of reproducible semi-quantitative histological metrics for MRI-histology correlations

Daniel ZL Kor¹, Saad Jbabdi¹, Jeroen Mollink¹, Istvan N Huszar¹, Menuka Pallebage-Gamarallage², Adele Smart², Connor Scott², Olaf Ansorge², Amy FD Howard¹, and Karla L Miller¹
¹Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom, ²Neuropathology, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom

Synopsis

Immunohistochemistry (IHC) images are often used as a microscopic validation tool for MRI. Acquisition of MRI and IHC in the same ex-vivo tissue sample can enable direct correlation between MRI measures and purported sources of image contrast derived from IHC, ideally at the voxel level. However, most IHC analyses still involve manual intervention (e.g. setting of thresholds). Here, we describe an end-to-end pipeline for automatically extracting stained area fraction maps to quantify the IHC stain for a given microstructural feature. The pipeline has improved reproducibility and robustness to histology artefacts, compared to manual MRI-histology analyses that suffer from inter-operator bias.

Introduction

Comparison of MRI to stained tissue slides from the same sample provides insight into the cellular correlates of MRI signals, which are notoriously non-specific. Immunohistochemistry (IHC) is a histological staining technique that uses diaminobenzidine (DAB) to stain a target protein brown, and hematoxylin to counterstain background tissue purple. While many studies use IHC to validate MR measures, there is no standardised pipeline for extracting quantitative metrics from IHC slides¹⁻³.

The stained area fraction (SAF) is a common metric that measures the fraction of pixels in a region that correspond to the protein targeted by DAB. SAF is typically quantified by separating hematoxylin and DAB stains using literature-based colour information⁴⁻⁶ before setting a manual threshold for the DAB channel to segment microstructural tissue compartments from non-specific background staining⁴,⁵,⁷. These workflows suffer from issues that compromise robustness and reproducibility, including an inability to capture staining variation, poor robustness to histological artefacts, and inter-operator bias in setting the threshold.

Here, we propose an automated pipeline that receives input RGB images and outputs a SAF map. Key features are: 1) a clustering approach to derive slide-specific colour information, 2) automated determination of the segmentation threshold, and 3) local thresholding to account for within-slide variations. We compare the pipeline’s reproducibility and robustness to previously reported histological quantification workflows.

Methods

Pipeline overview: The pipeline consists of four steps (Figure 1).

1: Slide-specific RGB colour vectors for both stains are derived via k-means clustering (k=2)⁹.

2: Stains are separated using colour deconvolution¹⁰ with non-negativity constraints.

3: The structure-specific DAB is segmented from non-specific background staining using an automatically determined threshold (outlined below).

4: SAFs, defined as the ratio of the number of positive pixels in the segmentation mask to the number of tissue pixels, are calculated in patches to form a map.
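As an illustration of Steps 2 and 4 only, the following is a minimal sketch in plain Python, not the pipeline's actual implementation. It assumes a Beer-Lambert optical density transform, a two-stain least-squares unmixing in which negative concentrations are simply clipped to zero (a crude stand-in for a proper non-negativity constraint), and non-overlapping square patches. All function names and the `background` and `patch` parameters are hypothetical.

```python
import math

def optical_density(rgb, background=255.0):
    """Beer-Lambert optical density per channel for one RGB pixel."""
    return [-math.log10(max(c, 1.0) / background) for c in rgb]

def unmix_two_stains(od, v_hem, v_dab):
    """Least-squares stain concentrations for one pixel, clipped at zero.

    Solves the 2x2 normal equations for od ~ c_hem*v_hem + c_dab*v_dab.
    Clipping is an assumption standing in for a non-negative solver.
    """
    a11 = sum(h * h for h in v_hem)
    a12 = sum(h * d for h, d in zip(v_hem, v_dab))
    a22 = sum(d * d for d in v_dab)
    b1 = sum(h * o for h, o in zip(v_hem, od))
    b2 = sum(d * o for d, o in zip(v_dab, od))
    det = a11 * a22 - a12 * a12
    c_hem = (a22 * b1 - a12 * b2) / det
    c_dab = (a11 * b2 - a12 * b1) / det
    return max(c_hem, 0.0), max(c_dab, 0.0)

def saf_map(positive, tissue, patch):
    """SAF per patch: positive tissue pixels / tissue pixels.

    `positive` and `tissue` are 2D lists of 0/1 flags with equal shape.
    Patches without any tissue are returned as None.
    """
    rows, cols = len(tissue), len(tissue[0])
    out = []
    for r0 in range(0, rows, patch):
        row_out = []
        for c0 in range(0, cols, patch):
            n_pos = n_tis = 0
            for r in range(r0, min(r0 + patch, rows)):
                for c in range(c0, min(c0 + patch, cols)):
                    if tissue[r][c]:
                        n_tis += 1
                        n_pos += 1 if positive[r][c] else 0
            row_out.append(n_pos / n_tis if n_tis else None)
        out.append(row_out)
    return out
```

In this sketch the stain vectors `v_hem` and `v_dab` would be the slide-specific colour vectors from Step 1, and `positive` would be the thresholded DAB channel from Step 3.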

Thresholding methods: We compared three thresholding methods for Step 3. All cases are based on Otsu’s method¹¹ for separating foreground from background pixels.

1. Global thresholding estimates and applies a slide-specific threshold by applying Otsu to the slide’s DAB density histogram.

2. Local thresholding aims to account for variations in the DAB intensity across slides by adapting the threshold to local neighbourhoods. Local thresholds are more sensitive to histogram features, necessitating two refinements. First, we observed that Otsu performs better after pre-filtering the background stain with a Gaussian-2-inverse-gamma mixture model¹². Second, Otsu fails on histograms that contain a single class (background only); these instances are detected using a weighted object variance¹³, and a slide-wide default threshold is used instead.

3. Batchwise thresholding is the most common manual method, where a trained operator optimises a threshold by eye on multiple slides before applying it to the entire dataset. We simulate this by using the median of all slides’ default thresholds.
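The three thresholding variants can be sketched as follows, under simplifying assumptions. The Otsu routine is a generic histogram implementation with a hypothetical `bins` parameter. The local variant omits the mixture-model pre-filtering, and it detects single-class strips with a simple range check (`min_spread`, a hypothetical parameter) rather than the weighted object variance used in the pipeline.

```python
import statistics

def otsu_threshold(values, bins=64):
    """Otsu's method: bin edge that maximises between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    sum_all = sum(h * c for h, c in zip(hist, centres))
    best_var, best_t = -1.0, lo
    w0 = sum0 = 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centres[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, lo + (i + 1) * width
    return best_t

def global_threshold(dab):
    """One Otsu threshold for the whole slide (2D list of DAB densities)."""
    return otsu_threshold([v for row in dab for v in row])

def local_thresholds(dab, strip_width, default, min_spread=0.05):
    """One Otsu threshold per vertical strip of columns.

    Strips spanning less than `min_spread` are treated as single-class
    and fall back to `default`. This range check is an assumption, not
    the weighted object variance test.
    """
    thresholds = []
    for c0 in range(0, len(dab[0]), strip_width):
        strip = [v for row in dab for v in row[c0:c0 + strip_width]]
        if max(strip) - min(strip) < min_spread:
            thresholds.append(default)
        else:
            thresholds.append(otsu_threshold(strip))
    return thresholds

def batchwise_threshold(default_thresholds):
    """Simulated batchwise threshold: median of per-slide defaults."""
    return statistics.median(default_thresholds)
```

With column-wise strips, `strip_width` would correspond to the strip width in pixels, and `default` to the slide-wide default threshold.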

Acquisition: We acquired an IHC dataset to test reproducibility, consisting of 26 slides from 7 human brains. For each brain, 3-4 adjacent slides (6 µm thick) were sectioned from the primary motor cortex (face region). Slides were stained for activated microglia (CD68) and scanned with a digital scanner (2.5 x 2.5 cm²; 0.5 µm/pixel). Digitisation of slides introduces vertical intensity variations. Consequently, our local method applies Otsu to column-wise histograms (32 pixels wide). Adjacent slides from the same subject were co-registered⁸ and compared pixelwise to assess the reproducibility of the SAF maps, under the assumption that adjacent slides are similar at the SAF map resolution (128-512 µm).

Pipeline evaluation: The pipeline was evaluated for reproducibility and robustness. For all methods, absolute difference maps of SAF between co-registered adjacent slides were computed to evaluate reproducibility. Robustness to artefacts was measured with the coefficient of variation (COV, the standard deviation divided by the mean). Because the strongest variation is along the horizontal direction, we collapsed the vertical dimension by calculating the column-wise average SAF, then calculated the COV across the resulting horizontal profile. We compared local thresholding to global and batchwise thresholding using a COV ratio, where values >1 indicate reduced horizontal variation for the local method and hence increased robustness to artefacts (both slow drift and vertical striping).
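A minimal sketch of these two evaluation measures is given below. It assumes that SAF maps are 2D lists without missing values, that the population standard deviation is used, and that the COV ratio is defined as the comparison method's COV divided by the local method's COV, which is one reading consistent with values >1 favouring the local method. Function names are hypothetical.

```python
import statistics

def abs_difference_map(saf_a, saf_b):
    """Pixelwise absolute difference between two co-registered SAF maps."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(saf_a, saf_b)]

def column_profile(saf):
    """Average SAF down each column, collapsing the vertical dimension."""
    n_cols = len(saf[0])
    return [statistics.fmean([row[c] for row in saf]) for c in range(n_cols)]

def cov(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

def cov_ratio(saf_comparison, saf_local):
    """COV of the comparison method over COV of the local method.

    Assumed orientation: values >1 mean the local method shows less
    variation along the horizontal direction.
    """
    return cov(column_profile(saf_comparison)) / cov(column_profile(saf_local))
```

For example, a comparison map whose column profile is `[0.1, 0.3]` and a local map whose profile is `[0.15, 0.25]` share the same mean but give a ratio of 2, since the local profile varies half as much.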

Results and Discussion

Figure 2 visually compares stain separation from literature-based and slide-specific colour information.

Figure 3 shows the absolute difference maps for all pairwise comparisons using local thresholding. The adjacent SAF maps are generally similar within the white matter (~20% error), indicating good reproducibility. The largest errors are at tissue edges, suggesting residual misalignment that may bias comparisons.

Figure 4 shows that the median percentage difference was around 30% for all methods. The local method produces the lowest variance, indicating higher consistency in the differences between SAF maps. This suggests that the local method produces more similar SAF maps between adjacent slides.

Figure 5 shows the COV ratio comparing local thresholding with both batchwise (dashed) and global (solid) thresholding. This COV ratio is >1 for 20 out of 26 slides, indicating that the COV is lower for most slides when using local thresholding. This suggests a reduced impact of artefacts on SAF maps when a local thresholding method is used.

Conclusion and Future Work

We have developed a fully automated pipeline for generating SAF maps. Our results suggest some benefit in using local thresholds over conventional global thresholds. Future work will consider other stains targeting myelin, iron and neurofilaments, and their pixelwise correlation with MRI data⁷.

Acknowledgements

AFDH and KLM contributed equally to this work. Data acquisition was funded by grant MR/K02213X/1 from the Medical Research Council (MRC). DZLK is supported by the St. Peter’s Diggle Clarendon Scholarship and the Nuffield Department of Clinical Neurosciences studentship. AFDH and INH are supported by the EPSRC and MRC (grants EP/L016052/1 and MR/L009013/1). KLM and SJ are supported by the Wellcome Trust (grants WT202788/Z/16/A and WT215573/Z/19/Z). The Wellcome Centre for Integrative Neuroimaging is supported by core funding from the Wellcome Trust (203139/Z/16/Z).

References

1. Lazari A, Lipp I. Can MRI measure myelin? Systematic review, qualitative assessment, and meta-analysis of studies validating microstructural imaging with myelin histology. bioRxiv. Published online September 26, 2020:2020.09.08.286518. doi:10.1101/2020.09.08.286518

2. van der Weijden CWJ, García DV, Borra RJH, et al. Myelin quantification with MRI: A systematic review of accuracy and reproducibility. NeuroImage. 2021;226:117561. doi:10.1016/j.neuroimage.2020.117561

3. De Barros A, Arribarat G, Combis J, Chaynes P, Péran P. Matching ex vivo MRI With Iron Histology: Pearls and Pitfalls. Front Neuroanat. 2019;13. doi:10.3389/fnana.2019.00068

4. Wiggermann V, Hametner S, Hernández‐Torres E, et al. Susceptibility-sensitive MRI of multiple sclerosis lesions and the impact of normal-appearing white matter changes. NMR in Biomedicine. 2017;30(8):e3727. doi:10.1002/nbm.3727

5. Bagnato F, Hametner S, Boyd E, et al. Untangling the R2* contrast in multiple sclerosis: A combined MRI-histology study at 7.0 Tesla. PLOS ONE. 2018;13(3):e0193839. doi:10.1371/journal.pone.0193839

6. Hametner S, Endmayr V, Deistung A, et al. The influence of brain iron and myelin on magnetic susceptibility and effective transverse relaxation - A biochemical and histological validation study. NeuroImage. 2018;179:117-133. doi:10.1016/j.neuroimage.2018.06.007

7. Pallebage-Gamarallage M, Foxley S, Menke RAL, et al. Dissecting the pathobiology of altered MRI signal in amyotrophic lateral sclerosis: A post mortem whole brain sampling strategy for the integration of ultra-high-field MRI and quantitative neuropathology. BMC Neurosci. 2018;19(1):11. doi:10.1186/s12868-018-0416-1

8. Huszar IN, Pallebage-Gamarallage M, Foxley S, et al. Tensor Image Registration Library: Automated Non-Linear Registration of Sparsely Sampled Histological Specimens to Post-Mortem MRI of the Whole Human Brain. bioRxiv. Published online November 26, 2019:849570. doi:10.1101/849570

9. Geijs DJ, Intezar M, van der Laak JAWM, Litjens GJS. Automatic color unmixing of IHC stained whole slide images. In: Medical Imaging 2018: Digital Pathology. Vol 10581. International Society for Optics and Photonics; 2018:105810L. doi:10.1117/12.2293734

10. Ruifrok AC, Johnston DA. Quantification of histochemical staining by color deconvolution. Anal Quant Cytol Histol. 2001;23(4):291-299.

11. Otsu N. A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics. 1979;9(1):62-66. doi:10.1109/TSMC.1979.4310076

12. Llera A, Vidaurre D, Pruim RHR, Beckmann CF. Variational Mixture Models with Gamma or inverse-Gamma components. arXiv:1607.07573 [stat]. Published online July 26, 2016. Accessed December 9, 2020. http://arxiv.org/abs/1607.07573

13. Yuan X, Wu L, Peng Q. An improved Otsu method using the weighted object variance for defect detection. Applied Surface Science. 2015;349:472-484. doi:10.1016/j.apsusc.2015.05.033

Figures

Fig. 1: A robust, automatic pipeline to quantify the stained area fraction (SAF) from histology slides as highlighted in the 4 steps. Input RGB slides are processed to produce SAF maps at variable resolution. We aim to correlate SAF maps at MRI-scale resolution (512x512 µm²/pixel) to MRI measures.

Fig. 2: DAB channel density maps produced from the RGB patches (left column) following stain separation with literature-based (middle column) and slide-derived (right column) colour information. The literature values produce poorer stain separation, with structures targeted by hematoxylin (red arrows), namely nuclei, blood vessels (rows 1,2) and oligodendrocytes (rows 3,4), appearing in the DAB channel. Although slide-specific values show improved specificity in excluding these structures, thresholding is necessary to remove those not fully excluded (rows 3,4; right column).

Fig. 3: An example SAF map (top row) and the absolute percent difference maps for all within-subject pairwise comparisons between adjacent slides with local thresholding (i.e. proposed pipeline). Each column shows a different subject. 3 adjacent slides produce 3 pairwise comparisons (subjects 6,7), while 4 adjacent slides produce 6 pairwise comparisons (other subjects). For almost all subjects, the highest percentage difference is found on the edges of the tissue, implying possible misalignment after co-registration or reduced robustness to tissue edge artefacts.

Fig. 4: Reproducibility is measured as the median of the absolute percent difference (Figure 3) for each within-subject pairwise comparison between adjacent slides. The total median and median absolute are computed (legends), with a median of ~30% for all methods. The higher median absolute difference in batchwise and global methods is driven by subjects 1 and 3 respectively. The reason for this warrants more investigation, but is likely related to histology artefacts such as within-slide staining inconsistencies and/or intensity variations from slide digitisation.

Fig. 5: The ratio of the coefficient of variation (COV) between different methods is a relative measure of robustness to histological slide artefacts. The local to batchwise method COV ratio (dashed) and local to global method COV ratio (solid) are observed to be >1 for 20 out of 26 slides. This suggests an overall reduction in the COV of SAF maps produced by the local method, and implies an increased robustness to artefacts that cause artificial local changes in stain intensities.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
