Interobserver Agreement and Accuracy of Six Scoring Systems (Likert, PI-RADS V1, PI-RADS V2, MLS, SQS, UCSF) for Prostate Lesions Using mp-MR Imaging
Li Zhang1, Longchao Li1, Jin Zhang2, and Jianfeng Li2

1Department of MRI, Shaanxi People's Hospital, Xi'an, China; 2Department of MRI, Shaanxi People's Hospital, Xi'an, China

Synopsis

Some institutions use prostate scoring systems other than PI-RADS. The purpose of this study was to determine, for expert and novice radiologists, the agreement and accuracy of six scoring systems for categorizing prostate lesions seen at mp-MRI. 129 lesions were scored by four readers. Experts and novices had fair to moderate agreement for most scores (κ = 0.2176–0.4533). Novices were less consistent and less likely to diagnose prostate cancer than experts. The Likert and PI-RADS V1 scores allowed significantly more accurate categorization of prostate lesions than the other systems.

Introduction:

In addition to the PI-RADS scoring system, some institutions use other prostate scoring systems based on mp-MRI (i.e., Morphology-Location-Signal intensity (MLS), the Simplified Qualitative System (SQS), the Likert scale, and a modified version of PI-RADS used at the University of California, San Francisco (UCSF)). Although PI-RADS may indeed be a very good way to interpret MR imaging of the prostate, other scoring systems may perform better, with a higher degree of interreader agreement. To best identify potential weaknesses and provide the best information for future PI-RADS updates, such an assessment should evaluate the agreement and accuracy not only of the overall suspicion scores but also of the lexicon each of the six scoring systems uses to assign them.

Materials and Methods:

This study was approved by the Ethics Committee of Shaanxi People's Hospital, Xi'an, China. 101 consecutive patients with 129 lesions were scored with the six scoring systems by four readers (two expert and two novice radiologists) who were blinded to the pathologic results. Interobserver agreement was evaluated with κ statistics, and the diagnostic performance of the six scoring systems was evaluated with sensitivity and specificity.
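
For illustration, the sketch below computes Cohen's κ for one pair of readers and labels the result using the conventional Landis-Koch benchmarks ("slight", "fair", "moderate", "substantial") that the Results refer to. The scores shown are hypothetical, and the abstract does not specify whether pairwise or multi-rater (e.g., Fleiss) κ was used; pairwise Cohen's κ is assumed here.

```python
# A minimal sketch of the agreement analysis, assuming pairwise Cohen's kappa
# between two readers; the scores below are hypothetical, not study data.
from sklearn.metrics import cohen_kappa_score

# Hypothetical category assignments (scores 1-5) for the same ten lesions
reader_a = [4, 5, 3, 2, 5, 1, 4, 3, 2, 5]
reader_b = [4, 4, 2, 2, 5, 2, 5, 3, 1, 5]

kappa = cohen_kappa_score(reader_a, reader_b)

def landis_koch(k: float) -> str:
    """Conventional Landis-Koch benchmarks for interpreting kappa."""
    if k < 0.0:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"

print(f"kappa = {kappa:.4f} ({landis_koch(kappa)} agreement)")
```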

Results:

Overall interreader agreement was fair for Likert, PI-RADS V2, SQS, and UCSF (κ = 0.2176–0.4533) and slight for MLS and PI-RADS V1 (κ = 0.139–0.158). Agreement for extracapsular extension, lesion size, b = 2000 s/mm² DWI, and a type III DCE curve was moderate to substantial (κ = 0.5224–0.6021). In the PZ, experts' agreement on T2WI-related features was moderate to substantial for most scores (κ = 0.4886–0.6874), better than in the TZ (κ = 0.2017–0.5545). Experts agreed significantly more than novices and were significantly more likely than novices to assign a diagnosis of prostate cancer (P < 0.001). Sensitivity and specificity were 0.98 and 0.67 for Likert, 0.93 and 0.67 for PI-RADS V1, 0.98 and 0.59 for PI-RADS V2, 1.00 and 0.56 for MLS, 0.98 and 0.59 for SQS, and 0.96 and 0.64 for UCSF.
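
For reference, sensitivity and specificity of a scoring system can be computed by dichotomizing the assigned categories against pathology, as in the sketch below; the positivity threshold (score ≥ 3) and the data are illustrative assumptions, not the study's actual cutoff or results.

```python
# Illustrative sensitivity/specificity computation, assuming lesions scored
# >= 3 are called positive; the threshold and data are hypothetical.
from sklearn.metrics import confusion_matrix

scores    = [5, 4, 2, 3, 1, 5, 2, 4, 3, 1]   # assigned categories (1-5)
pathology = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]   # 1 = cancer at pathology

predicted = [1 if s >= 3 else 0 for s in scores]

# With labels=[0, 1], ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(pathology, predicted, labels=[0, 1]).ravel()

sensitivity = tp / (tp + fn)  # true-positive rate among cancers
specificity = tn / (tn + fp)  # true-negative rate among benign lesions
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```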

Discussion:

There is substantial variation in prostate lesion reporting by both experts and novices even when standardized reporting schemas are used. Although experts consistently had significantly higher agreement than novices, only the experts who used the Likert scale achieved better interreader agreement than with the other systems; the remaining combinations of scoring system and experience level showed agreement that was only fair to moderate. The Likert score, despite its subjective nature, showed good diagnostic performance for readers of both experience levels.

Conclusion:

Experts and novices had fair to moderate agreement for most scores. Novices were less consistent and less likely to diagnose prostate cancer than experts. Agreement tended to be better in the PZ than in the TZ. The Likert and PI-RADS V1 scores allowed significantly more accurate categorization of prostate lesions than the other systems. These findings may help guide future PI-RADS lexicon updates.

Acknowledgements

No acknowledgement found.

References

No reference found.
Proc. Intl. Soc. Mag. Reson. Med. 27 (2019): 1836