
Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate?

Overview of attention for article published in BMC Medical Research Methodology, August 2016

About this Attention Score

  • Average Attention Score compared to outputs of the same age
  • Average Attention Score compared to outputs of the same age and source

Mentioned by

5 X users

Citations

228 Dimensions

Readers on

314 Mendeley
Title: Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate?
Published in: BMC Medical Research Methodology, August 2016
DOI: 10.1186/s12874-016-0200-9
Authors: Antonia Zapf, Stefanie Castell, Lars Morawietz, André Karch

Abstract

Reliability of measurements is a prerequisite of medical research. For nominal data, Fleiss' kappa (in the following labelled as Fleiss' K) and Krippendorff's alpha provide the highest flexibility of the available reliability measures with respect to number of raters and categories. Our aim was to investigate which measures and which confidence intervals provide the best statistical properties for the assessment of inter-rater reliability in different situations. We performed a large simulation study to investigate the precision of the estimates for Fleiss' K and Krippendorff's alpha and to determine the empirical coverage probability of the corresponding confidence intervals (asymptotic for Fleiss' K and bootstrap for both measures). Furthermore, we compared measures and confidence intervals in a real-world case study. Point estimates of Fleiss' K and Krippendorff's alpha did not differ from each other in any scenario. In the case of missing data (completely at random), Krippendorff's alpha provided stable estimates, while the complete case analysis approach for Fleiss' K led to biased estimates. For shifted null hypotheses, the coverage probability of the asymptotic confidence interval for Fleiss' K was low, while the bootstrap confidence intervals for both measures provided a coverage probability close to the theoretical one. Fleiss' K and Krippendorff's alpha with bootstrap confidence intervals are equally suitable for the analysis of reliability of complete nominal data. The asymptotic confidence interval for Fleiss' K should not be used. In the case of missing data or data of higher than nominal order, Krippendorff's alpha is recommended. Together with this article, we provide an R-script for calculating Fleiss' K and Krippendorff's alpha and their corresponding bootstrap confidence intervals.
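
To make the abstract's central recommendation concrete, the following is a minimal Python sketch of Fleiss' K with a percentile bootstrap confidence interval for complete nominal data. It is not the authors' R script. The function names, the choice to resample subjects with replacement, and the percentile method are assumptions made for illustration, and the sketch assumes every subject was rated by the same number of raters (no missing values).

```python
import random


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for complete nominal data.

    ratings: list of subjects, each a list with one label per rater.
    categories: list of all possible labels.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who put subject i in category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement: mean over subjects of pairwise rater agreement
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Expected agreement from the pooled category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    p_exp = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    if p_exp == 1.0:
        return float("nan")  # only one category used: kappa is undefined
    return (p_obs - p_exp) / (1 - p_exp)


def bootstrap_ci(ratings, categories, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling subjects with replacement
    (an assumed scheme, chosen for illustration)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(ratings) for _ in ratings]
        k = fleiss_kappa(sample, categories)
        if k == k:  # skip undefined (NaN) replicates
            stats.append(k)
    stats.sort()
    lower = stats[int((1 - level) / 2 * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 + level) / 2 * len(stats)))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical toy data: 6 subjects, 3 raters, 2 categories
    toy = [["a", "a", "b"], ["a", "a", "a"], ["b", "b", "b"],
           ["a", "b", "b"], ["b", "b", "b"], ["a", "a", "a"]]
    print(fleiss_kappa(toy, ["a", "b"]))
    print(bootstrap_ci(toy, ["a", "b"], n_boot=500))
```

Resampling whole subjects keeps each subject's set of ratings together, which is one common way to bootstrap agreement statistics. With missing ratings this sketch does not apply, which is the situation where the abstract recommends Krippendorff's alpha.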

X Demographics

The data shown below were collected from the profiles of 5 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 314 Mendeley readers of this research output.

Geographical breakdown

Country     Count   As %
Kenya           1   <1%
Unknown       313   100%

Demographic breakdown

Readers by professional status    Count   As %
Student > Ph. D. Student             58   18%
Student > Master                     43   14%
Researcher                           41   13%
Student > Bachelor                   20   6%
Student > Doctoral Student           19   6%
Other                                57   18%
Unknown                              76   24%

Readers by discipline             Count   As %
Medicine and Dentistry               43   14%
Psychology                           27   9%
Computer Science                     19   6%
Social Sciences                      19   6%
Nursing and Health Professions       17   5%
Other                                89   28%
Unknown                             100   32%

Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 03 March 2022.
All research outputs: #13,627,094 of 23,509,982 outputs
Outputs from BMC Medical Research Methodology: #1,290 of 2,074 outputs
Outputs of similar age: #199,300 of 369,277 outputs
Outputs of similar age from BMC Medical Research Methodology: #26 of 42 outputs

Altmetric has tracked 23,509,982 research outputs across all sources so far. This one is in the 41st percentile – i.e., 41% of other outputs scored the same or lower than it.
So far Altmetric has tracked 2,074 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.2. This one is in the 36th percentile – i.e., 36% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 369,277 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 45th percentile – i.e., 45% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 42 others from the same source and published within six weeks on either side of this one. This one is in the 35th percentile – i.e., 35% of its contemporaries scored the same or lower than it.