Altmetric – Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate?

Title	Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate?
Published in	BMC Medical Research Methodology, August 2016
DOI	10.1186/s12874-016-0200-9
Pubmed ID	27495131
Authors	Antonia Zapf, Stefanie Castell, Lars Morawietz, André Karch
Abstract	Reliability of measurements is a prerequisite of medical research. For nominal data, Fleiss' kappa (in the following labelled as Fleiss' K) and Krippendorff's alpha provide the highest flexibility of the available reliability measures with respect to number of raters and categories. Our aim was to investigate which measures and which confidence intervals provide the best statistical properties for the assessment of inter-rater reliability in different situations. We performed a large simulation study to investigate the precision of the estimates for Fleiss' K and Krippendorff's alpha and to determine the empirical coverage probability of the corresponding confidence intervals (asymptotic for Fleiss' K and bootstrap for both measures). Furthermore, we compared measures and confidence intervals in a real world case study. Point estimates of Fleiss' K and Krippendorff's alpha did not differ from each other in any scenario. In the case of missing data (completely at random), Krippendorff's alpha provided stable estimates, while the complete case analysis approach for Fleiss' K led to biased estimates. For shifted null hypotheses, the coverage probability of the asymptotic confidence interval for Fleiss' K was low, while the bootstrap confidence intervals for both measures provided a coverage probability close to the theoretical one. Fleiss' K and Krippendorff's alpha with bootstrap confidence intervals are equally suitable for the analysis of reliability of complete nominal data. The asymptotic confidence interval for Fleiss' K should not be used. In the case of missing data or data of higher than nominal order, Krippendorff's alpha is recommended. Together with this article, we provide an R-script for calculating Fleiss' K and Krippendorff's alpha and their corresponding bootstrap confidence intervals.
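The abstract compares two agreement coefficients for nominal ratings and a bootstrap confidence interval. The following is a minimal pure-Python sketch of those quantities, not the authors' R script. It assumes each subject is given as a list of category labels (one per rater) with `None` marking a missing rating. The function names are hypothetical, and the interval is a simple percentile bootstrap that resamples subjects, which may differ in detail from the bootstrap variant evaluated in the paper.

```python
import random
from collections import Counter


def fleiss_kappa(units):
    """Fleiss' K for complete data: same number of raters per unit, no missing values.

    A complete case analysis would drop every unit containing None before calling this.
    """
    n = len(units[0])  # raters per unit
    if n < 2 or any(len(u) != n or None in u for u in units):
        raise ValueError("needs complete data with at least 2 raters per unit")
    N = len(units)
    counts = [Counter(u) for u in units]
    categories = set().union(*counts)
    # Overall proportion of all assignments that fall into each category.
    p = {c: sum(cnt[c] for cnt in counts) / (N * n) for c in categories}
    # Observed agreement per unit (share of agreeing rater pairs), averaged over units.
    p_bar = sum((sum(v * v for v in cnt.values()) - n) / (n * (n - 1)) for cnt in counts) / N
    p_e = sum(v * v for v in p.values())  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)      # ZeroDivisionError if only one category occurs


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha with the nominal metric; units with < 2 ratings are skipped."""
    o = Counter()  # coincidence matrix: o[(c, k)]
    for u in units:
        values = [v for v in u if v is not None]
        m = len(values)
        if m < 2:
            continue  # unit is not pairable
        cnt = Counter(values)
        for c in cnt:
            for k in cnt:
                # Ordered pairs of distinct raters in this unit with values (c, k).
                pairs = cnt[c] * (cnt[c] - 1) if c == k else cnt[c] * cnt[k]
                o[(c, k)] += pairs / (m - 1)
    n_c = Counter()
    for (c, _k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())  # total number of pairable values
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    return 1 - (n - 1) * observed / expected


def bootstrap_ci(units, statistic, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap interval, resampling units (subjects) with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(units) for _ in units]
        try:
            estimates.append(statistic(sample))
        except ZeroDivisionError:
            continue  # degenerate resample, e.g. a single category only
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * len(estimates))]
    hi = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lo, hi


# Toy data: 4 subjects, 3 raters, categories "a" and "b" (made up for illustration).
example = [["a", "a", "b"], ["a", "b", "b"], ["a", "a", "a"], ["b", "b", "b"]]
print(round(fleiss_kappa(example), 4))                # 0.3333
print(round(krippendorff_alpha_nominal(example), 4))  # 0.3889
print(bootstrap_ci(example, krippendorff_alpha_nominal))
```

For complete data with N subjects and n raters each, the two sketches are linked by alpha = 1 - (Nn - 1)/(Nn) * (1 - K), so they approach each other as the number of ratings grows. That is consistent with the abstract's finding that the point estimates did not differ. Only `krippendorff_alpha_nominal` accepts `None` entries directly.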

X Demographics

The data shown below were collected from the profiles of 5 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United Kingdom	2	40%
Canada	1	20%
Australia	1	20%
Unknown	1	20%

Demographic breakdown

Type	Count	As %
Members of the public	3	60%
Scientists	1	20%
Practitioners (doctors, other healthcare professionals)	1	20%

Mendeley readers

The data shown below were compiled from readership statistics for 314 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Kenya	1	<1%
Unknown	313	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	58	18%
Student > Master	43	14%
Researcher	41	13%
Student > Bachelor	20	6%
Student > Doctoral Student	19	6%
Other	57	18%
Unknown	76	24%

Readers by discipline	Count	As %
Medicine and Dentistry	43	14%
Psychology	27	9%
Computer Science	19	6%
Social Sciences	19	6%
Nursing and Health Professions	17	5%
Other	89	28%
Unknown	100	32%

Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 03 March 2022.

Comparison group	Rank	Out of
All research outputs	#13,627,094	23,509,982
Outputs from BMC Medical Research Methodology	#1,290	2,074
Outputs of similar age	#199,300	369,277
Outputs of similar age from BMC Medical Research Methodology	#26	42

Altmetric has tracked 23,509,982 research outputs across all sources so far. This one is in the 41st percentile – i.e., 41% of other outputs scored the same or lower than it.

So far Altmetric has tracked 2,074 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.2. This one is in the 36th percentile – i.e., 36% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 369,277 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 45th percentile – i.e., 45% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 42 others from the same source and published within six weeks on either side of this one. This one is in the 35th percentile – i.e., 35% of its contemporaries scored the same or lower than it.
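Each percentile above is defined as the share of compared outputs that scored the same as or lower than this one. A minimal sketch of that rule, assuming a plain list of hypothetical peer scores; Altmetric's own rounding and tie handling are not described on this page, so this is an illustration of the stated definition only.

```python
def percentile_rank(score, other_scores):
    """Percentage of other outputs whose score is the same as or lower than `score`."""
    if not other_scores:
        raise ValueError("need at least one other output to compare against")
    same_or_lower = sum(1 for s in other_scores if s <= score)
    return 100 * same_or_lower / len(other_scores)


# Hypothetical peer scores, not real Altmetric data.
peers = [0, 1, 1, 3, 5, 8, 12, 20, 2, 0]
print(percentile_rank(3, peers))  # 60.0
```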
