
Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment

Overview of attention for article published in BMC Medical Education, April 2018

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (82nd percentile)
  • High Attention Score compared to outputs of the same age and source (80th percentile)

Mentioned by

  • 19 X users
  • 1 Facebook page

Citations

  • 12 Dimensions

Readers on

  • 59 Mendeley
  • 1 CiteULike
Title
Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment

Published in
BMC Medical Education, April 2018

DOI
10.1186/s12909-018-1143-0

Pubmed ID

Authors
David Hope, Karen Adamson, I. C. McManus, Liliana Chis, Andrew Elder
Abstract

Fairness is a critical component of defensible assessment. Candidates should perform according to ability without influence from background characteristics such as ethnicity or sex. However, performance differs by candidate background in many assessment environments. Many potential causes of such differences exist, and examinations must be routinely analysed to ensure they do not present inappropriate progression barriers for any candidate group. By analysing the individual questions of an examination through techniques such as Differential Item Functioning (DIF), we can test whether a subset of unfair questions explains group-level differences. Such items can then be revised or removed. We used DIF to investigate fairness for 13,694 candidates sitting a major international summative postgraduate examination in internal medicine. We compared (a) ethnically white UK graduates against ethnically non-white UK graduates and (b) male UK graduates against female UK graduates. DIF was used to test 2773 questions across 14 sittings. Of these, eight (0.29%) showed notable DIF after correcting for multiple comparisons: seven medium effects and one large effect. Blinded analysis of these questions by a panel of clinician assessors identified no plausible explanations for the differences. These questions were removed from the question bank, and we present them here to share knowledge of questions with DIF. These questions did not significantly impact the overall performance of the cohort. Group-level differences in performance between the groups we studied in this examination cannot be explained by a subset of unfair questions. DIF helps explore fairness in assessment at the question level. This is especially important in high-stakes assessment where a small number of unfair questions may adversely impact the passing rates of some groups. However, very few questions exhibited notable DIF, so differences in passing rates for the groups we studied cannot be explained by unfairness at the question level.
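
The abstract does not state which DIF statistic was applied, so the following is only a minimal sketch of one widely used approach, the Mantel-Haenszel procedure, for a single question; the variable names and thresholds below are illustrative assumptions, not details taken from the paper. Candidates are stratified by total score, a 2×2 table (group × correct/incorrect) is built within each stratum, and the pooled odds ratio is converted to the ETS delta scale, where values near zero indicate no DIF and an absolute delta of roughly 1.5 or more is conventionally treated as large.

```python
import numpy as np

def mantel_haenszel_dif(item_correct, focal_group, total_score):
    """Illustrative Mantel-Haenszel DIF statistic for one exam question.

    item_correct : 0/1 array of responses to the studied question
    focal_group  : 0 = reference group, 1 = focal group
    total_score  : matching variable, e.g. total examination score

    Returns the pooled (common) odds ratio and the ETS delta value
    (delta = -2.35 * ln(odds ratio); values near 0 suggest no DIF).
    """
    item_correct = np.asarray(item_correct)
    focal_group = np.asarray(focal_group)
    total_score = np.asarray(total_score)

    num = den = 0.0
    for k in np.unique(total_score):            # one 2x2 table per score stratum
        s = total_score == k
        a = np.sum(s & (focal_group == 0) & (item_correct == 1))  # reference, correct
        b = np.sum(s & (focal_group == 0) & (item_correct == 0))  # reference, incorrect
        c = np.sum(s & (focal_group == 1) & (item_correct == 1))  # focal, correct
        d = np.sum(s & (focal_group == 1) & (item_correct == 0))  # focal, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n

    if num == 0 or den == 0:                    # degenerate strata: no estimate
        return float("nan"), float("nan")
    odds_ratio = num / den
    delta = -2.35 * np.log(odds_ratio)
    return odds_ratio, delta
```

In a screening such as the one described above, a statistic of this kind would be computed for every question and each group comparison, combined with a significance test and a multiple-comparison correction, and flagged questions would then be passed to blinded expert review.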

X Demographics

The data shown below were collected from the profiles of the 19 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for the 59 Mendeley readers of this research output.

Geographical breakdown

Country   Count   As %
Unknown      59   100%

Demographic breakdown

Readers by professional status   Count   As %
Student > Master                     8    14%
Student > Ph.D. Student              8    14%
Other                                6    10%
Researcher                           6    10%
Lecturer                             5     8%
Other                               10    17%
Unknown                             16    27%

Readers by discipline            Count   As %
Medicine and Dentistry              10    17%
Social Sciences                      7    12%
Psychology                           5     8%
Linguistics                          3     5%
Computer Science                     3     5%
Other                               13    22%
Unknown                             18    31%
Attention Score in Context

This research output has an Altmetric Attention Score of 12. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 March 2019.
Comparison group                                     Rank         Out of
All research outputs                                 #2,574,942   23,035,022 outputs
Outputs from BMC Medical Education                   #408         3,370 outputs
Outputs of similar age                               #56,010      329,113 outputs
Outputs of similar age from BMC Medical Education    #16          81 outputs
Altmetric has tracked 23,035,022 research outputs across all sources so far. Compared to these, this one has done well: it is in the 88th percentile and in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 3,370 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 6.4. This one has done well, scoring higher than 87% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 329,113 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 82% of its contemporaries.
We're also able to compare this research output to 81 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 80% of its contemporaries.
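
Altmetric's exact ranking method is not documented on this page, but as a minimal sketch of the idea behind the percentile comparisons above, a percentile rank can be obtained by counting how many outputs in the comparison group a given Attention Score beats. The peer scores below are invented purely for illustration.

```python
def percentile_rank(score, peer_scores):
    """Percentage of peer outputs whose score this output beats."""
    beaten = sum(1 for s in peer_scores if s < score)
    return 100.0 * beaten / len(peer_scores)

# Hypothetical peer scores; an Attention Score of 12 beats 7 of these 9 outputs.
peers = [0, 1, 1, 2, 3, 5, 8, 15, 40]
print(f"{percentile_rank(12, peers):.0f}th percentile")  # prints "78th percentile"
```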