
Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment

Overview of attention for article published in BMC Medical Education, April 2018

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (82nd percentile)
  • High Attention Score compared to outputs of the same age and source (80th percentile)

Mentioned by

  • 19 X users
  • 1 Facebook page

Citations

  • 12 Dimensions

Readers on

  • 59 Mendeley
  • 1 CiteULike
Title
Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment

Published in
BMC Medical Education, April 2018

DOI
10.1186/s12909-018-1143-0

Pubmed ID

Authors
David Hope, Karen Adamson, I. C. McManus, Liliana Chis, Andrew Elder
Abstract

Fairness is a critical component of defensible assessment. Candidates should perform according to ability without influence from background characteristics such as ethnicity or sex. However, performance differs by candidate background in many assessment environments. Many potential causes of such differences exist, and examinations must be routinely analysed to ensure they do not present inappropriate progression barriers for any candidate group. By analysing the individual questions of an examination through techniques such as Differential Item Functioning (DIF), we can test whether a subset of unfair questions explains group-level differences. Such items can then be revised or removed. We used DIF to investigate fairness for 13,694 candidates sitting a major international summative postgraduate examination in internal medicine. We compared (a) ethnically white UK graduates against ethnically non-white UK graduates and (b) male UK graduates against female UK graduates. DIF was used to test 2773 questions across 14 sittings. Of these, eight (0.29%) showed notable DIF after correcting for multiple comparisons: seven medium effects and one large effect. Blinded analysis of these questions by a panel of clinician assessors identified no plausible explanations for the differences. These questions were removed from the question bank, and we present them here to share knowledge of questions with DIF. These questions did not significantly impact the overall performance of the cohort. Group-level differences in performance between the groups we studied in this examination cannot be explained by a subset of unfair questions. DIF helps explore fairness in assessment at the question level. This is especially important in high-stakes assessment where a small number of unfair questions may adversely impact the passing rates of some groups. However, very few questions exhibited notable DIF, so differences in passing rates for the groups we studied cannot be explained by unfairness at the question level.
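
The abstract does not state which DIF statistic was applied, so the following is only a minimal sketch of one widely used approach, the Mantel-Haenszel procedure, for a single question; the variable names and thresholds below are illustrative assumptions, not details taken from the paper. Candidates are stratified by total score, a 2×2 table (group × correct/incorrect) is built within each stratum, and the pooled odds ratio is converted to the ETS delta scale, where values near zero indicate no DIF and an absolute delta of roughly 1.5 or more is conventionally treated as large.

```python
import numpy as np

def mantel_haenszel_dif(item_correct, focal_group, total_score):
    """Illustrative Mantel-Haenszel DIF statistic for one exam question.

    item_correct : 0/1 array of responses to the studied question
    focal_group  : 0 = reference group, 1 = focal group
    total_score  : matching variable, e.g. total examination score

    Returns the pooled (common) odds ratio and the ETS delta value
    (delta = -2.35 * ln(odds ratio); values near 0 suggest no DIF).
    """
    item_correct = np.asarray(item_correct)
    focal_group = np.asarray(focal_group)
    total_score = np.asarray(total_score)

    num = den = 0.0
    for k in np.unique(total_score):            # one 2x2 table per score stratum
        s = total_score == k
        a = np.sum(s & (focal_group == 0) & (item_correct == 1))  # reference, correct
        b = np.sum(s & (focal_group == 0) & (item_correct == 0))  # reference, incorrect
        c = np.sum(s & (focal_group == 1) & (item_correct == 1))  # focal, correct
        d = np.sum(s & (focal_group == 1) & (item_correct == 0))  # focal, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n

    if num == 0 or den == 0:                    # degenerate strata: no estimate
        return float("nan"), float("nan")
    odds_ratio = num / den
    delta = -2.35 * np.log(odds_ratio)
    return odds_ratio, delta
```

In a screening such as the one described above, a statistic of this kind would be computed for every question and each group comparison, combined with a significance test and a multiple-comparison correction, and flagged questions would then be passed to blinded expert review.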

X Demographics

The data shown below were collected from the profiles of the 19 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for the 59 Mendeley readers of this research output.

Geographical breakdown

Country   Count   As %
Unknown      59   100%

Demographic breakdown

Readers by professional status   Count   As %
Student > Master                     8    14%
Student > Ph.D. Student              8    14%
Other                                6    10%
Researcher                           6    10%
Lecturer                             5     8%
Other                               10    17%
Unknown                             16    27%

Readers by discipline            Count   As %
Medicine and Dentistry              10    17%
Social Sciences                      7    12%
Psychology                           5     8%
Linguistics                          3     5%
Computer Science                     3     5%
Other                               13    22%
Unknown                             18    31%
Attention Score in Context

This research output has an Altmetric Attention Score of 12. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 March 2019.
Comparison group                                     Rank         Out of
All research outputs                                 #2,574,942   23,035,022 outputs
Outputs from BMC Medical Education                   #408         3,370 outputs
Outputs of similar age                               #56,010      329,113 outputs
Outputs of similar age from BMC Medical Education    #16          81 outputs
Altmetric has tracked 23,035,022 research outputs across all sources so far. Compared to these, this one has done well: it is in the 88th percentile and in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 3,370 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 6.4. This one has done well, scoring higher than 87% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 329,113 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 82% of its contemporaries.
We're also able to compare this research output to 81 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 80% of its contemporaries.
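
Altmetric's exact ranking method is not documented on this page, but as a minimal sketch of the idea behind the percentile comparisons above, a percentile rank can be obtained by counting how many outputs in the comparison group a given Attention Score beats. The peer scores below are invented purely for illustration.

```python
def percentile_rank(score, peer_scores):
    """Percentage of peer outputs whose score this output beats."""
    beaten = sum(1 for s in peer_scores if s < score)
    return 100.0 * beaten / len(peer_scores)

# Hypothetical peer scores; an Attention Score of 12 beats 7 of these 9 outputs.
peers = [0, 1, 1, 2, 3, 5, 8, 15, 40]
print(f"{percentile_rank(12, peers):.0f}th percentile")  # prints "78th percentile"
```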