
A critical evaluation of the validity and the reliability of global competency constructs for supervisor assessment of junior medical trainees

Overview of attention for an article published in Advances in Health Sciences Education, October 2012

Mentioned by
1 X user

Citations
14 Dimensions

Readers on
101 Mendeley
Title
A critical evaluation of the validity and the reliability of global competency constructs for supervisor assessment of junior medical trainees
Published in
Advances in Health Sciences Education, October 2012
DOI 10.1007/s10459-012-9410-z
Authors

D. A. McGill, C. P. M. van der Vleuten, M. J. Clarke

Abstract

Supervisor assessments are critical for both formative and summative assessment in the workplace. Supervisor ratings remain an important source of such assessment in many educational jurisdictions, even though there is ambiguity about their validity and reliability. The aims of this evaluation were to explore: (1) the construct validity of ward-based supervisor competency assessments; (2) the reliability of supervisors in observing any overarching domain constructs (factors) identified; (3) the stability of those factors across subgroups of contexts, supervisors and trainees; and (4) the position of the observations relative to the established literature. The evaluated assessments were all those used to judge intern (trainee) suitability to become an unconditionally registered medical practitioner in the Australian Capital Territory, Australia, in 2007-2008. Initial construct identification was by traditional exploratory factor analysis (EFA) using principal component analysis with varimax rotation. Factor stability was explored by EFA of subgroups defined by context (such as hospital type) and by type of supervisor and trainee. The unit of analysis was the individual assessment, and all available assessments were included without aggregation of scores to obtain the factors. The reliability of the identified constructs was examined by variance components analysis of the summed trainee scores for each factor, together with the number of assessments needed to provide an acceptably reliable assessment using the construct; here the unit of analysis was the score for each factor on every assessment. For the 374 assessments from 74 trainees and 73 supervisors, the EFA yielded 3 factors identified from the scree plot, together accounting for only 68% of the variance: factor 1 had features of a "general professional job performance" competency (eigenvalue 7.630; variance 54.5%); factor 2, "clinical skills" (eigenvalue 1.036; variance 7.4%); and factor 3, "professional and personal" competency (eigenvalue 0.867; variance 6.2%). The percentage of trainee score variance for the summed competency item scores was 40.4, 27.4 and 22.9% for factors 1, 2 and 3 respectively, and the number of assessments needed to reach a reliability coefficient of 0.80 was 6, 11 and 13 respectively. The factor structure remained stable for the subgroups of female trainees, Australian graduate trainees, the central hospital, surgeons, staff specialists, visiting medical officers, and the separation of the data into single years. Physicians as supervisors, male trainees and male supervisors each produced a different grouping of items within 3 factors, all of which contained competency items that collapsed into the predefined "face value" constructs of competence. These observations add new insights to the established literature. In this setting, most supervisors appear to be assessing a dominant construct domain that resembles a general professional job performance competency. This global construct consists of individual competency items that supervisors spontaneously align, and it has acceptable assessment reliability. However, the instability of the factor structure across different populations of supervisors and trainees means that subpopulations of trainees may be assessed differently, and that some subpopulations of supervisors assess the same trainees using different constructs than other supervisors. The lack of standardisation of the competency criteria underlying supervisors' assessments calls into question the validity of this assessment method as currently used.
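
The analytic pipeline described in the abstract (EFA via principal component analysis with varimax rotation, followed by a check of how many assessments reach a reliability of 0.80) can be sketched in a few lines. This is a minimal illustration only: the ratings matrix is randomly generated placeholder data, the item count of 14 is assumed, varimax is a standard textbook implementation, and the Spearman-Brown step assumes the single-assessment reliability equals the reported proportion of trainee score variance (a generalizability-theory reading). None of it is the authors' actual code or data.

```python
# Minimal sketch of the abstract's pipeline, NOT the authors' code:
# X is a made-up placeholder for the real 374 x (n items) ratings matrix.
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Standard varimax rotation of a factor-loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        )
        R = u @ vt
        new_var = s.sum()
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R

rng = np.random.default_rng(0)
X = rng.normal(size=(374, 14))  # placeholder: 374 assessments x 14 items (item count assumed)

# PCA on the item correlation matrix; the eigenvalues give the scree plot.
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print("scree eigenvalues:", np.round(eigvals, 3))
print("variance explained by 3 factors:", eigvals[:3].sum() / eigvals.sum())

# Retain 3 components (as the paper's scree plot suggested) and rotate.
loadings = eigvecs[:, :3] * np.sqrt(eigvals[:3])
rotated = varimax(loadings)

# Spearman-Brown prophecy: assessments needed for a target reliability,
# taking the single-assessment reliability r1 as the proportion of score
# variance attributable to trainees.
def n_for_reliability(r1, target=0.80):
    return int(np.ceil(target * (1 - r1) / (r1 * (1 - target))))

print(n_for_reliability(0.404))  # factor 1: -> 6, matching the abstract
```

With r1 = 0.404 (factor 1's trainee variance share), the formula returns the 6 assessments reported in the abstract; the same calculation approximately reproduces the 11 and 13 reported for factors 2 and 3, with small differences possible from rounding of the published variance percentages.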

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 101 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
United Kingdom       1    <1%
Netherlands          1    <1%
United States        1    <1%
Unknown             98    97%

Demographic breakdown

Readers by professional status          Count   As %
Student > Master                           15    15%
Student > Ph.D. Student                    11    11%
Researcher                                 10    10%
Student > Doctoral Student                 10    10%
Lecturer > Senior Lecturer                  8     8%
Other                                      32    32%
Unknown                                    15    15%

Readers by discipline                   Count   As %
Medicine and Dentistry                     35    35%
Social Sciences                            18    18%
Psychology                                  7     7%
Nursing and Health Professions              5     5%
Business, Management and Accounting         5     5%
Other                                      12    12%
Unknown                                    19    19%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 01 November 2013.
All research outputs: #18,353,475 of 22,729,647 outputs
Outputs from Advances in Health Sciences Education: #745 of 851 outputs
Outputs of similar age: #130,817 of 172,365 outputs
Outputs of similar age from Advances in Health Sciences Education: #10 of 13 outputs
Altmetric has tracked 22,729,647 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.
So far Altmetric has tracked 851 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.7. This one is in the 3rd percentile – i.e., 3% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 172,365 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 11th percentile – i.e., 11% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 13 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
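
As a brief illustration of the percentile definition used throughout this section (the share of peer outputs that scored the same or lower), the sketch below computes it for a hypothetical list of peer scores. The peer scores are invented for the example; they are not Altmetric's actual corpus.

```python
# Hypothetical sketch of the percentile definition used on this page:
# the share of peer outputs whose Attention Score is the same or lower.
# `peer_scores` is invented illustrative data, not Altmetric's corpus.
def attention_percentile(score: float, peer_scores: list[float]) -> float:
    same_or_lower = sum(1 for s in peer_scores if s <= score)
    return 100.0 * same_or_lower / len(peer_scores)

# Example: with many zero-score peers, a score of 1 lands well above them.
peers = [0, 0, 0, 0, 0, 1, 1, 2, 5, 40]   # made-up scores
print(attention_percentile(1, peers))      # -> 70.0
```

Because many tracked outputs share the same low score, a tie-aware "same or lower" percentile like this can differ from what naive rank arithmetic alone would suggest.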