
A critical evaluation of the validity and the reliability of global competency constructs for supervisor assessment of junior medical trainees

Overview of attention for an article published in Advances in Health Sciences Education, October 2012

Mentioned by
1 X user

Citations
14 Dimensions

Readers on
101 Mendeley
Title
A critical evaluation of the validity and the reliability of global competency constructs for supervisor assessment of junior medical trainees
Published in
Advances in Health Sciences Education, October 2012
DOI 10.1007/s10459-012-9410-z
Authors

D. A. McGill, C. P. M. van der Vleuten, M. J. Clarke

Abstract

Supervisor assessments are critical for both formative and summative assessment in the workplace. Supervisor ratings remain an important source of such assessment in many educational jurisdictions, even though there is ambiguity about their validity and reliability. The aims of this evaluation were to explore: (1) the construct validity of ward-based supervisor competency assessments; (2) the reliability of supervisors in observing any overarching domain constructs (factors) identified; (3) the stability of those factors across subgroups of contexts, supervisors and trainees; and (4) the position of the observations relative to the established literature. The evaluated assessments were all those used to judge intern (trainee) suitability to become an unconditionally registered medical practitioner in the Australian Capital Territory, Australia, in 2007-2008. Initial construct identification was by traditional exploratory factor analysis (EFA) using principal component analysis with varimax rotation. Factor stability was explored by EFA of subgroups defined by context (such as hospital type) and by type of supervisor and trainee. The unit of analysis was the individual assessment, and all available assessments were included without aggregation of scores to obtain the factors. The reliability of the identified constructs was examined by variance components analysis of the summed trainee scores for each factor, together with the number of assessments needed to provide an acceptably reliable assessment using the construct; here the unit of analysis was the score for each factor on every assessment. For the 374 assessments from 74 trainees and 73 supervisors, the EFA yielded 3 factors identified from the scree plot, together accounting for only 68% of the variance: factor 1 had features of a "general professional job performance" competency (eigenvalue 7.630; variance 54.5%); factor 2, "clinical skills" (eigenvalue 1.036; variance 7.4%); and factor 3, "professional and personal" competency (eigenvalue 0.867; variance 6.2%). The percentage of trainee score variance for the summed competency item scores was 40.4, 27.4 and 22.9% for factors 1, 2 and 3 respectively, and the number of assessments needed to reach a reliability coefficient of 0.80 was 6, 11 and 13 respectively. The factor structure remained stable for the subgroups of female trainees, Australian graduate trainees, the central hospital, surgeons, staff specialists, visiting medical officers, and the separation of the data into single years. Physicians as supervisors, male trainees and male supervisors each produced a different grouping of items within 3 factors, all of which contained competency items that collapsed into the predefined "face value" constructs of competence. These observations add new insights to the established literature. In this setting, most supervisors appear to be assessing a dominant construct domain that resembles a general professional job performance competency. This global construct consists of individual competency items that supervisors spontaneously align, and it has acceptable assessment reliability. However, the instability of the factor structure across different populations of supervisors and trainees means that subpopulations of trainees may be assessed differently, and that some subpopulations of supervisors assess the same trainees using different constructs than other supervisors. The lack of standardisation of the competency criteria underlying supervisors' assessments calls into question the validity of this assessment method as currently used.
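
The analytic pipeline described in the abstract (EFA via principal component analysis with varimax rotation, followed by a check of how many assessments reach a reliability of 0.80) can be sketched in a few lines. This is a minimal illustration only: the ratings matrix is randomly generated placeholder data, the item count of 14 is assumed, varimax is a standard textbook implementation, and the Spearman-Brown step assumes the single-assessment reliability equals the reported proportion of trainee score variance (a generalizability-theory reading). None of it is the authors' actual code or data.

```python
# Minimal sketch of the abstract's pipeline, NOT the authors' code:
# X is a made-up placeholder for the real 374 x (n items) ratings matrix.
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Standard varimax rotation of a factor-loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        )
        R = u @ vt
        new_var = s.sum()
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R

rng = np.random.default_rng(0)
X = rng.normal(size=(374, 14))  # placeholder: 374 assessments x 14 items (item count assumed)

# PCA on the item correlation matrix; the eigenvalues give the scree plot.
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print("scree eigenvalues:", np.round(eigvals, 3))
print("variance explained by 3 factors:", eigvals[:3].sum() / eigvals.sum())

# Retain 3 components (as the paper's scree plot suggested) and rotate.
loadings = eigvecs[:, :3] * np.sqrt(eigvals[:3])
rotated = varimax(loadings)

# Spearman-Brown prophecy: assessments needed for a target reliability,
# taking the single-assessment reliability r1 as the proportion of score
# variance attributable to trainees.
def n_for_reliability(r1, target=0.80):
    return int(np.ceil(target * (1 - r1) / (r1 * (1 - target))))

print(n_for_reliability(0.404))  # factor 1: -> 6, matching the abstract
```

With r1 = 0.404 (factor 1's trainee variance share), the formula returns the 6 assessments reported in the abstract; the same calculation approximately reproduces the 11 and 13 reported for factors 2 and 3, with small differences possible from rounding of the published variance percentages.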

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 101 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
United Kingdom       1    <1%
Netherlands          1    <1%
United States        1    <1%
Unknown             98    97%

Demographic breakdown

Readers by professional status          Count   As %
Student > Master                           15    15%
Student > Ph.D. Student                    11    11%
Researcher                                 10    10%
Student > Doctoral Student                 10    10%
Lecturer > Senior Lecturer                  8     8%
Other                                      32    32%
Unknown                                    15    15%

Readers by discipline                   Count   As %
Medicine and Dentistry                     35    35%
Social Sciences                            18    18%
Psychology                                  7     7%
Nursing and Health Professions              5     5%
Business, Management and Accounting         5     5%
Other                                      12    12%
Unknown                                    19    19%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 01 November 2013.
All research outputs: #18,353,475 of 22,729,647 outputs
Outputs from Advances in Health Sciences Education: #745 of 851 outputs
Outputs of similar age: #130,817 of 172,365 outputs
Outputs of similar age from Advances in Health Sciences Education: #10 of 13 outputs
Altmetric has tracked 22,729,647 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.
So far Altmetric has tracked 851 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.7. This one is in the 3rd percentile – i.e., 3% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 172,365 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 11th percentile – i.e., 11% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 13 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
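
As a brief illustration of the percentile definition used throughout this section (the share of peer outputs that scored the same or lower), the sketch below computes it for a hypothetical list of peer scores. The peer scores are invented for the example; they are not Altmetric's actual corpus.

```python
# Hypothetical sketch of the percentile definition used on this page:
# the share of peer outputs whose Attention Score is the same or lower.
# `peer_scores` is invented illustrative data, not Altmetric's corpus.
def attention_percentile(score: float, peer_scores: list[float]) -> float:
    same_or_lower = sum(1 for s in peer_scores if s <= score)
    return 100.0 * same_or_lower / len(peer_scores)

# Example: with many zero-score peers, a score of 1 lands well above them.
peers = [0, 0, 0, 0, 0, 1, 1, 2, 5, 40]   # made-up scores
print(attention_percentile(1, peers))      # -> 70.0
```

Because many tracked outputs share the same low score, a tie-aware "same or lower" percentile like this can differ from what naive rank arithmetic alone would suggest.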