A Comparison of Standardized and Narrative Letters of Recommendation

<ul><li><p>ACADEMIC EMERGENCY MEDICINE November 1998. Volume 5. Number 11 1101 </p><p>A Comparison of Standardized and Narrative Letters of Recommendation </p><p>DANIEL V. GIRZADAS JR., MD, ROBERT C. HARWOOD, MD, MPH, JOSEPH DEARIE, MD, SHAYLA GARRETT, MD </p><p>Abstract. Objective: To compare the Council of Emergency Medicine Residency Directors (CORD) standardized letters of recommendation (SLORs) with traditional narrative letters of recommendation (NLORs) with regard to interrater reliability, consistency, and time of interpretation. Methods: In part I of the study, four members of the residency selection committee each evaluated the same 20 SLORs and 20 NLORs from which all identifying characteristics had been deleted. Using Likert-type scales of the global assessment, each letter was assigned a numeric value from 1 to 7. The interrater reliability was calculated for both types of letters using the Kendall coefficient of concordance. Average time to interpretation of the letters was also determined. In part II, using the same numeric values as in part I, 207 single-author SLOR/NLOR pairs were evaluated to determine whether the global assessment of the SLOR was consistent with that of its partner NLOR. Interpretation of the NLOR was performed blinded to the SLOR. Statistical analysis was calculated using Spearman correlation coefficients. Results: In part I of the study, the interrater reliability of the SLOR was 0.97, as compared with 0.78 for the NLOR. The average time to interpret the global assessment of the SLOR was 16 seconds, vs 90 seconds for the NLOR. In part II of the study, of the 207 SLOR/NLOR pairs, 112 (54%) were assigned the same numeric value, 80 (39%) differed by one, 13 (6%) differed by two, and two (1%) differed by three, for an overall correlation of 0.58. Conclusions: Compared with NLORs, the CORD SLOR offers better interrater reliability with less interpretation time. 
Single-author SLOR/NLOR pairs submitted for a single applicant do not correlate well. Residency selection committees must decide whether the added work of interpreting NLORs is beneficial. Key words: letter of recommendation; postgraduate education; emergency medicine; residency; selection. ACADEMIC EMERGENCY MEDICINE 1998; 5:1101-1104 </p><p>From the Department of Emergency Medicine, Christ Hospital and Medical Center, Oak Lawn, IL (DVG, RCH, JD, SG). Received December 26, 1997; revision received June 11, 1998; accepted June 25, 1998. Address for correspondence and reprints: Daniel V. Girzadas Jr., MD, Department of Emergency Medicine, Christ Hospital and Medical Center, 4440 West 95th Street, Oak Lawn, IL 60453. </p><p>Traditional narrative letters of recommendation (NLORs) are a factor in the resident selection process considered to be more influential than U.S. Medical Licensing Examination (USMLE) scores. Along with transcripts and the dean's letter, they are an important pre-interview source of information about an applicant's interpersonal and clinical skills.2 Accurate interpretation of NLORs requires time and a significant amount of experience, and even experienced interpreters find the task difficult. Frequently, important information is missing or worded in a manner that is subject to a range of interpretation. </p><p>With the aim of making data extraction more precise and efficient, the Council of Emergency Medicine Residency Directors (CORD) has developed a standardized letter of recommendation (SLOR). A SLOR would be expected to require less time and experience to interpret than a NLOR. It would ensure that information considered important to residency selection committees was not omitted. The experience of the previous application cycle seems to bear this out. A separate problem has developed, however. 
Frequently an author of a letter of recommendation (LOR) for a single applicant submits both a SLOR and a NLOR. Both letters are usually interpreted because one cannot be certain the same information is conveyed in both formats. This increases the workload of interpreting recommendations. If it can be demonstrated that the two types of recommendations convey equivalent information, the more time-consuming NLOR would be unnecessary. This would decrease the workload of resident selection. The first objective of our study was to determine whether the SLOR conveys information equivalent to that of the NLOR. We also measured the interrater reliability of both the SLOR and the NLOR. Finally, we determined the time required to make a global assessment of both types of letters. </p><p>METHODS </p><p>Study Design. This was a retrospective review of LORs received as part of the standard application process between September and December 1996. A LOR could be submitted by a physician from any specialty. Letters reviewed included those for applicants who were rejected, interviewed, or ranked. Because of the retrospective nature of this project, it was considered exempt from institutional review board review. </p></li><li><p>1102 RECOMMENDATION LETTERS Girzadas et al. STANDARDIZED AND NARRATIVE LETTERS </p><p>TABLE 1. Narrative Letter of Recommendation (NLOR) Classification System </p><p>Score: Classification </p><p>7: Includes glowing statements such as "is one of the finest medical students of the year," "is one of the best medical students I have ever worked with," "richly deserves the honors awarded in the rotation," or "receives my highest recommendation." </p><p>6: May include some honors grades, top 15-20%, near honors. Functions as an intern. </p><p>5: Contains the obligatory "good fund of knowledge," "punctual," "hardworking," "progressed well," "should be an excellent candidate for postgraduate training," along with some superlatives. </p><p>4: Contains mildly complimentary but noncommittal language. Pleasantly describes an average student and tries to put a good spin on the description. </p><p>3: May be completely neutral, as if the writer has never met the student, or have some subtle descriptions of the student's averageness, or contain slightly negative comments. </p><p>2: Contains troublesome or negative comments with little or no balancing superlatives. Almost guarantees no interview. </p><p>1: Is hard to come by, as most students do not ask someone who dislikes them or who has been disappointed in their performance to write them a letter of recommendation. All by itself guarantees no interview. </p><p>Study Protocol. In part I of our study, we established seven-point Likert-type scales for the NLOR and the SLOR.5 For the NLOR, statements were classified and were assigned a numeric value according to an unpublished classification system developed by one of the investigators (RCH), and used in the residency selection process of our department (Table 1). The SLOR was also assigned a numeric value of 1-7 (Table 2). If there were inconsistencies, the letter was assigned a numeric value according to the most positive phrase. </p><p>To establish our seven-point numeric system as stable or constant, we determined its interrater reliability. Four raters evaluated the same 20 NLORs and 20 SLORs. Two raters were very experienced and two raters were inexperienced evaluators of LORs. The letters were selected nonrandomly to encompass global assessments ranging from most positive to negative. (In part I, we chose letters that would provide a spectrum of recommendations ranging from poor to outstanding. We believed a random selection would have provided mostly letters in the 5-6 range, since these were most common.) All identifying characteristics were deleted from each letter. The NLORs were not paired with the SLORs; the raters were given a set of 20 NLORs and a different set of 20 SLORs. 
The raters were asked to assess one entire set prior to assessing the remaining set. They were asked to rank each letter according to the established seven-point Likert-type scale. </p><p>In part II of our study, we examined 207 SLOR/NLOR pairs. Virtually all paired letters that were submitted to our residency program in this application cycle were included. Each pair was written by a single author for a single applicant. The author could be from any specialty. Each NLOR/SLOR pair was interpreted by one of the same four raters as in part I using the same two ranking systems described above. Blinded to the corresponding SLOR, each NLOR was interpreted first and assigned a numeric value. Immediately after, each SLOR was interpreted and assigned a numeric value. </p><p>Data Analysis. Interrater reliability among the four raters for both the NLOR and the SLOR was calculated using the Kendall coefficient of concordance. Time of interpretation was determined by timing one experienced rater and one junior rater for a total of 80 letters. </p><p>The numeric assignment of the SLOR/NLOR pair was correlated using the Spearman rank-order correlation coefficient. </p><p>RESULTS </p><p>In part I of our study, we determined an interrater reliability of the SLOR of 0.97. The interrater reliability of the NLOR was 0.78. The average time required to interpret a SLOR was 16 seconds, compared with 90 seconds for the NLOR. (This average time represents the sum of the time it took for an experienced rater and an inexperienced rater to interpret each packet of 20 letters, divided by 40 total evaluations. We did not measure the time it took to interpret each letter.) </p><p>In part II of our study, 112 (54%) of the 207 SLOR/NLOR pairs were assigned the same numeric value. Eighty pairs (39%) differed by one point on the scale, 13 (6%) differed by two, and two (1%) differed by three. The overall correlation was 0.58. 
</p><p>DISCUSSION </p><p>Accurate interpretation of LORs is essential, since decisions based on these letters can profoundly affect a resident's future. Evaluative processes must be developed to minimize any error in classification. Ideally, a reliability of more than 0.95 should be achieved.6 Part I of our study showed that the interrater reliability of the SLOR is better than that of the NLOR. We believe that the method of evaluating NLORs developed by Harwood is straightforward. Yet, despite having used it in our interpretation of every NLOR over the last three residency application cycles, we still found that subjectivity played a significant role in final decision making. Interpretation of the SLOR, however, was strictly algorithmic. This left little room for subjectivity and improved the reliability between raters. </p></li><li><p>In our study, evaluation of an applicant's LORs was performed by physicians with a range of experience in letter interpretation. Two of the physicians were senior members of our residency selection committee having a cumulative experience of interpreting tens of thousands of LORs. The other two physicians were resident members of the selection committee who cumulatively had interpreted fewer than 500 LORs. This diversity would be expected to decrease interrater reliability. However, an analysis of our data found that the interrater reliability of both the SLOR and the NLOR was not affected by level of experience. The SLOR had better interrater reliability than did the NLOR regardless of the interpreter's experience. This speaks to the strength of the SLOR. It offers a high level of interrater reliability for both experienced and novice interpreters of LORs. It allows residents and junior faculty members to play a greater role in the evaluation of residency applications. 
</p><p>There currently is no reference criterion standard for the interpretation of LORs. This is because any assessment of clinical performance is inherently subjective. Previous studies have shown that NLORs are not valid when compared with the criterion standard of actual resident performance, and that they frequently do not contain the necessary information to adequately judge applicants.8 Schaider et al.9 recently showed that when using actual resident performance as the criterion standard, there was no difference in the predictive value between a preprinted questionnaire and a NLOR if reviewed retrospectively. That study recommended using only the SLOR to evaluate applicants. If it truly is crucial to have a high reliability for an evaluation that determines an applicant's future, the SLOR is superior to the NLOR. Additionally, it forces the writer of the recommendation to describe for residency selection committees specific character traits of interest that are frequently not addressed or are worded vaguely in NLORs. </p><p>Using our algorithm, the time required to interpret the SLOR is much less than that required to interpret the NLOR. This is an added benefit of using the SLOR, and can reduce the time necessary to evaluate and select applicants. </p><p>TABLE 2. Standardized Letter of Recommendation (SLOR) Classification System </p><p>Score: Classification </p><p>7: Guaranteed match </p><p>6: Outstanding/very likely to match </p><p>5: Excellent </p><p>4: Very good </p><p>3: Good </p><p>2: Would not rank </p><p>1: Would not rank, plus negative comment </p><p>Part II of our study suggests that there is a moderately low correlation between the SLOR and the NLOR written by the same author. Consequently, if writers of recommendations continue to submit both formats, residency selection committees must either evaluate both the SLOR and the NLOR and increase their workload, or choose to read only one format. 
Our results combined with those of Schaider et al.9 suggest that, if there is no significant difference in the predictive value of the two formats, one should choose the more reliable and faster format, the SLOR. </p><p>LIMITATIONS AND FUTURE QUESTIONS </p><p>We chose not to pair the SLORs and the NLORs in part I of our study but did pair them in part II. In part I, our main objective was to evaluate the interrater reliability of both formats of letters. We did not directly compare the two types of letters in this part of the study. Thus we believed we could allow the raters to focus on a single format over 20 letters before having to concentrate on the other format. In part II, we directly compared one format with the other. A residency selection committee would normally interpret letters written by a single author together as a pair. We therefore thought it was relevant to pair the letters for this aspect of the study. </p><p>We focused on the global assessment of letters because we believed that is what most interpreters of LORs try to accomplish.2,10 The SLOR consistently provided information regarding an applicant's commitment to emergency medicine (EM), work ethic, interpersonal skills, and ability to develop a cohesive treatment plan. Frequently NLORs lacked information about these separate characteristics. We therefore could not compare some specific traits between the two letter types. </p><p>The correlation of the single-author SLOR/NLOR pairs would be improved to 0.93 if we allowed for a variance of one point on the Likert </p></li><li><p>1104 RECOMMENDATION LETTERS Gir...</p></li></ul>
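The part II agreement percentages in the Results can be re-derived from the reported pair counts. The following is a minimal sketch in Python and is not part of the original study; the function name `agreement_summary` is hypothetical. It also shows that the share of pairs agreeing within one point is 192/207, or about 0.93, which matches the figure quoted in the limitations. That match rests on the assumption that the 0.93 figure is a proportion of agreement rather than a recomputed correlation coefficient, which the text does not state.

```python
# Counts of SLOR/NLOR pairs by absolute difference in score, as reported in part II.
diff_counts = {0: 112, 1: 80, 2: 13, 3: 2}


def agreement_summary(counts):
    """Return total pairs, rounded percentage per difference, and within-one share."""
    total = sum(counts.values())
    percents = {diff: round(100 * n / total) for diff, n in counts.items()}
    # Assumption: "allowing a variance of one point" means differences of 0 or 1.
    within_one = (counts.get(0, 0) + counts.get(1, 0)) / total
    return total, percents, within_one


total, percents, within_one = agreement_summary(diff_counts)
print(total, percents, round(within_one, 2))
# prints: 207 {0: 54, 1: 39, 2: 6, 3: 1} 0.93
```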

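The interrater reliability in part I was calculated with the Kendall coefficient of concordance. The paper gives neither the formula nor the software used, so the following is only a sketch of the standard tie-corrected statistic, W = 12S / (m^2 (n^3 - n) - m T), where m is the number of raters, n the number of letters, S the sum of squared deviations of the rank totals from their mean, and T the sum of (t^3 - t) over every group of tied scores. The function names and the sample ratings are hypothetical and are not study data.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores from 1..n, giving tied scores the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """ratings: one list of scores per rater, all covering the same letters."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    totals = [sum(r[i] for r in ranked) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum of (t^3 - t) over every group of tied scores.
    ties = sum(c ** 3 - c for r in ratings for c in Counter(r).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Hypothetical scores from 4 raters over 5 letters (illustration only).
example = [
    [7, 6, 5, 3, 2],
    [7, 6, 5, 4, 2],
    [6, 6, 5, 3, 1],
    [7, 5, 5, 3, 2],
]
print(round(kendalls_w(example), 2))
```

W ranges from 0 (no agreement in ranking) to 1 (identical rankings). The sketch is undefined when every rater gives every letter the same score, because the denominator is then zero.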
