
ACADEMIC EMERGENCY MEDICINE November 1998. Volume 5. Number 11 1101

A Comparison of Standardized and Narrative Letters of Recommendation

DANIEL V. GIRZADAS JR., MD, ROBERT C. HARWOOD, MD, MPH, JOSEPH DEARIE, MD, SHAYLA GARRETT, MD

Abstract. Objective: To compare the Council of Emergency Medicine Residency Directors’ (CORD) standardized letters of recommendation (SLORs) with traditional narrative letters of recommendation (NLORs) with regard to interrater reliability, consistency, and time of interpretation. Methods: In part I of the study, four members of the residency selection committee each evaluated the same 20 SLORs and 20 NLORs from which all identifying characteristics had been deleted. Using Likert-type scales of the global assessment, each letter was assigned a numeric value from 1 to 7. The interrater reliability was calculated for both types of letters using the Kendall coefficient of concordance. The average time to interpret the letters was also determined. In part II, using the same numeric values as in part I, 207 single-author SLOR/NLOR pairs were evaluated to determine whether the global assessment of the SLOR was consistent with that of its partner NLOR. Interpretation of the NLOR was performed blinded to the SLOR. Agreement was analyzed using Spearman correlation coefficients. Results: In part I of the study, the interrater reliability of the SLOR was 0.97, as compared with 0.78 for the NLOR. The average time to interpret the global assessment of the SLOR was 16 seconds, vs 90 seconds for the NLOR. In part II of the study, of the 207 SLOR/NLOR pairs, 112 (54%) were assigned the same numeric value, 80 (39%) differed by one, 13 (6%) differed by two, and two (1%) differed by three, for an overall correlation of 0.58. Conclusions: Compared with NLORs, the CORD SLOR offers better interrater reliability with less interpretation time. Single-author SLOR/NLOR pairs submitted for a single applicant do not correlate well. Residency selection committees must decide whether the added work of interpreting NLORs is beneficial.

Key words: letter of recommendation; postgraduate education; emergency medicine; residency; selection. ACADEMIC EMERGENCY MEDICINE 1998; 5:1101-1104

Traditional narrative letters of recommendation (NLORs) are considered a more influential factor in the resident selection process than U.S. Medical Licensing Examination (USMLE) scores.1 Along with transcripts and the dean’s letter, they are an important pre-interview source of information about an applicant’s interpersonal and clinical skills.2 Accurate interpretation of NLORs requires time and a significant amount of experience, and even experienced interpreters find the task difficult.3 Frequently, important information is missing or worded in a manner that is subject to a range of interpretations.4

With the aim of making data extraction more precise and efficient, the Council of Emergency Medicine Residency Directors (CORD) has developed a standardized letter of recommendation (SLOR). A SLOR would be expected to require less time and experience to interpret than a NLOR. It would also ensure that information considered important to residency selection committees was not omitted. The experience of the previous application cycle seems to bear this out. A separate problem has developed, however. Frequently an author of a letter of recommendation (LOR) for a single applicant submits both a SLOR and a NLOR. Both letters are usually interpreted because one cannot be certain the same information is conveyed in both formats, and this increases the workload of interpreting recommendations. If it can be demonstrated that the two types of recommendations convey equivalent information, the more time-consuming NLOR would be unnecessary, which would decrease the workload of resident selection. The first objective of our study was to determine whether the SLOR conveys information equivalent to that of the NLOR. We also measured the interrater reliability of both the SLOR and the NLOR. Finally, we determined the time required to make a global assessment of both types of letters.

From the Department of Emergency Medicine, Christ Hospital and Medical Center, Oak Lawn, IL (DVG, RCH, JD, SG). Received December 26, 1997; revision received June 11, 1998; accepted June 25, 1998. Address for correspondence and reprints: Daniel V. Girzadas Jr., MD, Department of Emergency Medicine, Christ Hospital and Medical Center, 4440 West 95th Street, Oak Lawn, IL 60453.

METHODS

Study Design. This was a retrospective review of LORs received as part of the standard application process between September and December 1996. A LOR could be submitted by a physician from any specialty. Letters reviewed included applicants who were rejected, interviewed, or ranked. Because of the retrospective nature of this project, it was considered exempt from institutional review board review.

Study Protocol. In part I of our study, we established seven-point Likert-type scales for the NLOR and the SLOR.5 For the NLOR, statements were classified and assigned a numeric value according to an unpublished classification system developed by one of the investigators (RCH) and used in the residency selection process of our department (Table 1). The SLOR was also assigned a numeric value of 1-7 (Table 2). If a letter contained inconsistent statements, it was assigned the numeric value of its most positive phrase.

TABLE 1. Narrative Letter of Recommendation (NLOR) Classification System

Score  Classification
7      Includes glowing statements such as “is one of the finest medical students of the year,” “is one of the best medical students I have ever worked with,” “richly deserves the honors awarded in the rotation,” or “receives my highest recommendation.”
6      May include some honors grades, top 15-20%, near honors. “Functions as an intern.”
5      Contains the obligatory “good fund of knowledge,” “punctual,” “hardworking,” “progressed well,” “should be an excellent candidate for postgraduate training,” along with some superlatives.
4      Contains mildly complimentary but noncommittal language. Pleasantly describes an average student and tries to put a good spin on the description.
3      May be completely neutral, as if the writer has never met the student, or may contain some subtle descriptions of the student’s averageness or slightly negative comments.
2      Contains troublesome or negative comments with little or no balancing superlatives. Almost guarantees “no interview.”
1      Is hard to come by, as most students do not ask someone who dislikes them or who has been disappointed in their performance to write them a letter of recommendation. All by itself guarantees “no interview.”

To establish our seven-point numeric system as stable, we determined its interrater reliability. Four raters evaluated the same 20 NLORs and 20 SLORs. Two raters were very experienced and two were inexperienced evaluators of LORs. The letters were selected nonrandomly to encompass global assessments ranging from most positive to negative. (In part I, we chose letters that would provide a spectrum of recommendations ranging from poor to outstanding. We believed a random selection would have provided mostly letters in the 5-6 range, since these were most common.) All identifying characteristics were deleted from each letter. The NLORs were not paired with the SLORs; the raters were given a set of 20 NLORs and a different set of 20 SLORs. The raters were asked to assess one entire set before assessing the remaining set, and to rank each letter according to the established seven-point Likert-type scale.

In part II of our study, we examined 207 SLOR/NLOR pairs. Virtually all paired letters submitted to our residency program in this application cycle were included. Each pair was written by a single author for a single applicant; the author could be from any specialty. Each NLOR/SLOR pair was interpreted by one of the same four raters as in part I, using the same two ranking systems described above. Blinded to the corresponding SLOR, each NLOR was interpreted first and assigned a numeric value. Immediately afterward, the SLOR was interpreted and assigned a numeric value.

Data Analysis. Interrater reliability among the four raters for both the NLOR and the SLOR was calculated using the Kendall coefficient of concordance. Time of interpretation was determined by timing one experienced rater and one junior rater for a total of 80 letters.
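The Kendall coefficient of concordance (W) summarizes how closely m raters agree when each scores the same n items: W = 1 means identical rankings and W = 0 means no agreement. The minimal, dependency-free Python sketch below implements the textbook formula W = 12S / [m^2(n^3 - n) - m·ΣT], where S is the sum of squared deviations of each letter's rank sum from m(n + 1)/2 and ΣT adds t^3 - t over every run of t tied scores. Because seven-point scores produce many ties, the sketch applies this tie correction. The paper does not state whether a correction was used, and the function names and toy scores are our own, not study data.

```python
from itertools import groupby


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 1-based mean rank of the tied run
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def kendalls_w(scores_by_rater):
    """Kendall's coefficient of concordance W, with the usual correction for ties.

    scores_by_rater: m lists (one per rater), each scoring the same n letters.
    """
    m = len(scores_by_rater)
    n = len(scores_by_rater[0])
    rank_rows = [average_ranks(row) for row in scores_by_rater]
    # Sum of the ranks each letter received across all raters
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    expected = m * (n + 1) / 2
    s = sum((r - expected) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over every run of t tied scores, per rater
    tie_term = 0
    for row in scores_by_rater:
        for _, run in groupby(sorted(row)):
            t = len(list(run))
            tie_term += t ** 3 - t
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_term)


# Toy example (not study data): three raters score five letters on the 1-7 scale
raters = [
    [7, 6, 5, 3, 2],
    [7, 5, 5, 4, 2],
    [5, 6, 7, 3, 1],
]
print(round(kendalls_w(raters), 3))  # 0.831
```

In part I, scores_by_rater would hold four lists of 20 scores (one per rater), computed separately for the SLOR set and the NLOR set.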

The numeric assignments within each SLOR/NLOR pair were correlated using the Spearman rank-order correlation coefficient.
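Spearman's rank-order coefficient is the Pearson correlation computed on ranks, with tied scores sharing the mean of the ranks they occupy. The minimal standalone Python sketch below correlates SLOR scores with the NLOR scores from the same pairs. The function names and toy scores are illustrative assumptions, not study data. Where SciPy is available, scipy.stats.spearmanr computes the same statistic.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 1-based mean rank of the tied run
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    # Undefined (division by zero) if every score in either list is identical
    return cov / math.sqrt(var_x * var_y)


# Toy example (not study data): SLOR and NLOR scores for five single-author pairs
slor = [1, 2, 3, 4, 5]
nlor = [2, 1, 4, 3, 5]
print(spearman_rho(slor, nlor))  # 0.8
```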

RESULTS

In part I of our study, the interrater reliability was 0.97 for the SLOR and 0.78 for the NLOR. The average time required to interpret a SLOR was 16 seconds, compared with 90 seconds for a NLOR. (This average represents the sum of the times it took an experienced rater and an inexperienced rater to interpret each packet of 20 letters, divided by 40 total evaluations. We did not measure the time it took to interpret each individual letter.)
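The averaging in the parenthetical can be written out as a formula. The symbols T_exp and T_inexp are ours, not the authors': they denote the total time the experienced and the inexperienced rater each spent on one packet of 20 letters of a given format, so each format contributes 2 × 20 = 40 evaluations.

```latex
\bar{t} = \frac{T_{\mathrm{exp}} + T_{\mathrm{inexp}}}{2 \times 20} = \frac{T_{\mathrm{exp}} + T_{\mathrm{inexp}}}{40}
```

Working backward from the reported averages, the two raters together spent roughly 16 × 40 = 640 seconds on the SLOR packets and 90 × 40 = 3600 seconds on the NLOR packets. The 80 letters timed in total are the two raters' 20 SLORs plus 20 NLORs each.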

In part II of our study, 112 (54%) of the 207 SLOR/NLOR pairs were assigned the same numeric value. Eighty pairs (39%) differed by one point on the scale, 13 (6%) differed by two, and two (1%) differed by three. The overall correlation was 0.58.
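The part II percentages are simple proportions of the 207 pairs. The minimal Python sketch below recomputes them from the reported counts. The helper names difference_counts and agreement_profile are hypothetical, and difference_counts only illustrates how such counts would be tallied from raw paired scores. The sketch also reports the share of pairs within one point, (112 + 80)/207 ≈ 0.93, which is numerically the same as the 0.93 figure given in the Limitations section.

```python
from collections import Counter


def difference_counts(slor_scores, nlor_scores):
    """Count pairs by absolute score difference |SLOR - NLOR| on the 1-7 scale."""
    return Counter(abs(s - n) for s, n in zip(slor_scores, nlor_scores))


def agreement_profile(diff_counts):
    """Return the number of pairs, the share at each difference, and the share within one point."""
    total = sum(diff_counts.values())
    shares = {d: diff_counts[d] / total for d in sorted(diff_counts)}
    within_one = sum(c for d, c in diff_counts.items() if d <= 1) / total
    return total, shares, within_one


# Counts reported for part II: absolute difference -> number of pairs
reported = Counter({0: 112, 1: 80, 2: 13, 3: 2})
total, shares, within_one = agreement_profile(reported)
print(total)  # 207
for diff, share in shares.items():
    print(diff, f"{share:.0%}")  # prints 0 54%, 1 39%, 2 6%, 3 1% on separate lines
print(f"{within_one:.2f}")  # 0.93
```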

DISCUSSION

Accurate interpretation of LORs is essential, since decisions based on these letters can profoundly affect a resident’s future. Evaluative processes must be developed to minimize any error in classification. Ideally, a reliability of more than 0.95 should be achieved.6 Part I of our study showed that the interrater reliability of the SLOR is better than that of the NLOR. We believe that the method of evaluating NLORs developed by Harwood is straightforward. Yet, despite having used it in our interpretation of every NLOR over the last three residency application cycles, we still found that subjectivity played a significant role in final decision making. Interpretation of the SLOR, however, was strictly algorithmic. This left little room for subjectivity and improved the reliability between raters.

In our study, evaluation of an applicant’s LORs was performed by physicians with a range of experience in letter interpretation. Two of the physicians were senior members of our residency selection committee with a cumulative experience of interpreting tens of thousands of LORs. The other two were resident members of the selection committee who had cumulatively interpreted fewer than 500 LORs. This diversity would be expected to decrease interrater reliability. However, an analysis of our data found that the interrater reliability of neither the SLOR nor the NLOR was affected by level of experience. The SLOR had better interrater reliability than the NLOR regardless of the interpreter’s experience. This speaks to the strength of the SLOR: it offers a high level of interrater reliability for both experienced and novice interpreters of LORs, and it allows residents and junior faculty members to play a greater role in the evaluation of residency applications.

There currently is no criterion standard for the interpretation of LORs,7 because any assessment of clinical performance is inherently subjective. Previous studies have shown that NLORs are not valid when compared with the criterion standard of actual resident performance, and that they frequently do not contain the information necessary to judge applicants adequately.8 Schaider et al.9 recently showed that, when actual resident performance was used as the criterion standard, there was no difference in predictive value between a preprinted questionnaire and a NLOR when both were reviewed retrospectively. That study recommended using only the SLOR to evaluate applicants. If it truly is crucial to have high reliability for an evaluation that determines an applicant’s future, the SLOR is superior to the NLOR. Additionally, it forces the writer of the recommendation to describe specific character traits of interest to residency selection committees that are frequently not addressed, or are worded vaguely, in NLORs.

Using our algorithm, the time required to interpret the SLOR is much less than that required to interpret the NLOR. This is an added benefit of using the SLOR, and it can reduce the time necessary to evaluate and select applicants.

TABLE 2. Standardized Letter of Recommendation (SLOR) Classification System

Score  Classification
7      Guaranteed match
6      Outstanding/very likely to match
5      Excellent
4      Very good
3      Good
2      Would not rank
1      Would not rank, plus negative comment

Part II of our study suggests that there is a moderately low correlation between the SLOR and the NLOR written by the same author. Consequently, if writers of recommendations continue to submit both formats, residency selection committees must either evaluate both the SLOR and the NLOR and increase their workload, or choose to read only one format. Our results, combined with those of Schaider et al.,9 suggest that if there is no significant difference in the predictive value of the two formats, one should choose the more reliable and faster format, the SLOR.

LIMITATIONS AND FUTURE QUESTIONS

We chose not to pair the SLORs and the NLORs in part I of our study but did pair them in part II. In part I, our main objective was to evaluate the interrater reliability of both formats; we did not directly compare the two types of letters. We therefore believed we could allow the raters to focus on a single format over 20 letters before having to concentrate on the other format. In part II, we directly compared one format with the other. A residency selection committee would normally interpret letters written by a single author together as a pair, so we thought it relevant to pair the letters for this aspect of the study.

We focused on the global assessment of letters because we believed that is what most interpreters of LORs try to accomplish.2,10 The SLOR consistently provided information regarding an applicant’s commitment to emergency medicine (EM), work ethic, interpersonal skills, and ability to develop a cohesive treatment plan. NLORs frequently lacked information about these separate characteristics, so we could not compare some specific traits between the two letter types.

The correlation of the single-author SLOR/NLOR pairs would be improved to 0.93 if we allowed for a variance of one point on the Likert scale. However, we were interested to know whether the SLOR and NLOR conveyed equivalent recommendations. Thus we thought it was important to keep variance between both letter formats to a minimum in our interpretation of their correlation.

Both types of letters suffered from inconsistencies in classification on the scales. Particularly in NLORs, authors used phrases that corresponded to different numeric rankings (e.g., “good fund of knowledge” and “my highest recommendation”). Similar but less common inconsistencies also occurred in the SLORs (e.g., “excellent” and “guaranteed match”). To improve reliability, it would be useful for the CORD to develop a set of guidelines delineating how to write the different levels of the SLOR. This would make interpreting the SLOR even more definite because we would all follow the same standard.
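Because SLOR scoring is algorithmic, the rule from the Study Protocol (an inconsistent letter receives the value of its most positive phrase) can be sketched in a few lines of Python. The labels follow Table 2. The function name and the input format, a list of the categories an author marked, are assumptions for illustration. Labels not in Table 2 raise a KeyError rather than being guessed.

```python
# Table 2 categories mapped to the seven-point scale (keys are lowercased Table 2 labels)
SLOR_SCALE = {
    "guaranteed match": 7,
    "outstanding/very likely to match": 6,
    "excellent": 5,
    "very good": 4,
    "good": 3,
    "would not rank": 2,
    "would not rank, plus negative comment": 1,
}


def score_slor(marked_categories):
    """Return the 1-7 global-assessment score for a SLOR.

    marked_categories: the Table 2 labels the letter's author marked (hypothetical input
    format). If the letter is inconsistent, the most positive category wins.
    """
    # Unknown labels raise KeyError instead of being silently mapped
    scores = [SLOR_SCALE[label.strip().lower()] for label in marked_categories]
    if not scores:
        raise ValueError("no global-assessment category marked")
    return max(scores)


# An inconsistent letter of the kind described above: "excellent" and "guaranteed match"
print(score_slor(["Excellent", "Guaranteed match"]))  # 7
```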

Interpretation of a NLOR was always done prior to evaluating the accompanying SLOR. We believed this would avoid bias because our standard process for ranking NLORs allows for some subjective interpretation. Interpretation of the SLOR, on the other hand, is strictly algorithmic. It does not allow for subjectivity and thus would not be expected to be biased by the NLOR.

The issue of LORs written by non-EM authors is problematic. Our study was undertaken during a year when the CORD SLOR was used by both EM and non-EM authors. Recently the CORD has asked that only EM faculty use the SLOR. It would seem that the SLOR could be adapted with minimal changes to suit the needs of non-EM authors. It is likely that a non-EM SLOR could provide the same benefits to residency selection committees as does the CORD SLOR.

Many questions remain about this comparison. Why do SLORs and NLORs not correspond closely? Is it due to differences in content or to the subjective and inexact nature of the narrative format? Only emergency physicians are using the CORD SLOR this year; will this lead to more consistency between the two formats? To what extent are residency selection committees still interpreting NLORs when a SLOR is also sent by the same author? It would be helpful to have a prospective study with predetermined outcome measures evaluating the performance of residents whose LORs did not correlate. This would provide a stronger measure of the predictive power of a specific letter format. Our study suggests that the SLOR has better performance characteristics, but its predictive ability compared with the NLOR has not yet been evaluated.

CONCLUSION

Compared with NLORs, the CORD SLOR offers better interrater reliability with less interpretation time. Single-author SLOR/NLOR pairs submitted for a single applicant do not correlate well. Residency selection committees must decide whether the added work of interpreting NLORs is beneficial.

Thanks to Nancy Cipparrone of Advocate Health Care Research and Education Institute for her statistical support and help preparing the manuscript. Thanks also to Joyce Fedeczko, MALS, and Library Staff of Advocate Health Sciences Library Network for research assistance.

References

1. Frankville D, Benumof MI. Relative importance of the factors used to select residents: a survey. Anesthesiology. 1991; 75:A876.
2. Baker DJ, Bailey MK, Brahen NH, Conroy JM, Dorman HB, Haynes GR. Selection of anesthesiology residents. Acad Med. 1993; 68:161-3.
3. Garmel GM. Letters of recommendation: what does good really mean? [letter]. Acad Emerg Med. 1997; 4:833-4.
4. O’Halloran CM, Altmaier EM, Smith WL, Franken EA. Evaluation of resident applicants by letters of recommendation: a comparison of traditional and behavior-based formats. Investig Radiol. 1993; 28:274-7.
5. Likert R. Technique for the measurement of attitudes. Arch Psychiatry. 1932; 140:1-55.
6. Cozby PC. Methods in Behavioral Research, 6th ed. Mountain View, CA: Mayfield Publishing, 1997.
7. Karras DJ. Statistical methodology: II. Reliability and validity assessment in study design, Part B. Acad Emerg Med. 1997; 4:144-7.
8. Leichner P, Eusebio-Torres E, Harper D. The validity of reference letters in predicting resident performance. J Med Educ. 1981; 56:1019-21.
9. Schaider JJ, Rydman RJ, Greene CS. Predictive value of letters of recommendation vs questionnaires for emergency medicine resident performance. Acad Emerg Med. 1997; 4:801-5.
10. Greenburg AG, Doyle J, McClure DK. Letters of recommendation for surgical residencies: what they say and what they mean. J Surg Res. 1994; 56:192-8.