
Psychology Science, Volume 48, 2006 (3), p. 247-267

The effects of traited and situational impression management on a personality test: an empirical analysis

MICHAEL S. HENRY1 & NAMBURY S. RAJU

Abstract

Studies examining impression management (IM) in self-report measures typically assume that impression management is either a 1) trait or 2) situational variable, which has led to often conflicting results (Stark, Chernyshenko, Chan, Lee & Drasgow, 2001). This study examined the item-level and scale-level responses on six empirically derived facets of conscientiousness from the California Psychological Inventory (CPI) between high and low IM groups. Subjects (N = 6,220) were participants in a management assessment conducted by an external consulting firm. Subjects participated in the assessment as part of either 1) a selection or promotional process, or 2) a feedback and development process, and two specific occupational groups (sales/marketing and accounting/finance) were examined. Using the IRT-based DFIT framework (Raju, van der Linden & Fleer, 1995), item-level and scale-level differences were examined for the situational IM and traited IM approaches. The results indicated that relatively little DIF/DTF was present and that the differences between the two approaches to examining IM may not be as great as previously suggested.

Key words: social desirability, impression management, personality test

1 Michael S. Henry, Illinois Institute of Technology and Stanard & Associates, Inc.; Nambury S. Raju, Illinois Institute of Technology.

An earlier version of this article was presented at the 21st Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, Texas, May 2006.

The authors are grateful to Bob Lewis, Roxanne Laczo and Paul Bly of Personnel Decisions International for providing such an invaluable data set for use in this study. We also thank David Blitz for comments on a previous draft of this paper.

Correspondence regarding this article should be addressed to Michael S. Henry, Stanard & Associates, Inc., 309 W. Washington, Suite 1000, Chicago, IL, 60606.


While research conducted over the past decade has encouraged applied psychologists regarding the validity of personality tests in personnel selection settings (Barrick & Mount, 1991; Tett, Jackson, & Rothstein, 1991; Hogan, Hogan, & Roberts, 1996), lingering questions remain regarding several issues. One such issue deals with the impact of social desirability, or impression management (IM), on personality test scores. Hogan (1991) has noted that item responses on personality measures reflect self-presentations, not self-reports. Research in the field of industrial/organizational psychology has supported the notion that individuals can raise their scores on personality measures (e.g., Hough, Eaton, Dunnette, Kamp & McCloy, 1990).

As noted by Stark et al. (2001), studies which attempt to assess the impact of IM on a variety of psychometric properties of personality test scores typically assume that IM is either a 1) trait or 2) situational variable. This dichotomous view of the nature of IM has alternatively been labeled the “substance vs. style” debate (Smith & Ellingson, 2002). Many researchers have expressed their belief that IM reflects substantive personality characteristics (Cattell, Eber & Tatsuoka, 1970; Furnham, 1986; McCrae & Costa, 1983; Nicholson & Hogan, 1990). Studies which follow the trait, or substance, theory typically assume that scales designed to measure IM can do so in a reliable and valid manner. Subjects within a specific group (e.g., applicants, non-applicants, job incumbents, students) are split into IM and non-IM groups on the basis of their social desirability/IM scores. Analyses are then run separately to determine if raw scores, criterion-related or construct validity/factor structures differ between the two groups.

Alternatively, situational IM studies assume IM is a response style that is a function of the testing situation. Researchers adopting this view theorize that IM is response distortion unrelated to substantive personality variance (cf. Christiansen, 1998; Douglas, McDaniel, & Snell, 1996; Ellingson, Sackett, & Hough, 1999; Frei, Griffith, McDaniel, Snell, & Douglas, 1997; Griffith, 1997; Hauenstein, 1998; Snell & McDaniel, 1998). Situational studies typically involve either 1) directing experimental groups to respond honestly or to fake their scores or 2) comparing applicants (assumed to fake) and non-applicants (assumed to respond honestly).

Traited and situational impression management research

Traited studies typically use scores on the IM scale to statistically determine if IM is moderating test validity. Traited IM studies have typically found that IM does not adversely impact criterion-related validity (Ones and Viswesvaran, 1998; Hough et al., 1990) or construct validity (Ones and Viswesvaran, 1998).

Situational IM studies, on the other hand, often find that IM matters. Research comparing applicants to non-applicants frequently finds that validity is lower in the applicant group (Hough, 1998), and directed IM studies tend to find that the criterion-related validity of personality tests decreases or often completely drops away (Douglas et al., 1996).

With respect to construct validity, the results are less clear. Michaelis and Eysenck (1971), Montag and Levin (1994), Smith (1996), Collins and Gleaves (1998) and Smith, Hanges, and Dickson (2001) all found that IM does not alter factor structure. However, other researchers have found that the factor structure of personality measures is either altered or destroyed by IM. Comparing applicants and incumbents, both Schmit and Ryan (1993) and Cellar, Miller, Doverspike, and Klawsky (1996) found that when examining the factor structure of Big Five personality tests in high IM groups (applicants), a six-factor solution fit best. In both cases, the sixth factor was composed primarily of socially desirable items, constituting an ideal employee factor. When comparing those instructed to fake and those instructed to respond honestly, researchers have found that the factor structure of personality tests is destroyed, leaving only one global IM factor (Griffith, 1997; Douglas et al., 1996). Differences in factor structure between job applicants and incumbents were also found by Michaelis and Eysenck (1971).

In conclusion, various researchers have found conflicting results with respect to the impact of IM on the construct validity and factor structure of personality tests. The results range from finding essentially no impact, to findings of either different factor structures or unidimensional factor structures dominated by IM. The variance in findings seems to be, in part, due to the different methods of identifying or measuring IM (e.g., applicants vs. incumbents, IM instructions, or using existing IM scales to separate respondents). While it is difficult to make conclusive statements regarding the literature on IM, it does appear that traited IM studies tend to show little or no impact of IM on construct or criterion-related validity, while situational IM studies frequently show significant effects on both.

Measurement equivalence

Measurement equivalence refers to a situation in which people with identical standing on the underlying/latent construct have the same expected raw score or true score at the item level, the subscale level, or both (Drasgow & Kanfer, 1985). Measurement equivalence on test or survey items is important for interpreting mean differences. Mean score differences between groups or populations, defined as impact, can be due to true differences on the construct being measured (e.g., job satisfaction, verbal reasoning, conscientiousness), or due to properties of the item/measure that are not part of the construct being measured. If measurement equivalence exists, then any mean score differences between groups can be interpreted as differences on the construct being measured. One popular technique for assessing measurement equivalence involves using item response theory (IRT).

Measurement equivalence via the DFIT framework

Item response theory (IRT) describes the relationship between item responses and the characteristic measured by a scale or test (Drasgow & Hulin, 1990). The characteristics measured by any given test or scale are termed latent traits and can include such diverse constructs as spatial ability, job satisfaction, or extroversion. In IRT terminology, the latent trait is referred to as theta (denoted θ). IRT assumes a nonlinear relationship between the underlying latent construct and the observed score at the scale/item level. The simplest form of IRT is used with respect to dichotomous (yes/no, true/false, correct/incorrect) responses. For more information, refer to Embretson and Reise (2000).

Measurement equivalence as a psychometric concept is closely linked to the concept of differential item functioning (DIF) in IRT (Raju & Ellis, 2003). While many procedures have been proposed for assessing DIF, or measurement inequivalence (cf. Lord, 1980; Raju, 1988, 1990; Thissen, Steinberg, & Wainer, 1988), Raju, van der Linden, and Fleer's (1995) procedure known as differential functioning of items and tests (DFIT) has proven to be useful for assessing measurement equivalence for dichotomous and polytomous IRT models, as well as multidimensional IRT models (Oshima, Raju, & Flowers, 1997).

In the DFIT framework, the true score for each examinee on item i is calculated as if they were 1) a member of the focal group (t_isF) and 2) a member of the reference group (t_isR). True scores can alternatively be described as the expected proportion correct for an item. Each individual has two true scores: one as a member of the focal group and one as a member of the reference group. The two true scores will be equal whenever the item parameters for both groups are identical:

d_is = t_isF - t_isR = 0 for all values of θ (1)

where d_is is the difference in true scores for subject s on item i, calculated as if he/she were a member of the focal group and as a member of the reference group. Thus, if the true scores for an individual calculated as a member of the focal group and as a member of the reference group are equal, measurement equivalence exists on that item.
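To make equation (1) concrete, the short sketch below computes an item's true score twice under a two-parameter logistic (2PL) model, once with focal-group and once with reference-group item parameters, and takes the difference d_is. It is an illustration only: the parameter values and the scaling constant D = 1.7 are assumptions, not output from the programs used in this study.

```python
import numpy as np

def p_2pl(theta, a, b, D=1.7):
    """2PL item response function: the item-level true score at trait level theta."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

def d_is(theta, a_F, b_F, a_R, b_R):
    """Equation (1): true score with focal-group parameters minus true score with reference-group parameters."""
    return p_2pl(theta, a_F, b_F) - p_2pl(theta, a_R, b_R)

theta = np.linspace(-3, 3, 7)
print(d_is(theta, a_F=1.2, b_F=-0.5, a_R=1.2, b_R=-0.5))  # identical parameters: d_is = 0 everywhere
print(d_is(theta, a_F=1.2, b_F=-0.5, a_R=0.8, b_R=-0.2))  # different parameters: nonzero d_is (DIF)
```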

The DFIT framework provides two measures of differential item functioning, compensatory and non-compensatory DIF (CDIF and NCDIF), and one measure of differential test functioning (DTF). CDIF does not assume that all items besides the item under study are free from DIF, unlike most other measures of DIF. Thus, CDIF can account for correlated differential functioning among test items. The DTF index is defined at the subscale or total test level, and reflects the average squared difference between focal and reference group total score-level true scores. Practically speaking, under DTF, if one item favors the reference group and another item favors the focal group, the item-level DIF will cancel out when combined to calculate DTF.

The NCDIF index is defined at the item level, and reflects the average squared difference between the focal and reference group item-level true scores. Under NCDIF, it is assumed that all other items aside from the one being studied are free from DIF – an assumption made by many other measures of DIF. The distinction between CDIF and NCDIF is extremely important. It is possible for a test to have non-significant DTF while containing items with significant DIF at the item level, because the DIF on these items cancels out.
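Both indices can be sketched as averages of squared true-score differences over a group of examinees. The code below is a simplified illustration under a 2PL model with made-up parameter values; it is not the DFIT program used in this study, and it ignores the chi-square tests and cutoff values discussed next.

```python
import numpy as np

def true_scores(theta, a, b, D=1.7):
    """n-by-k matrix of 2PL item true scores for n examinees and k items."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta[:, None] - b)))

def ncdif(theta, a_F, b_F, a_R, b_R):
    """Item-level NCDIF: mean squared focal-minus-reference item true-score difference."""
    d = true_scores(theta, a_F, b_F) - true_scores(theta, a_R, b_R)
    return (d ** 2).mean(axis=0)                      # one value per item

def dtf(theta, a_F, b_F, a_R, b_R):
    """Scale-level DTF: mean squared difference between total-test true scores."""
    d_total = (true_scores(theta, a_F, b_F).sum(axis=1)
               - true_scores(theta, a_R, b_R).sum(axis=1))
    return (d_total ** 2).mean()

rng = np.random.default_rng(0)
theta = rng.standard_normal(5000)                     # focal-group trait levels
a_R = np.array([1.0, 1.0, 1.0])
b_R = np.array([-0.5, 0.0, 0.5])
a_F = a_R.copy()
b_F = b_R + np.array([0.3, -0.3, 0.0])
print(ncdif(theta, a_F, b_F, a_R, b_R))               # items 1 and 2 show NCDIF
print(dtf(theta, a_F, b_F, a_R, b_R))                 # far smaller: opposite-signed DIF largely cancels
```

In this toy example the two difficulty shifts work in opposite directions, so DTF comes out much smaller than the item-level NCDIF values, illustrating the cancellation property described above.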

A variety of tests are also available to determine if the differences between the two groups are statistically significant. According to Raju et al., DTF is present for dichotomously scored items when the chi-square statistic is statistically significant and the DTF index is greater than .006. A new procedure, developed by Oshima, Raju, and Nanda (in press), allows for the identification of cutoff scores for the NCDIF and DTF indices for each item. Procedures are available to assess measurement equivalence for both dichotomous and polytomous items.


Measurement equivalence and impression management

Employing a traited approach, Flanagan & Raju (1997) examined the measurement equivalence of the 16-PF between high and average IM groups in a large group of job applicants (across organizations and jobs) via IRT. Raju et al.'s (1995) DFIT framework (specifically NCDIF) was used to assess whether item parameters on the 16-PF scales were the same across high and average IM groups, as defined by the test's IM scale. Flanagan & Raju (1997) found that relatively few items showed DIF between high and average IM groups, concluding that there was measurement equivalence.

Employing a situational approach, Zickar and Robie (1999) used Samejima's (1969) graded response model to assess measurement equivalence on three personality scales from the military's ABLE personality test among military recruits instructed to answer honestly, fake good, or fake good with coaching. Zickar and Robie found that although there were large differences in latent personality trait scores (θ) because of IM, a majority of items functioned similarly between groups. However, two of the three scales did show DTF at the scale level, and on one of these scales (Non-delinquency), nearly half of the items demonstrated DIF.

Robie, Zickar and Schmit (2001) examined the measurement equivalence of a personality test between applicant and incumbent samples via IRT. IM was not directly measured, but rather assumed to be a motivational factor distinguishing between the responses of sales manager applicants and incumbents who were responding to the personality scale (the Personal Preferences Inventory [PPI]) as part of a validation study. Using the DFIT framework, Robie et al. (2001) found that, while applicants scored approximately one-half standard deviation higher than incumbents across personality scales, only one of six personality scales included items that functioned differently across groups. In addition, none of the six scales showed evidence of DTF.

Perhaps the most comprehensive theoretical and empirical demonstration of such efforts was performed by Stark et al. (2001). In their study, they examined the measurement equivalence of scores on the 16-PF in two ways. First, an IRT analysis examined the equivalence of scores between applicants and non-applicants. Second, each group was split into high and low IM groups on the basis of the 16-PF's IM scale. Stark and his colleagues generally found measurement equivalence when comparing groups divided based on the IM scale, but found substantial DIF when comparing applicants and non-applicants. Their conclusions were that 1) traited IM studies (i.e., using a single group, divided based on IM scale scores) underestimate the prevalence and impact of IM on personality tests and 2) tests that are normed on a sample of non-applicants may not function the same way for applicants. Their results with respect to measurement equivalence when IM is operationalized as a trait (i.e., IM scale score) are compatible with those found in the Flanagan and Raju (1997) study. However, their analysis of IM from the situational approach, while confirming the results obtained in Zickar and Robie's (1999) directed IM study, is at odds with Robie et al.'s (2001) study comparing applicants and incumbents.

Table 1 summarizes the results of the various research studies examining the measurement equivalence of personality scales between high and low IM groups. For each study, the operationalization of IM (traited vs. situational), instrument used, method of assessing measurement equivalence, sample composition, and results are provided.


Table 1:
Impression Management and Measurement Equivalence Research

Study | Impression management approach | Sample | Instrument | Measurement equivalence method | Results*
Flanagan & Raju (1997) | Traited | Job applicants | 16-PF | IRT | ME
Zickar & Robie (1999) | Situational (directed impression management) | Military recruits | ABLE | IRT & CFA | Some DIF
Robie, Zickar, & Schmit (2001) | Situational | Job applicants and incumbents | PPI | IRT | ME
Stark et al. (2001) | Situational and traited | Applicants and non-applicants | 16-PF | IRT | ME (traited); DIF (situational)

*ME = Measurement Equivalence; DIF = Substantial Differential Item Functioning

As described in Table 1, the studies examining the measurement equivalence of personality scales across IM groups have found conflicting results. One possible explanation for this finding lies in the samples/comparisons used by the different researchers. Zickar and Robie used a directed IM design, Robie et al. compared applicants to incumbents in a single job group and organization, and Stark et al. compared heterogeneous samples of applicants and non-applicants across a variety of organizations.

The present study

This study attempted to improve upon previous IM/measurement equivalence research in a number of ways. First, IM was measured as a situational variable and as a trait, similar to Stark et al. (2001). Second, this study utilized a situational manipulation of IM that compares relatively homogenous groups where only the testing context is different. The issue of whether IM is primarily a situational variable or an individual difference is the most salient issue in the IM literature (cf. Smith & Ellingson, 2002). In addition, this study analyzed data separately in two occupational groups – sales/marketing and finance/accounting. Previous studies have relied on data from either heterogeneous groups or single occupational samples. Finally, this study assessed the measurement equivalence of the California Psychological Inventory (CPI). The CPI is a widely used instrument that has not been the subject of a study assessing measurement equivalence between high and low IM groups.

Method

Sample

The sample consisted of individuals who completed the CPI as part of an individual assessment/assessment center with an external consulting firm between the years 1995 and 1999, and who were participating either as applicants in a selection or promotional process or for the purpose of feedback and development. In addition, individuals were selected based on their current job assignments. Specifically, two samples were selected: those who indicated they perform sales/marketing functions, and those who indicated they perform finance/accounting functions. Since individuals in this sample could select more than one job function, individuals who selected both sales/marketing and finance/accounting were excluded to eliminate any overlap. The total sample comprised 6,220 individuals. In the finance/accounting sample, 1,124 participants were in the selection/promotion condition and 621 were in the feedback/development condition. In the sales/marketing sample, the selection group consisted of 2,868 individuals, while the development group consisted of 1,607 individuals.

In this data set, incumbents could also be applicants (i.e., applying for an internal promotion). In the overall selection sample (N = 3,993), approximately 45% of the participants were candidates for internal promotion, whereas approximately 49% were external selection candidates. Within the overall development sample (N = 2,229), all were incumbents. Others (e.g., Robie, Zickar & Schmit, 2001) have simply compared applicants and incumbents. However, in this data set, incumbents in many cases were applicants for promotions. The authors believed that an internal applicant for promotion would be more prone to engaging in IM than an incumbent taking a personality test as part of a validation study, or for research purposes. Therefore, the purpose of the assessment (selection vs. development) was used to identify high and low IM groups.

The age, ethnicity and gender composition of the sales/marketing sample and the accounting/finance sample were quite similar, and were very similar across IM conditions.

Measure

The measure used in this study was form 462 of the California Psychological Inventory (CPI), a measure of normal adult personality used for a variety of purposes, including business/industrial settings (Gough, 1987). This version of the CPI consists of 462 dichotomous (True/False) items, which are used to create scores on 20 scales. Three of these scales, Well-Being, Communality, and Good Impression, are used in part to assess test-taking attitudes.

The CPI manual (Gough, 1987) describes the Good Impression (GI) scale as having two purposes, “to assist in identifying protocols confounded by too strenuous attempts to claim favorable attributes and virtues, and to define a continuum in which IM varies from a projected indifference to others’ evaluations to a very active desire to be seen as a good and admirable person” (p. 115). The GI scale was developed using a contrasted groups method in which groups were either instructed to fake or to respond honestly. Items which discriminated between the two groups were included on the GI scale. It has been shown to correlate well with measures of IM on other tests (Gough & Bradley, 1996).

Procedure

Four separate analyses were conducted. IM was operationalized as both a situational variable and a trait variable for both job groups. The situational IM investigation was made by comparing those taking the CPI for selection/promotion versus development/feedback purposes. This method of measuring situational IM has been utilized in previous research (cf. Smith & Robie, 2004). The traited IM investigation was made by splitting the applicant sample (i.e., selection/promotion) within each job group into high and low IM groups using the CPI’s GI scale. In order to split groups, the frequency distribution of the GI scale was examined. In the past, researchers have set the cutoff anywhere from the median (Stark et al., 2001) to the top 15th percentile (Flanagan & Raju, 1997) for identifying high IM test takers. An examination of the frequency distributions suggested that the top quartile was a reasonable cutoff for identifying those high in IM, while low/average IM respondents were represented by the 50th percentile and below. The top quartile was represented by a raw score of 29 or greater on the CPI GI scale in both the sales/marketing and accounting/finance samples. Those with scores in this group scored significantly higher not only on the GI scale, but on the conscientiousness scales as well (see Table 6). Table 2 describes the measurement equivalence analyses.
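As a minimal sketch of this grouping rule (assuming a hypothetical DataFrame with a gi_raw column of GI raw scores; the cutoff of 29 is the top-quartile value reported above):

```python
import pandas as pd

def split_by_gi(applicants: pd.DataFrame, cutoff: int = 29) -> pd.DataFrame:
    """Label applicants as high IM (top quartile of GI) or low IM (median and below)."""
    df = applicants.copy()
    median_gi = df["gi_raw"].quantile(0.50)
    df["im_group"] = pd.NA
    df.loc[df["gi_raw"] >= cutoff, "im_group"] = "high"    # top quartile on GI
    df.loc[df["gi_raw"] <= median_gi, "im_group"] = "low"  # 50th percentile and below
    # respondents between the median and the top quartile are left unclassified
    return df
```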

Rather than analyze all 20 CPI scales, it was determined to be more useful to identify scales with demonstrated usefulness in a work psychology context. Specifically, the Big Five taxonomy has been demonstrated to provide a useful framework for organizing personality items and scales for predicting job performance (Barrick & Mount, 1991; Tett et al., 1991).

Table 2:
Measurement Equivalence Analyses

Job Group | Situational Impression Management | Traited Impression Management
Sales/Marketing | Comparison 1 (Selection vs. Development) | Comparison 2 (High vs. Low)
Finance/Accounting | Comparison 3 (Selection vs. Development) | Comparison 4 (High vs. Low)

Of the five personality dimensions encompassed in this model, the conscientiousness factor has received the most attention in the personnel selection literature. Conscientiousness is rationally linked to job performance because it includes such facets as competence, dutifulness, and achievement striving (Costa & McCrae, 1985), and has been empirically linked to job performance (Barrick & Mount, 1991; Tett, Jackson & Rothstein, 1991; McHenry, Hough, Toquam, Hanson, & Ashworth, 1990).

IRT assumes unidimensionality of the scales/tests being investigated (cf. Hambleton, Swaminathan, & Rogers, 1991). While prior research suggests that scales from the CPI measure conscientiousness (McCrae, Costa & Piedmont, 1993; Fleenor & Eastman, 1997), several characteristics of this instrument make the use of existing scales problematic. First, factor analysis results presented in the CPI manual strongly suggest that the existing scales are not unidimensional; factor analyses of these scales produced four or more factors (Gough, 1987). Second, items overlap between scales. Rather than using existing scales, it was decided that a factor analysis should be conducted of all CPI items to identify scales measuring facets of conscientiousness. Preliminary analyses conducted with this dataset found that none of the scales identified in the literature as being empirically related to conscientiousness remotely met criteria for unidimensionality (thus violating a key assumption of IRT). Therefore, an exploratory factor analysis (EFA), specifically a principal components analysis (PCA) with varimax rotation, was performed on all items of the CPI minus the 40 items that comprise the GI scale.

Analyses

Analyses of the scale scores derived from the CPI were conducted to assess the measurement equivalence between high and low IM groups in both occupational categories. Given that the CPI utilizes a dichotomous response scale, the two-parameter logistic model of IRT was used. The IRT program BILOG-MG 3.0 (Zimowski, Muraki, Mislevy, & Bock, 2002) was used to estimate the a and b parameters. Parameters obtained for different groups were put on the same scale using the computer program EQUATE 2.1 (Baker, 1995).
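To illustrate what the linking step accomplishes, the sketch below applies a simple mean/sigma transformation that places one group's 2PL parameters on the other group's metric. This is a textbook linking method shown for illustration only; it is not necessarily the procedure implemented in EQUATE 2.1, and the example values are assumptions.

```python
import numpy as np

def mean_sigma_link(b_focal, b_ref):
    """Slope A and intercept B mapping the focal-group metric onto the reference-group metric."""
    A = np.std(b_ref, ddof=1) / np.std(b_focal, ddof=1)
    B = np.mean(b_ref) - A * np.mean(b_focal)
    return A, B

def rescale_2pl(a_focal, b_focal, A, B):
    """Rescaled focal-group parameters: a* = a / A, b* = A * b + B."""
    return a_focal / A, A * b_focal + B

# Example: difficulty estimates for the same items from two separate calibrations
b_ref   = np.array([-1.2, -0.4, 0.1, 0.9])
b_focal = np.array([-1.0, -0.2, 0.3, 1.1])
a_focal = np.array([ 1.1,  0.9, 1.3, 0.8])
A, B = mean_sigma_link(b_focal, b_ref)
print(rescale_2pl(a_focal, b_focal, A, B))
```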

In order to test for measurement equivalence in each of the analyses outlined above, the DFIT framework was applied using the DFITPD7 program developed by Raju (2005). This program provides values for compensatory and non-compensatory differential item functioning (CDIF and NCDIF, respectively), and values for differential test functioning (DTF). The DFITPD7 program is capable of utilizing a new method for identifying items with statistically significant DIF and DTF. Based on research by Oshima et al. (in press), an individual cutoff score is generated for each item by simulating a large number of item parameter sets, based on the item parameter estimates and their variance-covariance structures provided by programs such as BILOG-MG, under a no-DIF condition, and then taking the (1 - α) percentile rank score from the resulting frequency distribution of NCDIF values. This method of identifying items with significant DIF is advantageous in that the cutoff values for NCDIF are tailored to a unique dataset, so that differences in sample size, as well as other factors which influence cutoff values (such as the standard errors of the item parameter estimates), are taken into account.
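The cutoff-generation idea just described can be sketched as a small parametric simulation. The function below is a conceptual illustration only; the exact scheme used by Oshima et al. (in press) and DFITPD7 may differ, and the function name, argument layout, and scaling constant are assumptions.

```python
import numpy as np

def ncdif_cutoff(theta, a_hat, b_hat, cov_ab, n_reps=1000, alpha=0.01, seed=0):
    """(1 - alpha) percentile of NCDIF values simulated for one item under a no-DIF condition."""
    rng = np.random.default_rng(seed)
    values = np.empty(n_reps)
    for r in range(n_reps):
        # two independent parameter draws for the same item, mimicking "no true DIF"
        a1, b1 = rng.multivariate_normal([a_hat, b_hat], cov_ab)
        a2, b2 = rng.multivariate_normal([a_hat, b_hat], cov_ab)
        p1 = 1.0 / (1.0 + np.exp(-1.7 * a1 * (theta - b1)))
        p2 = 1.0 / (1.0 + np.exp(-1.7 * a2 * (theta - b2)))
        values[r] = np.mean((p1 - p2) ** 2)        # NCDIF for this replicate
    return np.quantile(values, 1.0 - alpha)

# Example: trait estimates plus one item's parameter estimates and their covariance matrix
theta = np.random.default_rng(1).standard_normal(2000)
cutoff = ncdif_cutoff(theta, a_hat=1.1, b_hat=0.2, cov_ab=[[0.02, 0.0], [0.0, 0.03]])
print(cutoff)   # item-specific NCDIF cutoff at alpha = .01
```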


Results

In order to identify conscientiousness factors within the CPI that could be used as a basis for DFIT analyses, a PCA was run using all items from the CPI, excluding GI items, on the entire sample of 6,220 cases, which yielded a 33-component solution. Of these 33 components, six appeared to be interpretable, sufficiently large (five or more items) and conceptually related to the Big Five dimension of conscientiousness.

The first conscientiousness factor (Responsibility) included seven items that assess the extent to which individuals have been “in trouble with the law” and have engaged in risky/irresponsible behavior, such as excessive alcohol and drug use. The second factor (Order) included 12 items which measured the extent to which individuals prefer to keep possessions, materials, etc., in order, plan out work and activities in advance, keep schedules, etc. The third factor (School Achievement) included five items that measured the extent to which individuals enjoyed and attempted to do their best in school. The fourth factor (Rule-Consciousness) included 11 items that measured the extent to which individuals favor strict rule enforcement, are inflexible in their beliefs, are decisive and believe in an internal locus of control. The fifth factor (Dutifulness) included 14 items that assessed the extent to which individuals express willingness to help others and contribute as citizens to society, pay taxes, respect individuals and their roles, etc. The sixth factor (Delinquency) consisted of seven items and reflected the extent to which individuals misbehaved, were disciplined or were delinquent when younger or while in school. Items within each scale were renumbered according to their factor loadings (they do not represent the actual item numbers on the CPI; for more information about these scales contact the first author). Items from all scales were recoded so that higher scores reflected higher levels of conscientiousness. The factors appeared similar to many facets of conscientiousness identified by Roberts, Chernyshenko, and Stark (2005).

Unidimensionality was assessed using PCA. In order to assess the goodness of fit of a single-factor solution for each of the six conscientiousness scales, the dichotomous response format of the CPI necessitated the creation of tetrachoric correlation matrices for each of the six scales using PRELIS. PCAs of the six tetrachoric correlation matrices were run using SPSS 13.0 (SPSS, Inc., 2004). Results were examined to determine 1) the eigenvalue of the first component, compared to subsequent components, 2) the percent of variance accounted for by the first component compared to subsequent components, and 3) where breaks in the scree plots appeared evident.
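The sketch below illustrates the first two of these checks: given a (tetrachoric) correlation matrix for one scale, it reports the eigenvalues and the share of variance explained by the first principal component, and compares that share with Reckase's (1979) 20% benchmark. Computing the tetrachoric matrix itself is left to dedicated software (PRELIS in this study); the matrix here is a toy example.

```python
import numpy as np

def first_component_share(R):
    """Eigenvalues of a correlation matrix and the variance share of the first principal component."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order
    return eigvals, eigvals[0] / eigvals.sum()

# Toy 5-item correlation matrix with uniform inter-item correlations of .30
R = np.full((5, 5), 0.30)
np.fill_diagonal(R, 1.0)
eigvals, share = first_component_share(R)
print(np.round(eigvals, 2))                          # [2.2, 0.7, 0.7, 0.7, 0.7]
print(f"first component: {share:.0%} of variance")   # 44%
print("meets Reckase (1979) 20% criterion:", share >= 0.20)
```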

The results of the PCAs provided evidence of the unidimensionality of the six scales. The first components in each of the scales all exceed Reckase's (1979) cutoff of at least 20% of the variance. In fact, the first components in each of the scales accounted for between 27 and 55 percent of the variance in the respective scales. In addition, for each of the scales, the eigenvalues for the first factor tended to be substantially larger than the eigenvalues for subsequent factors. This suggests that these six scales meet minimal criteria for unidimensionality. Finally, the scree plots for each of the six scales were examined to determine where breaks occurred; in nearly all cases, a dominant first factor appeared. Pearson correlations were computed between the six derived conscientiousness scales and appear in Table 3. The highest correlations were between Delinquency and Responsibility (.35), Delinquency and School Achievement (.31) and Rule-Consciousness and Order (.30).

Table 3:
Correlations between Six Facets of Conscientiousness

                     Respons.   Order   School Ach.   Rule-Consc.   Dutiful.   Delinq.
Responsibility         1.00      .21*      .16*          .08*         .09*       .35*
Order                   .21*    1.00       .17*          .30*         .03*       .17*
School Achievement      .16*     .17*     1.00          -.02          .14*       .31*
Rule-Consciousness      .08*     .30*     -.02          1.00         -.11*       .02
Dutifulness             .09*     .03*      .14*         -.11*        1.00        .12*
Delinquency             .35*     .17*      .31*          .02          .12*      1.00

*p < .05.

Dutifulness and Rule-Consciousness were the only two facets which showed a significant negative correlation (-.11).

The results of the IRT and DFIT analyses for each IM condition are presented in the following sections. Results are presented for the two methods of assessing IM (situational and traited), with the results for both samples (sales/marketing and accounting/finance) presented simultaneously. In this study, high IM groups were considered to be the focal group and low IM groups were considered to be the reference group. These designations were made because the high IM groups were the focus of the study. In other words, the authors were interested in the degree to which item responses from those high in IM departed from those whose responses were not manifestly influenced by IM.

Results for situational impression management investigation (sales and accounting samples)

Scale means and standard deviations for the six scales, in both high (selection) IM and low (development) IM conditions, are provided in Table 4. For each scale and sample, the means were higher in the selection group than in the development group, providing evidence that those identified as fakers scored higher on substantive personality factors.

Items were equated by putting them on the scale of the focal group, which in this case was the selection/promotion group. Items were considered to have significant NCDIF if values exceeded the cutoff score generated for an alpha level of .01. A summary of the results of the DIF/DTF analyses for the situational IM investigation (both sales and accounting samples) is provided in Table 5. The number of items in each scale is in parentheses.

Table 4:
Scale Means and Standard Deviations: Situational Impression Management Investigation
[table body not reproduced] *Good Impression Scale; **Cohen's d statistic


Table 5:
Items and Scales Demonstrating Significant DIF/DTF for Situational Impression Management Investigation

                          Sales Sample           Accounting Sample
Scale (# of items)        DIF           DTF      DIF             DTF
Responsibility (7)        --            No       --              No
Order (12)                2, 5, 12      No       5, 6, 10, 12    No
School Achievement (5)    4             No       --              No
Rule-Consciousness (11)   1, 9          Yes      --              No
Dutifulness (14)          --            No       --              No
Delinquency (7)           --            No       --              No

In the sales sample, the Order scale contained three items (2, 5 and 12) that demonstrated significant NCDIF. The School Achievement scale contained one item (4) that demonstrated significant NCDIF. The Rule-Consciousness scale contained two items (1 and 9) that demonstrated significant DIF, and was the only scale where DTF was significant. DTF could be eliminated by removing one item from this scale (Item 9).

In the accounting sample, only the Order scale contained items with significant DIF (Items 5, 6, 10 and 12). Although four of the 12 items demonstrated DIF, DTF for this scale was not significant. It is important to note that two of the items demonstrating significant DIF on this scale (5 and 12) were common across the two samples.

Discussion of DIF/DTF results for situational impression management investigation

Of the 56 items contained in the six conscientiousness scales, only six demonstrated significant DIF in the sales sample, and four demonstrated significant DIF in the accounting sample. Of the six scales examined, one scale (Rule-Consciousness) showed evidence of significant DTF in the sales sample, and no scales demonstrated DTF in the accounting sample. DTF could be eliminated in the Rule-Consciousness scale with the removal of one item. DIF appeared to be concentrated in the Order and Rule-Consciousness scales. Evidence of both uniform and non-uniform DIF was present in the items identified.


Results for traited impression management investigation (sales and accounting samples)

Scale means and standard deviations for the six scales, in both high (high GI) IM and low (low GI) IM conditions, are presented for both samples in Table 6. As was the case in the situational IM investigation, for each scale and in each sample, the means were higher in the high IM group than in the low IM group.

A summary of the results of the DIF/DTF analyses for the traited IM investigation (both sales and accounting samples) is provided in Table 7. In the sales sample, DIF and DTF statistics for the School Achievement scale could not initially be computed. Further investigation determined that the variance of the a parameter for Item 5 in the high IM group was exceptionally large (126.34). As a result, Item 5 was eliminated from the DIF analysis, and CDIF, NCDIF and DTF values were computed with Items 1 through 4.

In the sales sample, the Order scale contained three items (4, 5 and 12) that demonstrated significant NCDIF, and the Order scale was the only scale that demonstrated significant DTF, which could be eliminated by removing one item from this scale (Item 4). The Rule-Consciousness scale contained two items (3 and 8) that demonstrated significant DIF.

In the accounting sample, the Order scale and the Rule-Consciousness scale each contained one item that demonstrated significant DIF (Items 9 and 3, respectively). None of the scales in the accounting sample demonstrated significant DTF. Item 3 of the Rule-Consciousness scale demonstrated significant DIF in both the sales and accounting samples.

Discussion of DIF/DTF results for traited impression management investigation

Of the 56 items contained in the six conscientiousness scales, only five demonstrated significant DIF in the sales sample, and two demonstrated significant DIF in the accounting sample. Of the six scales examined, only one (Order, sales sample) demonstrated DTF. Significant DTF could be eliminated by removing a single item. As was the case in the situational IM investigation, DIF appeared to be concentrated in the Order and Rule-Consciousness scales. While evidence of both uniform and non-uniform DIF was present in the items identified, in most cases DIF uniformly favored the high IM group.

Discussion

This study addressed several issues related to the current literature on IM and personality tests. First, it examined measurement equivalence between individuals identified as high in IM and individuals who were identified as average or low in IM. Second, it examined IM from two perspectives: as a situational variable and as a trait. Third, the situational investigation of IM involved individuals who were similar in terms of demographics and occupation, where only the purpose of the assessment differed. Finally, measurement equivalence was examined in two distinct occupational groups.

Table 6:
Scale Means and Standard Deviations: Traited Impression Management Investigation
[table body not reproduced] *Good Impression Scale; **Cohen's d statistic


Table 7:
Items and Scales Demonstrating Significant DIF/DTF for Traited Impression Management Investigation

                          Sales sample        Accounting sample
Scale (# of items)        DIF          DTF    DIF            DTF
Responsibility (7)        --           No     --             No
Order (12)                4, 5, 12     Yes    9              No
School Achievement (5)    --           No     --             No
Rule-Consciousness (11)   3, 8         No     3              No
Dutifulness (14)          --           No     --             No
Delinquency (7)           --           No     --             No

With respect to the findings of this study, several aspects of the results are noteworthy. First, the level of DIF and DTF across both situational and traited IM investigations, and in both the sales and accounting samples, was relatively minimal. DIF affected a relatively small number of items across all of the measurement equivalence analyses (17 items across 224 comparisons), and DTF occurred on only two scales across 24 comparisons. In each case, DTF could be reduced to non-significance through the elimination of only one item. This should be reassuring to practitioners who utilize personality tests to make inferences about applicants' personality traits. Concerns regarding the possibility of large-scale measurement inequivalence between those engaging in IM and those responding honestly were not supported by this study.

While DIF did not occur on a large scale, DIF was present in a number of items. In particular, the Order scale appeared particularly susceptible to DIF; of the 17 instances, 11 were items from this scale. This suggests that some constructs may be more susceptible to DIF caused by IM than others, which is consistent with some previous research on IM and measurement equivalence. For example, Zickar and Robie (1999) examined three personality scales and found that 15 of 25 items displaying DIF were from one scale (Non-Delinquency). Likewise, Robie et al. (2001) examined measurement equivalence on six scales; the two items that showed significant DIF were both from the same scale (Work Focus). While many if not most of the items in the six conscientiousness scales examined in this study would likely elicit socially desirable responding, the Order scale is perhaps the most obviously job-related scale. Items on the Order scale assess orderliness, planning, organization and other characteristics which may be closely linked to an employment context. One possible explanation for the concentration of DIF in the Order scale is that it is more readily perceived as job-related, and thus prompts IM among individuals who are inclined to engage in it.


Another notable finding is the convergence of findings across samples and IM approaches. Four sets of comparisons were made with respect to the six conscientiousness scales: two samples (sales and accounting) were examined in two IM investigations (situational and traited). While Stark et al. (2001) found little convergence between traited and situational approaches to IM at either the scale or item level, this study showed some convergence. In particular, both the Order and Rule-Consciousness scales contained items displaying DIF across IM approaches. With the exception of one item from the School Achievement scale, no items from any of the other scales displayed DIF. In contrast to the scale-level results, relatively few items showed DIF across IM approaches. Two items from the Order scale in the sales sample showed significant DIF in both the traited and situational investigations. Thus, the scales containing DIF items were consistent across approaches, but the specific items tended to differ. While it is encouraging that the two different methods of assessing IM identified the same scales as being problematic, it appears that, at the item level, IM may operate differently as a function of how it is operationalized.

DIF was manifested in differing ways. In the traited IM investigation, DIF appeared somewhat more predictable and consistent, with most items uniformly biased in favor of the high IM group. Even where non-uniform DIF was present, the high IM group was favored across most of the theta scale. In contrast, the situational IM investigation produced quite unpredictable results. While relatively few items showed significant DIF, for those that did, DIF was manifested in very different ways. DIF was both uniform and non-uniform and favored both the selection (high IM) and development (low IM) groups, depending on the item. Again, these findings suggest that DIF may be manifested differently according to how it is assessed.

There are important practical implications of the work presented in this paper. Much of the previous literature on the impact of IM on personality scores has focused on the degree to which criterion-related validity may or may not be compromised. However, practitioners are likely to be concerned with the degree to which the meaning of personality scale scores changes due to faking. Personality scale scores carry meaning and are typically interpreted by a psychologist or test user. Inferences are made regarding the test taker's profile on these scales. The extent to which items and scales carry the same psychological meaning across situations or response styles is therefore an important issue. The finding that item and scale level responses were relatively free of DIF and DTF should encourage those who use and interpret data from personality tests. While the different methods of assessing IM (traited and situational) led to differences in the magnitude of mean score differences, measurement equivalence remained intact in the vast majority of cases. Therefore, while practitioners may expect some elevation in conscientiousness scores when individuals engage in IM (although the magnitude of this elevation is still open to debate), the underlying meaning of these test items and scales should be interpretable for both groups. This should be encouraging to practitioners who interpret personality scale scores and make decisions based on such interpretations.


Limitations

There are a number of limitations that need to be addressed with respect to this study. First, this study did not use intact personality scales; unidimensional scales needed to be constructed from the CPI item pool. While the scales constructed from the CPI items correlated reasonably with existing CPI scales, the results obtained in this study cannot necessarily be generalized to the CPI scales as they are scored.

While the six derived conscientiousness scales from the CPI all technically met Reckase's (1979) criterion for unidimensionality, it should be noted that the results of the PCAs, as well as the scree plots, suggested that in some cases more than one factor could have been derived from these scales. While research has shown that IRT models are robust to moderate or even large departures from unidimensionality (Drasgow and Parsons, 1983), the results of the unidimensionality assessments of some of the scales are of concern.

It should be noted that some items on two of the smaller conscientiousness scales (School Achievement and Delinquency) showed evidence of extremely poor fit; this was particularly the case in the smaller accounting sample (as opposed to the sales sample). In nearly all cases, these items had very large item parameters (e.g., a parameters greater than 5.0) and item means (p-values) approaching 1.0. These items also had item parameters that contained substantial error variances. These data strongly suggest that DIF would be difficult to detect, even if it were present in some cases.

Finally, the situational investigation of IM utilized in this study is open to critique. While previous studies of IM as a situational variable may have been too broad (Stark et al., 2001) or involved directed IM studies that are not representative of real world data (Zickar & Robie, 1999), one could argue that the situational operationalization of IM in this study minimized the impact of IM. While the high IM group consisted of either internal or external job applicants, the low IM group consisted of participants in an assessment for development/feedback purposes. One could certainly argue that the motivation to fake was present in the development/feedback group as well, albeit to a lesser extent. Since there is no perfect situational manipulation of IM, one always runs the risk of either over-estimating the impact of IM (as is arguably the case with directed IM studies) or under-estimating the effect of IM. Since the base rate of IM is unknown, it is impossible to determine whether one's IM manipulation is under-estimating the impact of IM because both groups are engaging in IM, or over-estimating the impact of IM because the manipulation does not reflect how IM occurs in field settings.

References

Baker, F. (1995). EQUATE Computer Program Version 2.1. Madison: University of Wisconsin.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1-26.
Cattell, R. B., Eber, H. W., & Tatsuoka, M. M. (1970). Handbook for the Sixteen Personality Factor Questionnaire (16 PF). Champaign, IL: Institute for Personality and Ability Testing.
Cellar, D. F., Miller, M. L., Doverspike, D. D., & Klawsky, J. D. (1996). Comparison of factor structures and criterion-related validity coefficients for two measures of personality based on the five-factor model. Journal of Applied Psychology, 81, 694-704.
Christiansen, N. D. (1998, April). Sensitive or senseless? On the use of social desirability measures to identify applicant response distortion. Poster session presented at the annual meeting of the Society for Industrial and Organizational Psychology, Dallas, TX.
Collins, J. M., & Gleaves, D. H. (1998). Race, job applicants, and the five-factor model of personality: Implications for Black psychology, industrial/organizational psychology, and the five-factor theory. Journal of Applied Psychology, 83, 531-544.
Costa, P. T., & McCrae, R. R. (1985). The NEO Personality Inventory manual. Odessa, FL: Psychological Assessment Resources.
Costa, P. T., & McCrae, R. R. (1992). NEO-PI-R professional manual. Odessa, FL: Psychological Assessment Resources.
Douglas, E. F., McDaniel, M. A., & Snell, A. (1996, August). The validity of non-cognitive measures decays when applicants fake. Paper presented at the annual conference of the Academy of Management, Cincinnati.
Drasgow, F., & Hulin, C. L. (1990). Item response theory. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 1, pp. 577-636). Palo Alto, CA: Consulting Psychologists Press.
Drasgow, F., & Kanfer, R. (1985). Equivalence of psychological measurement in heterogeneous populations. Journal of Applied Psychology, 70, 662-680.
Drasgow, F., & Parsons, C. K. (1983). Application of unidimensional item response theory models to multidimensional data. Applied Psychological Measurement, 7, 189-199.
Ellingson, J. E., Sackett, P. R., & Hough, L. M. (1999). Social desirability corrections in personality measurement: Issues of applicant comparison and construct validity. Journal of Applied Psychology, 84, 155-166.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Flanagan, W. J., & Raju, N. S. (1997, April). Measurement equivalence between high and average impression management groups: An IRT analysis of personality dimensions. Paper presented at the annual meeting of the Society for Industrial and Organizational Psychology, St. Louis.
Fleenor, J. W., & Eastman, L. (1997). The relationship between the Five-Factor Model of personality and the California Psychological Inventory. Educational and Psychological Measurement, 57, 698-703.
Frei, R. L., Griffith, R. L., McDaniel, M. A., Snell, A. F., & Douglas, E. F. (1997, April). Faking non-cognitive measures: Factor invariance using multiple groups LISREL. In G. Alliger (Chair), Faking matters. Symposium conducted at the annual meeting of the Society for Industrial and Organizational Psychology, St. Louis.
Furnham, A. (1986). Response bias, social desirability and dissimulation. Personality and Individual Differences, 7, 385-400.
Gough, H. G. (1987). California Psychological Inventory administrator's guide. Palo Alto: Consulting Psychologists Press.
Gough, H. G., & Bradley, P. (1996). California Psychological Inventory manual (3rd ed.). Palo Alto: Consulting Psychologists Press.
Griffith, R. (1997). Faking of non-cognitive selection devices: Red herring is hard to swallow. Unpublished doctoral dissertation, University of Akron, Akron, OH.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications, Inc.
Hauenstein, N. M. A. (1998, April). Faking personality tests and selection: Does it matter? In M. A. McDaniel & A. F. Snell (Chairs), Applicant faking with non-cognitive tests: Problems and solutions. Symposium conducted at the annual meeting of the Society for Industrial and Organizational Psychology, Nashville, TN.
Hogan, R. T. (1991). Personality and personality measurement. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 873-919). Palo Alto, CA: Consulting Psychologists Press.
Hogan, R., Hogan, J., & Roberts, B. (1996). Personality measurement and employment decisions: Questions and answers. American Psychologist, 51, 469-477.
Hough, L. M. (1998). Effects of intentional distortion in personality measurement and evaluation of suggested palliatives. Human Performance, 11, 209-244.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581-595.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
McCrae, R. R., & Costa, P. T. (1983). Social desirability scales: More substance than style. Journal of Consulting and Clinical Psychology, 51, 882-888.
McCrae, R. R., Costa, P. T., & Piedmont, R. L. (1993). Folk concepts, natural language, and psychological constructs: The California Psychological Inventory and the five-factor model. Journal of Personality, 61, 1-26.
McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results: The relationship between predictor and criterion domains. Personnel Psychology, 43, 335-367.
Michaelis, W., & Eysenck, H. J. (1971). The determination of personality inventory factor patterns and intercorrelations by changes in real-life motivation. Journal of Genetic Psychology, 118, 223-234.
Montag, I., & Levin, J. (1994). The five factor personality model in applied settings. European Journal of Personality, 8, 1-11.
Nicholson, R. A., & Hogan, R. (1990). The construct validity of social desirability. American Psychologist, 45(2), 290-292.
Ones, D. S., & Viswesvaran, C. (1998). The effects of social desirability and faking on personality and integrity assessment for personnel selection. Human Performance, 11, 245-269.
Oshima, T. C., Raju, N. S., & Flowers, C. (1997). Development and demonstration of multidimensional IRT-based internal measures of differential functioning of items and tests. Journal of Educational Measurement, 34, 253-272.
Oshima, T. C., Raju, N. S., & Nanda, A. O. (in press). A new method for assessing the statistical significance of item-level differential functioning indices in the Differential Functioning of Items and Tests (DFIT) framework. Journal of Educational Measurement.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495-502.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197-207.
Raju, N. S. (1999). Some notes on the DFIT framework. Chicago: Illinois Institute of Technology.
Raju, N. S. (2005). DFITPD7: A FORTRAN program for calculating DIF/DTF [Computer software]. Chicago: Illinois Institute of Technology.
Raju, N. S., & Ellis, B. B. (2003). Differential item and test functioning. In F. Drasgow & N. Schmitt (Eds.), Measuring and analyzing behavior in organizations (pp. 156-188). San Francisco, CA: Jossey-Bass.
Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19, 353-368.
Reckase, M. D. (1979). Unifactor latent trait models applied to multifactor tests: Results and implications. Journal of Educational Statistics, 4, 207-230.
Roberts, B. W., Chernyshenko, O. S., & Stark, S. (2005). The structure of conscientiousness: An empirical investigation based on seven major personality questionnaires. Personnel Psychology, 58, 103-139.
Robie, C., Zickar, M. J., & Schmit, M. J. (2001). Measurement equivalence between applicant and incumbent groups: An IRT analysis of personality scales. Human Performance, 14, 187-207.
Schmit, M. J., & Ryan, A. M. (1993). The Big Five in personnel selection: Factor structure in applicant and nonapplicant populations. Journal of Applied Psychology, 78, 966-974.
Smith, D. B. (1996). The Big Five in personnel selection: Reexamining frame of reference effects. Unpublished master's thesis, University of Maryland, College Park.
Smith, D. B., & Ellingson, J. E. (2002). Substance versus style: A new look at social desirability in motivating contexts. Journal of Applied Psychology, 87, 211-219.
Smith, D. B., Hanges, P. J., & Dickson, M. W. (2001). Personnel selection and the five-factor model: Reexamining the effects of applicants' frame of reference. Journal of Applied Psychology, 86, 304-315.
Smith, D. B., & Robie, C. (2004). The implications of impression management for personality research in organizations. In B. Schneider, D. B. Smith, A. P. Brief, & J. P. Walsh (Eds.), Personality and organizations. Mahwah, NJ: Lawrence Erlbaum Associates.
Snell, A. F., & McDaniel, M. A. (1998, April). Faking: Getting data to answer the right questions. Paper presented at the 13th Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, TX.
SPSS, Inc. (2004). SPSS (Version 13.0) [Computer software]. Chicago: Author.
Stark, S., Chernyshenko, O. S., Chan, K. Y., Lee, W. C., & Drasgow, F. (2001). Effects of the testing situation on item responding: Cause for concern. Journal of Applied Psychology, 86, 943-953.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Erlbaum.
Zickar, M. J., & Robie, C. (1999). Modeling faking good on personality items: An item level analysis. Journal of Applied Psychology, 84, 551-563.
Zimowski, M., Muraki, E., Mislevy, R., & Bock, R. D. (2002). BILOG-MG (Version 3.0) [Computer software]. Chicago: Scientific Software International, Inc.