Personality and Language in Computer-mediated Communication: Factors and features across models and genres

AG et al

address info

Abstract

Computer-mediated communication (CMC) continues to increase in popularity. It is known that personality is important in such environments, influencing both how we express ourselves linguistically, and how we are perceived online. Here we build in two ways on previous work which has used the LIWC text analysis tool to derive language factors relating to personality. First, we investigate whether the linguistic patterning of factor structure in e-mail is similar to that in weblogs, and how these structures compare to previous findings for non-CMC, written language. Secondly, we compare how these CMC language factors relate to personality, and note differences between EPQ-R and Five-Factor Measures. Our findings in CMC broadly replicate results in offline studies, although differences are isolated which distinguish blogs from non-CMC language, with e-mail sharing features with both genres. When considering the relation between language factors and personality, we note that different patterns of language behaviour for Neuroticism and Agreeableness emerge in CMC versus non-CMC environments, and that results from the two personality measures are largely compatible.

[Wordcount, excluding frontmatter: ]

Key words: Personality, Language, Computer-mediated communication, Text analysis

1 Introduction

Electronic media are pervasive forms of communication, and offer individuals new channels for everyday communication (Baron, 1998) REF?. Such methods of communication allow us to reach the outside world by writing e-mails, running websites, or keeping weblogs. However, increasingly we are finding that in each of these cases such electronic communication can allow people

Preprint submitted to JPSP 3 June 2006

access to information about fundamental aspects of our selves: our personalities. Recent research has shown that personality has a significant influence on the language patterns in e-mails or weblogs (??), and that strangers are able to make accurate personality judgements based on viewing an individual’s website or e-mail (??).

Given that personality can be projected and perceived in computer-mediated environments, we here explore the extent to which the relation between linguistic expression and personality in such environments varies, and whether it differs from that found in offline communication. To do this, we adopt the LIWC text analysis method developed by Pennebaker and colleagues (Pennebaker and Francis, 1999), since this allows direct comparison with self-disclosure in the form of diaries etc. as reported in ?. Furthermore, we extend the study of the relationship between personality and language by comparing the previously derived language factors across three- and five-factor models of personality.

The paper is structured as follows. In the next section, we review the popularity and diversity of Computer-Mediated Communication (CMC) and compare it to traditional forms of speech and writing. We introduce the two main trait-based personality models, and the content analysis method which we will adopt in this paper. We also review previous literature which has examined linguistic characteristics of personality in writing, speech and CMC environments. On the basis of this previous work, we then frame the research questions to be addressed. The following section indicates how we gathered two corpora to help examine language in e-mail and weblog computer-mediated environments. We then report two experiments in turn. The first compares the linguistic factor structure of the two types of CMC, and non-CMC writing. The second compares the relationships between personality and these language factors in CMC and non-CMC environments. In the latter case, the particular focus is on Extraversion and Neuroticism, but there is a secondary question concerning the role of Openness, Agreeableness and Conscientiousness versus Psychoticism. We conclude by discussing our findings and some of their implications.

2 Background

2.1 CMC and language

Computer-mediated communication, like other forms of writing, is less rich than face-to-face communication (Panteli, 2002). As a result, alternative linguistic strategies in e-mail and internet relay chat have been adopted to provide paralinguistic or social cues (Werry, 1996; Hancock and Dunham, 2001; Colley and Todd, 2002).


Internet CMC cannot be treated as a single genre, and is in fact composed of a number of distinct types of communication (?). For example, static webpages might be purely written, but instant messengers create conversations as close to a spoken form as is possible for the written word. Relatively stable varieties of internet CMC are emerging, although there is still variation within them, and new forms are continually evolving (??). However, this has not prevented researchers from attempting to classify these forms of communication, either functionally or linguistically (????Biber, 1988).

E-mail predates the internet, and is one of the major contributors to its popularity, with XX% of internet users regularly accessing e-mail (REF?). It is a written form, in which interlocutors are physically separated; it is also durable, and authors often use complex linguistic constructions; however, e-mail is often unedited, uses first- and second-person pronouns, present tense and contractions, and it is generally informal in tone (Balter, 1998; Baron, 2001). Indeed, characteristic e-mail features have been identified, including ellipses, capitalisation, extensive use of exclamation marks, and question marks (termed ‘emailism’; Colley and Todd (2002)). As a result, e-mail is often considered intermediate in form between speech and writing (?Baron, 1998; ?; ?), cf. (Collot and Belmore, 1996). However, e-mail certainly differs from speech, being more verbose, yet less emotional (?).

In contrast to the almost universal usage of e-mail, only 27% of internet users read weblogs, and 62% still don’t know what they are (REF?). This is in spite of the fact that an estimated 8 million weblogs have been created in the US alone (?). A weblog is a frequently updated website which contains news and views on a variety of topics, from politics to gossip, and weblogs have been regarded as a powerful news-gathering tool (?) REF? UPDATE?. The term ‘blog’ is perhaps more widely used, and it is most commonly used to refer to the sub-category of online personal diaries; blogs are the focus of what follows. In contrast to traditional diary-keeping, blogging is also a social activity (??), with blogs being ‘. . . a studied minuet between blogger and audience. Bloggers consider audience attention, feedback and feelings as they write [. . . ] consciousness of audience is central to the blogging experience.’ (?).

Like e-mail, blogs can also demonstrate characteristic linguistic features, with posts written in ‘short, paratactic sentences’ employing ‘informal, non-standard constructions and slang’ (?). However, due to the social nature of blogging, it is also possible to observe communities and social groups, including in-group and out-group language behaviour (I, me, my, we, us and our, rather than they, them and their) and shared background knowledge and concepts (?), cf. (Brown and Yule, 1983), as well as shared responses to traumatic events such as September 11, 2001 (???).


2.2 Models of Personality

In this paper we refer to two main models and associated measurements of personality: Eysenck’s three-factor model (Eysenck and Eysenck, 1991; Eysenck et al., 1985), and the five-factor model (Digman, 1990; Costa and McCrae, 1992; Wiggins and Pincus, 1992; Goldberg, 1993). Both of these describe Extraversion (Extraversion–Introversion) and Neuroticism (Emotionality–Stability), which are undisputed and central to theories of personality. To these, the three-factor model adds the trait of Psychoticism, while the five-factor model adds Openness, Agreeableness and Conscientiousness (?Lippa and Dietz, 2000).

Although there are theoretical differences between models (Deary and Matthews, 1993; Eysenck, 1970; Eysenck and Eysenck, 1991; Eysenck, 1993; Block, 1995; ?; McCrae and Costa, 1987, 1997; Funder, 2001; Buss and Finn, 1987; Pytlik Zillig et al., 2002), a prominent question is how these models relate to each other: Kline (1993) notes that for Eysenck’s EPQ measure, ‘Extraversion and Neuroticism are clearly identical to two of the big five factors and Psychoticism would appear to be a mixture [of the other three traits]’. In discussing our findings from both models, we explore this relationship in more detail, later in the paper (CHECK THIS).

2.3 Content analysis and factor analysis

Content analysis focusses on context-independent occurrences of lexical content words in written text. Although there are many different approaches (see Smith, 1992; Pennebaker et al., 2003, for an overview; Mehl, 2005 (REF?) and Oberlander & Gill, 2006 for alternative approaches), here we describe one method in particular, which uses the Linguistic Inquiry and Word Count text analysis program (LIWC; Pennebaker and Francis, 1999). 1

Building on earlier work by Gottschalk (REF?), LIWC counts occurrences of words or word-stems belonging to pre-defined semantic and syntactic categories (which belong to four main groups: Linguistic Dimensions, Psychological Processes, Relativity, and Personal Concerns). These provide different ways of describing texts (Kilgarriff, 2001); also Will Lowe 2004 REF?; cf. semantic tagging (Rayson, 2003). For instance, using this system, words like could, should and would are categorised as ‘discrepancies’, allowing the overall percentage of ‘discrepancy’ words to be calculated for the text as a whole. One of the major strengths of the LIWC text analysis approach is that the analysis program and the dictionaries have been rated and validated by independent judges (Pennebaker and Francis, 1999; Pennebaker and King, 1999). Although LIWC counts some syntactic features, such as pronouns, and verbs of various tenses, these are not derived from a part-of-speech analysis of the data (??, cf.).

1 Note that more recent versions of the program have been released (LIWC2001; LIWC2006; Pennebaker et al., 2001), but to ensure comparability with results obtained using LIWC, we describe this version here.
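The counting scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary is a hypothetical stand-in and not the LIWC dictionary (only the ‘discrepancy’ entries could, should and would come from the text), and the `*` suffix for word-stems, the tokeniser, and the function names are assumptions.

```python
import re

# Hypothetical mini-dictionary for illustration; NOT the real LIWC dictionary.
# A trailing "*" marks a word-stem that matches any continuation.
CATEGORIES = {
    "discrepancy": ["could", "should", "would"],   # examples given in the text
    "tentative": ["maybe", "perhaps", "guess*"],   # assumed entries
    "negations": ["no", "not", "never"],           # assumed entries
}


def _matches(token, entry):
    """True if the token equals a word entry or starts with a stem entry."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(text, categories=CATEGORIES):
    """Return each category's share of all word tokens, as a percentage."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = dict.fromkeys(categories, 0)
    for token in tokens:
        for name, entries in categories.items():
            # A token counts at most once per category.
            if any(_matches(token, entry) for entry in entries):
                counts[name] += 1
    total = len(tokens)
    return {name: (100.0 * n / total if total else 0.0)
            for name, n in counts.items()}


result = category_percentages("I would go, but maybe I should not.")
```

As in the description above, matching is purely lexical: no part-of-speech analysis is involved, so a word is counted wherever it appears.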

Originally used to examine the relationship between language use in disclosure and measures of health and well-being (Pennebaker et al., 1997; Pennebaker, 1997; Graybeal et al., 2002; cf. Oxman et al., 1988), LIWC has since been used to study a number of linguistic behaviours, including deception (Newman et al., 2003), gender (Mehl and Pennebaker, 2003), and individual differences (Pennebaker and King, 1999). We now discuss the last of these studies in greater detail, because it provides a reference point for comparisons with language use and personality in CMC.

2.4 Personality language

NOTE: also include recent LIWC/personality literature

Pennebaker and King (1999) analysed texts written by authors for whom five-factor personality information was available. The studies used multiple writing samples produced by over 800 participants, in three different conditions: daily diaries by patients at an addiction centre; daily class assignments by summer school students; and abstracts to journal articles written by social psychologists. Factor analysis was used to derive a small number of factors grouping individual LIWC variables and these were then correlated with writers’ scores on personality dimensions. We note the similarity of this approach to that which Biber used to explore genre (e.g., Biber, 1995). The four derived factors were: Making Distinctions, Immediacy, Social Past, and Rationalization. Personality dimensions related to the factors as follows: Extraverts used language associated with the Social Past, and avoided language associated with Making Distinctions; Neurotic individuals used language associated with Immediacy; individuals scoring high in Openness used language associated with the Social Past (and some related to Making Distinctions), and avoided language associated with Immediacy; high Agreeableness scorers used language associated with Immediacy; high Conscientiousness scorers avoided language associated with Making Distinctions.

In a recent study of personality and language, Mehl et al. (in press) (REF?) used LIWC to analyse speech sampled from everyday interaction. Overall, they found that Extraverts have a higher word count, with shorter words; Neurotics have a lower word count; people with higher levels of Openness talk less about social processes, use fewer past tense words and third-person pronouns; individuals higher in Agreeableness use more first-person pronouns, fewer articles and fewer swear words; and those higher in Conscientiousness use fewer words relating to negative emotions and swearing.

Analysis using LIWC is not the only work to examine the influence of personality upon language and interaction, but most of the rest has focused exclusively on Extraversion. For instance, it has been found that Extraverts: initiate more laughter; express more pleasure talk, agreement, and compliments; use more self-referent statements; and talk more, focusing on extra-curricular activities. Introverts, on the other hand, use more language relating to hedges and problem talk (Gifford and Hine, 1994; Thorne, 1987). Additionally, Extraverts have been shown to talk more in some situations (Carment et al., 1965), cf. (Thorne, 1987).

At a lexico-grammatical level, Extraverts use a greater proportion of pronouns, adverbs, verbs and a higher total number of words in formal and informal situations, with Introvert language features more closely related to formal language (Furnham, 1990; Dewaele and Furnham, 1999; ?; Dewaele, 2001). For reviews of personality and speech features, see also (Scherer, 1979; Smith, 1992).

Include more pers/CMC literature - pennebaker&groom, hancock deception?

Considering CMC specifically, it has been argued that its perceived anonymity allows people to feel more comfortable participating in interactions, where they may feel less able to do so in face-to-face (FTF) environments (Bloch, 2002; Yellen et al., 1995). However, we note greater extravert participation in both CMC and FTF environments. The liberating effect of CMC can also be noted in the intimacy of topics discussed in weblog and e-mail contexts (REF?), although we still suppose that audience effects are present (Bell, 1984), and mediate the content relative to traditional, private (non-online) diaries (REF? websites - (?)). Indeed, ratings of personal websites by unacquainted individuals showed evidence of enhancement in their projection of Extraversion and Agreeableness. However, this form of asynchronous CMC still demonstrates good target-judge agreement, especially in the case of Openness (?).

Similarly, studies of language in CMC have demonstrated that there are systematic patterns associated with e-mail and weblog data.

In weblogs, authors’ use of individual grammatical categories (parts-of-speech; POS) has been found to correlate with levels of their Openness and Agreeableness. More Agreeable authors use more articles, and fewer verbs, adverbs and interjections; authors with higher levels of Openness use more adjectives and prepositions and, like the more Agreeable authors, fewer adverbs (??).

more on nowson weblog n-grams, etc - ensure consistence with e-mail??


A more sophisticated analysis of an e-mail corpus has identified multi-item patterns of words and word-stems, and parts-of-speech, in relation to personality (Gill and Oberlander, 2002; ?). Bottom-up stratified corpus comparison showed, for instance, that High Extraverts used collocations involving inclusive expressions and connectives more broadly, generating conjoined noun phrases, and collocations involving proper names. ? carried out multiple regression analyses of this corpus using both the LIWC dictionary and a dictionary based on the MRC psycholinguistic database (?). The latter contains tens of thousands of words, with up to 26 linguistic and psycholinguistic attributes for each word. The LIWC-based regression analysis revealed that high Neuroticism is characterised by greater use of Inclusive and Total First-person (11% variance), whilst high Psychotics use fewer First-person Singular but more Cognitive Mechanism words (11% variance). The MRC-based analysis reveals a low average level of concreteness in high Extravert language (5% variance). Conversely, high Neurotics use words with high levels of concreteness, combined with words more commonly found in speech (14% variance). Higher levels of Psychoticism are demonstrated by the use of more unusual words and a variety of frequent and infrequent words from written contexts (11% variance) (?Gill and Oberlander, 2002).

3 Research Questions

This paper now addresses the following questions:

(1) (a) Is the linguistic patterning of factor structure in e-mail similar to that in weblogs; and

(b) How do their structures compare to those found in non-CMC (written language)?

(2) (a) Do these factors relate to personality differences (primarily Extraversion and Neuroticism) in the same way as in non-CMC; and

(b) Are any relations stronger or weaker if Psychoticism is substituted for the Agreeableness, Conscientiousness and Openness of previous studies?

4 Method

To tackle these questions, we require corpora containing examples of two sub-types of CMC: e-mail, and blogs. We describe in turn the data collection methods used to construct them.


4.1 Corpus 1: E-mail

4.1.1 Participants

105 current or recently graduated university students took part in this study. All participants were recruited via e-mail sent by the experimenter; they were not remunerated for their participation.

A sociobiographical questionnaire and the Eysenck Personality Questionnaire-Revised (EPQ-R short version; ?) were administered to give information about the participants’ backgrounds and personalities. 37 were male and 68 female. The mean age of participants was 24.3. 53 were studying (or had studied) at an undergraduate level, and 52 at a postgraduate level; the mean number of years of higher education was 4.2. All spoke English as their first language: 95 were of UK/Irish origin; 7 North American; and 3 Australasian. Scores on the personality dimensions were as follows: Psychoticism (M=2.90, SD 1.7); Extraversion (M=7.91, SD 3.3); Neuroticism (M=5.51, SD 3.2); and Lie Scale (M=3.48, SD 2.2).

4.1.2 Materials and Procedure

The experiment was conducted on-line via an HTML form which participants filled in and then submitted over the internet. The web page had a simple design. After the sociobiographical and personality information had been collected, as noted above, the participants went on to the tasks, which were to write an e-mail to a good friend that ‘you haven’t seen for quite some time’ describing (a) in the first task what they had done in the past week, and (b) in the second task what their plans were for the coming week. Participants were advised to spend about 10 minutes on each task and to write in their normal English prose. They were assured of confidentiality, and told that they could feel free to substitute names or places if they desired.

4.1.3 Preparation of the corpus

This task generated 2 e-mail texts from each participant, giving 210 texts, and a total of around 65,000 words. Pre-editing was kept to a minimum to retain as much individuality as possible (for example, nonstandard words and spellings to imitate sounds), since these are characteristic of e-mail (Baron, 1998; Colley & Todd, 2002). Therefore, during a basic spell-check (using the standard emacs spell-checker; Stallman, 1994) a distinction was made between such intentional nonstandard spellings for communicative effect, and spelling errors.


4.2 Corpus 2: Blogs

4.2.1 Participants

71 authors of personal weblogs (‘bloggers’) contributed to this corpus. They were recruited by e-mail sent by the experimenter, or by word-of-mouth recommendation from fellow bloggers. To aid further participation, bloggers were requested to include a link to the experimentation site in their weblog. No remuneration was given for taking part in this study.

As with the e-mail corpus, a sociobiographical questionnaire was administered online to participants, along with a personality questionnaire. This gave the following information about the participants: 24 were male, 47 female; Mean age=28.4, insert relevant demos SD=??; Level of education=???; Nationality=???; etc=???. For this corpus, an online implementation of the IPIP Five Factor Personality Inventory (Buchanan, 2001) was used (rather than the EPQ-R), which provided the following information for the personality dimensions: Neuroticism (M=22.4, SD 6.3), Extraversion (M=30.5, SD 6.5), Openness (M=29.3, SD 4.7), Agreeableness (M=26.3, SD 3.7) and Conscientiousness (M=31.8, SD 6.1).

4.2.2 Materials and Procedure

The experiment was conducted on-line using HTML and Javascript forms for the presentation and submission of materials. Again the web page had a simple design. After an introduction to the experiment and assurances of confidentiality, the sociobiographical and personality questionnaires were presented. In the final stage, participants were requested to submit one month’s worth of prior weblog postings. The month was pre-specified so as to remove the possibility of an individual choosing what they considered their ‘best’ or ‘preferred’ month, which may not be entirely representative.

4.2.3 Preparation of the corpus

Each participant provided one whole month of text from their blog. To eliminate undue influence of particularly verbose individuals, the size of each weblog file was capped at the mean word count plus 2 standard deviations. The blog corpus was then annotated using XML to capture high-level text features, such as ‘Personal’ and ‘Commentary’. As in the e-mail corpus, stylistic editing of text was kept to a minimum in order to retain as much individuality as possible (for example, non-standard words and informalism). A basic spell-check (using the Sentry Spelling Checker Engine 2) then corrected errors and standardised all spellings. Finally, text tagged as ‘Personal’ (i.e., written by the blog authors themselves) was extracted for analysis. This gave a corpus of 71 blogs, consisting of 1,854 individual posts (M=26.1, SD=20.3) and 411,843 words (M=5,801, SD=5,829). Word counts revealed that personal chunks accounted for 86.8% (SD 15.5%) of all the author-written (i.e., non-quoted) words.
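The capping step can be sketched as follows. The text does not say how files were truncated or which standard deviation estimator was used, so the choices here (whitespace tokenisation, keeping the first words of each file, population standard deviation) and the function name are assumptions made for illustration.

```python
import statistics


def cap_word_counts(texts_by_author):
    """Truncate each author's text to at most mean + 2 SD words.

    Returns the capped texts and the cap that was applied.
    """
    tokenised = {author: text.split()
                 for author, text in texts_by_author.items()}
    lengths = [len(words) for words in tokenised.values()]
    # Assumption: population SD; the source does not name the estimator.
    cap = int(statistics.mean(lengths) + 2 * statistics.pstdev(lengths))
    # Assumption: the first `cap` words of each file are the ones kept.
    capped = {author: " ".join(words[:cap])
              for author, words in tokenised.items()}
    return capped, cap


# Toy data: nine short blogs and one very verbose one.
example = {f"author{i}": " ".join(["word"] * 10) for i in range(9)}
example["verbose"] = " ".join(["word"] * 1000)
capped_texts, word_cap = cap_word_counts(example)
```

Only files above the cap are shortened; the cap is computed once, from the uncapped word counts.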

5 Experiment 1: CMC factor structure

Previously, Biber (1988) used factor analysis of language use to distinguish writing styles across genres, but Pennebaker and King (1999) used it to study structure within comparable texts. Using a large corpus of student essays that had been passed through the LIWC tool (Pennebaker & Francis, 1999), Pennebaker and King chose 15 variables to use in their analysis (see section 5.1, below, for details of their selection criteria). In this experiment, we apply Pennebaker & King’s factor analysis method to our e-mail and blog CMC data. In Experiment 2 (section 6) we go on to relate these factors to personality.

5.1 Analytic method

Scores on each of LIWC’s categories were required for each author. Following Pennebaker and King, each subject’s scores are an average of the scores for each of their pieces of writing. For this, all 210 e-mail texts, and 1,854 individual personal blog texts were analysed with LIWC, and mean scores calculated for each subject.
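A minimal sketch of this per-author averaging is given below. It assumes an unweighted mean in which every text counts equally regardless of its length; the identifiers and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def author_means(text_scores):
    """Average category scores over all texts written by the same author.

    text_scores: iterable of (author_id, {category: percentage}) pairs,
    one pair per text.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for author, scores in text_scores:
        for category, value in scores.items():
            grouped[author][category].append(value)
    # Assumption: unweighted mean, so a short text counts as much as a long one.
    return {author: {category: mean(values)
                     for category, values in categories.items()}
            for author, categories in grouped.items()}


# Hypothetical scores for two e-mails by one participant and one by another.
scores = [
    ("p01", {"articles": 6.0, "past tense": 5.0}),
    ("p01", {"articles": 7.0, "past tense": 3.0}),
    ("p02", {"articles": 5.5, "past tense": 4.5}),
]
per_author = author_means(scores)
```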

In their study, Pennebaker and King (1999) outlined a number of considerations for selecting which of the 72 LIWC variables would be retained for factor analysis. Firstly, variables were retained from earlier validation studies only if they showed reliability of .60 or greater. Secondly, categories were required not to overlap. For example, the prepositions category was not included, since many inclusive and exclusive words are prepositions. Thirdly, categories were excluded if they did not refer to meanings or features of specific words (for example, word count). Similarly, current concern words, with their topic-specific nature, were also excluded. Finally, only variables that had a mean usage level of at least 1% were included.
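The selection criteria listed above amount to a simple filter, sketched here. The `Category` record, its field names and every numeric value are invented for illustration; they are not taken from LIWC validation data or from Pennebaker and King.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    reliability: float     # reliability from earlier validation work
    overlaps: bool         # True if it overlaps a category that is kept
    word_level: bool       # False for measures such as word count
    topic_specific: bool   # True for current-concern categories
    mean_usage: float      # mean percentage of words in the corpus


def passes_selection(cat, min_reliability=0.60, min_usage=1.0):
    """Apply the reliability, overlap, word-level, topic and usage criteria."""
    return (cat.reliability >= min_reliability
            and not cat.overlaps
            and cat.word_level
            and not cat.topic_specific
            and cat.mean_usage >= min_usage)


# Invented values, chosen so that each criterion rejects one candidate.
candidates = [
    Category("articles", 0.80, False, True, False, 6.2),
    Category("prepositions", 0.80, True, True, False, 12.0),
    Category("word count", 0.90, False, False, False, 100.0),
    Category("current concern (example)", 0.70, False, True, True, 1.5),
    Category("rare category (example)", 0.65, False, True, False, 0.7),
    Category("unreliable category (example)", 0.40, False, True, False, 3.0),
]
selected = [cat.name for cat in candidates if passes_selection(cat)]
```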

The 15 variables to be included by Pennebaker and King in their factor analysis are listed in Table 1 along with mean frequencies (and standard deviations (CHECK)) for these variables within the e-mail and blog corpora, along with those for written data reported by Pennebaker & King (1999). We now briefly describe the comparability of these data sets, before discussing the suitability of our data for entry into exploratory factor analysis.

Dimension            E-mail       Blog         Written
Words > 6 letters    12.69 (1)    15.3 (1)     13.06 (2)
Present tense        11.12 (2)    9.96 (2)     13.95 (1)
First-person sing.   6.51 (3)     6.81 (4)     10.63 (3)
Social processes     6.34 (4)     5.90 (5)     6.51 (4)
Inclusive            6.32 (5)     5.77 (6)     5.95 (5)
Articles             6.17 (6)     6.84 (3)     4.73 (6)
Past tense           4.56 (7)     4.06 (7)     3.79 (8)
Exclusive            3.55 (8)     3.61 (8)     4.21 (7)
Positive emotions    3.10 (9)     2.86 (9)     3.38 (9)
Tentative            2.62 (10)    2.43 (10)    2.84 (10)
Discrepancy          2.18 (11)    1.94 (11)    2.84 (11)
Negations            1.69 (12)    1.83 (12)    2.18 (13)
Insight              1.65 (13)    1.71 (13)    2.47 (12)
Negative emotions    0.99 (14)    1.66 (14)    1.80 (14)
Causation            0.68 (15)    0.73 (15)    1.10 (15)

Table 1
Means (and ranks) of 15 LIWC variable scores for three studies also include SD and alpha?
Note: Ordered by rank of E-mail data. Written data reproduced from Pennebaker & King (1999:1302), Table 2.

2 this should be in REF? Wintertree Software, Nepean, Ontario, Canada K2J 3N4

There appears to be a basic underlying pattern: in fact, the rank orders of the means from the blog and e-mail corpora are almost identical, bar the slightly higher frequency of Articles in blogs. Across all three studies, no variable is placed more than three places higher or lower; the difference is mostly only one place.
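The rank comparison can be checked directly from the Table 1 means. The sketch below recomputes the ranks within each study and reports the largest displacement; the variable and function names are illustrative, and ties (Tentative and Discrepancy in the written data) are broken by table order, which matches the printed ranks.

```python
# Means from Table 1, in the order (E-mail, Blog, Written).
MEANS = {
    "Words > 6 letters": (12.69, 15.3, 13.06),
    "Present tense": (11.12, 9.96, 13.95),
    "First-person sing.": (6.51, 6.81, 10.63),
    "Social processes": (6.34, 5.90, 6.51),
    "Inclusive": (6.32, 5.77, 5.95),
    "Articles": (6.17, 6.84, 4.73),
    "Past tense": (4.56, 4.06, 3.79),
    "Exclusive": (3.55, 3.61, 4.21),
    "Positive emotions": (3.10, 2.86, 3.38),
    "Tentative": (2.62, 2.43, 2.84),
    "Discrepancy": (2.18, 1.94, 2.84),
    "Negations": (1.69, 1.83, 2.18),
    "Insight": (1.65, 1.71, 2.47),
    "Negative emotions": (0.99, 1.66, 1.80),
    "Causation": (0.68, 0.73, 1.10),
}


def ranks(column):
    """Rank variables by descending mean within one study (1 = highest)."""
    # sorted() is stable, so tied means keep their table order.
    ordered = sorted(MEANS, key=lambda name: -MEANS[name][column])
    return {name: position for position, name in enumerate(ordered, start=1)}


per_study = [ranks(column) for column in range(3)]
rank_table = {name: tuple(study[name] for study in per_study) for name in MEANS}
spread = {name: max(r) - min(r) for name, r in rank_table.items()}
largest = max(spread, key=spread.get)
```

On these figures the largest displacement is three places, for Articles (ranks 6, 3 and 6), and every other variable moves by at most one place.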

Perhaps the biggest differences are as follows. The blog corpus contains a greater frequency of words of length greater than six letters, but fewer examples of the present tense. The original written corpus contains more first-person singular pronouns, but fewer articles. Causation in the blog data, along with causation and negative emotions in the e-mail corpus, actually fall below the criterion of minimum 1% usage.

To ensure compatibility with the factor analysis of Pennebaker and King, the same 15 variables were selected for both e-mail and blog corpora. However, the fourth of Pennebaker and King’s selection criteria is not met by: causation, in either the e-mail or the blog corpus (.68% and .73%, respectively); and negative emotion, in the e-mail corpus (.99%). 3 For consistency of comparison between our e-mail and blog corpora, we take the more conservative measure and repeat the analysis entering 13 variables, omitting the causation and negative emotion categories.
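The reduction from 15 to 13 variables follows mechanically from the 1% usage criterion applied to the Table 1 means. A small sketch, in which the names are illustrative and a variable is dropped if it falls below the threshold in either CMC corpus:

```python
# (E-mail mean %, Blog mean %) from Table 1.
USAGE = {
    "Words > 6 letters": (12.69, 15.3),
    "Present tense": (11.12, 9.96),
    "First-person sing.": (6.51, 6.81),
    "Social processes": (6.34, 5.90),
    "Inclusive": (6.32, 5.77),
    "Articles": (6.17, 6.84),
    "Past tense": (4.56, 4.06),
    "Exclusive": (3.55, 3.61),
    "Positive emotions": (3.10, 2.86),
    "Tentative": (2.62, 2.43),
    "Discrepancy": (2.18, 1.94),
    "Negations": (1.69, 1.83),
    "Insight": (1.65, 1.71),
    "Negative emotions": (0.99, 1.66),
    "Causation": (0.68, 0.73),
}
MIN_USAGE = 1.0  # minimum mean usage, in percent

# Conservative rule: drop a variable if it fails in either corpus.
dropped = sorted(name for name, (email, blog) in USAGE.items()
                 if min(email, blog) < MIN_USAGE)
retained = [name for name in USAGE if name not in dropped]
```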

3 Indeed, in the blog data, although negative emotion words received 1.66% usage, this was the second lowest mean of the 15 variables, after causation.


Study     No. Var   No. Subjects   Bartlett’s   KMO
Written   15        841            2831         .633
E-mail    15        105            333          .580
Blog      15        71             340          .714
E-mail    13        105            278          .600
Blog      13        71             290          .714

Table 2
Bartlett’s test of sphericity and KMO scores for the five samples

Note: Bartlett scores p < 0.001 - CAN THIS TABLE BE REMOVED AND INFO DESCRIBED IN TEXT?

Exploratory factor analysis for the e-mail and blog data was carried out on the means of each subject’s texts using both 15 and 13 variables. Diagnostic tests (Bartlett’s test of sphericity, 4 and Kaiser-Meyer-Olkin’s measurement of sampling adequacy, KMO 5 ) reveal similar suitability to the previous study of Pennebaker & King (Table 2).
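For readers unfamiliar with the two diagnostics, the sketch below computes them from a correlation matrix using the usual textbook formulas: Bartlett’s statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and KMO is the ratio of summed squared correlations to summed squared correlations plus summed squared partial correlations. It is not confirmed that the statistical package used for Table 2 computes them in exactly this way; the function names are assumptions, and the p-value (which needs the chi-square distribution) is omitted.

```python
import math


def invert_with_det(matrix):
    """Gauss-Jordan inversion with partial pivoting; returns (inverse, det)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def bartlett_sphericity(corr, n_subjects):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = len(corr)
    _, det = invert_with_det(corr)
    chi_square = -(n_subjects - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi_square, p * (p - 1) // 2


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    p = len(corr)
    inverse, _ = invert_with_det(corr)
    sum_r2 = sum_partial2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inverse[i][j] / math.sqrt(inverse[i][i] * inverse[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)


# Toy example: three variables that all correlate .5 with one another.
toy = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
chi_square, dof = bartlett_sphericity(toy, n_subjects=105)
adequacy = kmo(toy)
```

A large Bartlett statistic relative to its degrees of freedom, and a KMO value closer to 1, both indicate that the variables share enough variance for factor analysis to be worthwhile.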

5.2 Results

5.2.1 Analysis using 15 LIWC variables

Using all of Pennebaker and King’s 15 selected LIWC variables, we find that for e-mail and blog corpora five eigenvalues are over 1, and examining the scree plot indicates that a four factor solution best fits these data. Principal-components analysis was used to determine the four factors, and varimax rotation was applied to aid interpretation. All 15 variables had communalities greater than .37 for the e-mail data, and greater than .40 for the blog data.
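The rotation step can be illustrated with a small sketch of raw varimax, implemented as repeated pairwise planar rotations. It makes several assumptions: extraction of the unrotated principal components is not shown, Kaiser row normalisation (which many statistical packages apply) is omitted, and the toy loadings are invented. It is not a reconstruction of the software used for the analyses reported here.

```python
import math


def varimax(loadings, max_sweeps=100, tol=1e-8):
    """Raw varimax via pairwise planar rotations (no Kaiser normalisation)."""
    rotated = [[float(v) for v in row] for row in loadings]
    n_vars, n_factors = len(rotated), len(rotated[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for p in range(n_factors - 1):
            for q in range(p + 1, n_factors):
                u = [row[p] ** 2 - row[q] ** 2 for row in rotated]
                v = [2.0 * row[p] * row[q] for row in rotated]
                a, b = sum(u), sum(v)
                c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # atan2 selects the angle that maximises the criterion.
                angle = 0.25 * math.atan2(d - 2.0 * a * b / n_vars,
                                          c - (a * a - b * b) / n_vars)
                cos_a, sin_a = math.cos(angle), math.sin(angle)
                for row in rotated:
                    x, y = row[p], row[q]
                    row[p] = x * cos_a + y * sin_a
                    row[q] = -x * sin_a + y * cos_a
                largest_angle = max(largest_angle, abs(angle))
        if largest_angle < tol:
            break
    return rotated


def communalities(loadings):
    """Sum of squared loadings per variable; unchanged by rotation."""
    return [sum(value ** 2 for value in row) for row in loadings]


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    n_vars = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squares = [row[j] ** 2 for row in loadings]
        mean_square = sum(squares) / n_vars
        total += sum(s * s for s in squares) / n_vars - mean_square ** 2
    return total


# Invented example: a clean two-factor structure, blurred by a 30-degree turn.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
theta = math.radians(30)
unrotated = [[x * math.cos(theta) - y * math.sin(theta),
              x * math.sin(theta) + y * math.cos(theta)] for x, y in simple]
rotated = varimax(unrotated)
```

Rotation redistributes variance between factors without changing how much of each variable is explained, which is why communalities can be reported independently of the rotation used.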

Rotated factor loadings 6 for e-mail and blog data are shown in Tables 3 and 4 respectively. Note that only loadings greater than .4 are shown, to further aid interpretation. 7


Factor 1: Immediacy (19.4% var)
Factor 2: Making Distinctions (12.8% var)
Factor 3: The Social Past (11.5% var)
Factor 4: Rationalization (9.6% var)

Dictionary
Present tense   .788
Words>6 lett    −.669
1st-psn-sng     .587
Insight         .561
Articles        −.522 −.506
Exclusive       .697
Negations       .651
Tentative       .604
Discrepancies   .553
Inclusive       −.546 −.465
Past tense      .732
Social          .676
Pos emotions    .569
Neg emotions    .737
Causation       .707

Note. Only loadings of .40 or above are shown. N = 105.
Table 3
Rotated factor loadings for exploratory analysis of 15 LIWC variables - e-mail data

           Factor 1:     Factor 2:     Factor 3:     Factor 4:
original   (29.9% var)   (10.0% var)   (9.1% var)    (8.5% var)
rotated    (20.6% var)   (15.5% var)   (12.3% var)   (8.9% var)

Exclusive: .767
Discrepancies: .753 .421
Tentative: .731
Causation: .608
Present tense: .548 .486
Negations: .538 .532
Insight: .478 .436
Words>6 lett: −.823
Articles: −.737
1st-psn-sng: .525
Pos emotions: .722
Social: .694
Past tense: .651
Inclusive: .513
Neg emotions: −.424

Table 4
Rotated factor loadings for exploratory analysis of 15 LIWC variables - blog data

5.2.2 Analysis using 13 LIWC variables

As noted above, the second analysis omits two variables (causation and nega-tive emotion words). The scree plot for these data indicated that a three factor

4 Factor analysis requires that variables be independent from one another. If theobtained Bartlett’s score is significant, this has been shown.5 KMO indicates the proportion of variance in the variables which is common vari-ance, which might be caused by underlying factors. Scores range from 0 (very bad)to 1 (perfect), so the higher the score the better. MAYE WE SHOULD USE SOMEREF? HERE? - remove this, i think - unnecessary. Maybe promote into text???6 WE NEED TO FIX THIS - SCOTT CAN YOU DOUBLE CHECK WITH MEWHAT’S GOING ON HERE? - As in Pennebaker and King, and Gill, the factorloadings are rotated. However, in Gill (2004) the percentage of variance reported foreach factor is that for the unrotated factor solution. CHECK: Therefore, while inthe text unrotated variances are compared, both are reported in the results tables.7 Note that Pennebaker & King (1999) report loadings greater than .20.


                 Factor 1:             Factor 2:     Factor 3:
                 Making Distinctions   Immediacy     The Social Past
Dictionary       (21.9% var)           (14.5% var)   (11.8% var)
Exclusive        .697
Negations        .598
Discrepancies    .593
Tentative        .581
Inclusive        −.561                               −.457
Present tense                          .812
Articles                               −.710
1st-psn-sng                            .622
Words>6 lett                           −.556
Insight
Past tense                                           .755
Social                                               .658
Pos emotions                                         .577

Note. Only loadings of .40 or above are shown. N = 105.
Table 5. Rotated factor loadings for exploratory analysis of 13 LIWC variables - e-mail data

                 Factor 1:             Factor 2:     Factor 3:
                 Making Distinctions   Immediacy     The Social Past
original         (32.1% var)           (11.1% var)   (10.4% var)
rotated          (21.0% var)           (20.2% var)   (12.4% var)
Exclusive        .833
Discrepancies    .762
Tentative        .728
Negations        .674
Articles                               –.830
1st-psn-sng                            .667
Present tense    .488                  .638
Words>6 lett                           –.608
Insight                                .444
Pos emotions                                         .670
Social                                               .622
Inclusive                                            .600
Past tense                                           .453

Table 6. Rotated factor loadings for exploratory analysis of 13 LIWC variables - blog data

model would best fit the data (although in both cases four factors had eigenvalues over 1). Once again, principal-component analysis extracted the factors, and varimax rotation was used to enable interpretation. For the e-mail data, all variables had communality greater than .35 with the exception of insight words (.26). All blog variables had communality greater than .33. The rotated factor loadings for the e-mail and blog corpora are shown in Tables 5 and 6 respectively.
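As an illustration of the quantities mentioned here, the following minimal sketch computes communalities (the sum of squared loadings for each variable) and the share of variance carried by each factor, and shows that an orthogonal rotation redistributes variance between factors while leaving both the communalities and the total unchanged. The loading values are hypothetical, and the rotation uses an arbitrary angle rather than the varimax criterion, so this is not a reproduction of the analysis reported above.

```python
import math

def communalities(loadings):
    """Communality of each variable: sum of its squared loadings."""
    return [sum(l * l for l in row) for row in loadings]

def variance_per_factor(loadings):
    """Share of total variance per factor, assuming standardised variables."""
    n_vars = len(loadings)
    n_factors = len(loadings[0])
    return [sum(row[j] ** 2 for row in loadings) / n_vars
            for j in range(n_factors)]

def rotate_pair(loadings, theta):
    """Orthogonally rotate a two-factor loading matrix by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

# Hypothetical unrotated loadings: four variables on two factors.
unrotated = [[0.70, 0.30], [0.65, 0.40], [0.60, -0.45], [0.55, -0.50]]
# Arbitrary 40-degree rotation (an assumption; varimax would choose the angle).
rotated = rotate_pair(unrotated, math.radians(40))

print("communalities:", [round(h, 3) for h in communalities(unrotated)])
print("unrotated shares:", [round(v, 3) for v in variance_per_factor(unrotated)])
print("rotated shares:  ", [round(v, 3) for v in variance_per_factor(rotated)])
```

This is why a results table can list different "original" and "rotated" percentages per factor while the total variance accounted for stays the same.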

5.3 Discussion

The results of the 15 variable analysis applied to the written data of Pennebaker and King are reproduced in Table ??. To aid comparison, Table 7 shows each variable loading and direction across the factors for the e-mail, blog and written analyses. The most striking observation is that whereas Factors 1 and 2 in the written data of Pennebaker and King and the e-mail data match, in


Columns: 1e 2b 1w 2e 1b 2w 3e 3b 3w 4e 4b 4w
Exclusive        + + +
Discrepancies    + + + + +
Tentative        + + +
Causation        + + +
Present tense    + + + + +
Negations        + + + +
Insight          + + + +
Words>6 lett     – – –
Articles         – – – –
First-pers sing  + + +
Pos emotions     + + –
Social           + + +
Past tense       + + +
Inclusive        – – – +
Neg emotions     + – –

Table 7. Direction of loading found in the three studies using 15 LIWC variables

Note: e = E-mail; b = Blog; w = Written. Factors 1 and 2 of the blog data are switched, compared with the e-mail and written data.

the blog data, the factors are reversed. In terms of variance accounted for, the first two factors account for 19.4% and 12.8% respectively in the e-mail corpus, similar to the 22.4% and 10.3% reported by Pennebaker & King in their written data. The equivalent factors in the blog data account for 10.0% and 29.9%.

Factor 1, termed ‘Immediacy’ by Pennebaker & King (eigenvalue = 3.35), is replicated in the e-mail data (eigenvalue = 2.92), and both data sets show positive loadings for discrepancy words, words relating to the present tense, and use of the first-person singular. There were also negative weightings for the use of articles, and words of length greater than 6. In addition to these five, we find in the e-mail data a loading for insight words. Factor 2 of the blog data (eigenvalue = 1.50) has similar loadings for the five variables discussed, in addition to a positive loading for negations.

Variables loading onto Factor 2 (eigenvalue = 1.55) of the written corpus (termed ‘Making Distinctions’) are exclusive words, discrepancy words, negations and tentative words, all positively loaded, and inclusive words with a negative loading. Factor 2 in the e-mail data (eigenvalue = 1.91) has the same loadings, with the exception of discrepancy words, which did not load on this factor. Factor 1 of the blog data (eigenvalue = 4.79) also contains positive loadings of exclusive words, discrepancy words, negations and tentative words. However, also loading on this factor are words relating to insights, causation and the present tense, and we note the absence of the negative loading of inclusive words.

The relationships between the remaining factors show broad similarities, but are less clear. Across all studies, Factor 3 has a positive loading for social words, but in the e-mail and blog corpora there is a positive loading for expressions of positive emotion, whereas for the written study it is negative.


                 1e  1b  2w  2e  2b  1w  3e  3b  3w  4w
Exclusive        +   +   +
Discrepancies    +   +   +           +
Tentative        +   +   +
Causation                                            +
Negations        +   +   +
Articles                     –   –   –
1st-psn-sng                  +   +   +
Present tense        +       +   +   +           +
Words>6 lett                 –   –   –
Insight                          +                   +
Pos emotions                             +   +   –
Social                                   +   +   +
Inclusive        –       –               –   +
Past tense                               +   +   +
Neg emotions                                         –

Table 8. Direction of loading of e-mail and blog data with 13, and in the original study using 15, LIWC variables.

Note: e = E-mail; b = Blog; w = Written. Factors 1 and 2 of the written data (Pennebaker & King) are switched, compared with the e-mail and blog data.

Italicised variables (Causation and Neg emotions) are those excluded from the current study.

In the case of past tense words, the e-mail and written analyses show positive loadings, whereas the blog corpus does not. Additionally, in writing there is a positive association with the present tense, while the e-mail data shows a negative association for inclusive words. The only other loading found for the blog data was positive, for insight words.

Insight words are the one category on which there is no agreement: they appear in Factors 1 and 3 of the blog data, Factor 1 of the e-mail data and on Factor 4 of the written data. In terms of agreement on Factor 4, the e-mail and written data find a positive loading for causation, which the blog data places solely in Factor 1. The blog and written data agree upon a negative loading for negative emotions, though the e-mail data demonstrates a positive loading. The blog data is alone in Factor 4 in showing loadings for inclusive and past tense words. The e-mail data, however, is alone in finding a loading for articles.

Though the factors differ slightly, there are obvious similarities, particularly in the first two factors of each study, and to a lesser extent, the third. Likewise, the overall variance accounted for by the 4 factors was similar: 53.3% for the e-mail data, 57.4% for the blog data, and 51.1% for the written data.

Let us now turn to the results for the 13 variable analysis. Table 8 shows the direction of the factor loadings for the 13 variables of e-mail and blog data, and the loadings for 15 variables of the written data (Pennebaker & King’s additional variables, excluded from the 13 variable analysis, are indicated in italics). Here we find that the ordering of Factors 1 and 2 switches in the 13 variable e-mail and blog analyses in comparison to that in the written study.

Factor 1 of both the e-mail (eigenvalue = 2.84) and blog (eigenvalue = 4.17) data includes positive loadings for negations and for exclusive, discrepancy, and tentative words, with the e-mail data also showing a negative loading for inclusive words, matching Factor 2 of the written data exactly. The blog data adds a positive loading for present tense words.

Factor 2 again is common to e-mail (eigenvalue = 1.89) and blog (eigenvalue = 1.44) data, with both showing loading of first-person singular and present tense, and a negative loading of articles and words longer than 6 letters. As in the previous analysis, insight loads onto Factor 2 in the blog data, but does not load strongly onto any e-mail data factors, although in the written data Pennebaker & King found that it loads onto their Factor 4. Also, discrepancies load on the equivalent factor in the written data.

Factor 3 of the e-mail (eigenvalue = 1.54) and blog (eigenvalue = 1.36) analyses shows loadings of positive emotions (although negative in the written study), social and past tense words, although here the e-mail data shows a negative loading of inclusive words, whereas this is a positive loading for the blog data. Also, in the written data there are loadings of present tense words, but none for inclusive words.

Examining the overall variance described by this analysis, we note that the three factors account for the most variance in the blog data, at 53.6%, compared with the e-mail data at 48.2%, and finally (discounting their fourth factor) Pennebaker and King’s written data with 42.5% (51.1% if their fourth factor is included).
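The totals quoted above can be related to the eigenvalues reported for each factor. The sketch below assumes that the principal-component analysis was run on standardised variables, so that the total variance equals the number of variables (13); under that assumption each eigenvalue divided by 13 gives the proportion of variance for its factor. The function and variable names are illustrative only.

```python
def percent_variance(eigenvalues, n_variables):
    """Percentage of total variance per factor, assuming standardised variables."""
    return [100.0 * ev / n_variables for ev in eigenvalues]

# Eigenvalues quoted in the text for the 13 variable analyses.
EIGENVALUES = {
    "e-mail": [2.84, 1.89, 1.54],
    "blog": [4.17, 1.44, 1.36],
}
N_VARIABLES = 13  # assumption: one unit of variance per standardised variable

totals = {corpus: sum(percent_variance(evs, N_VARIABLES))
          for corpus, evs in EIGENVALUES.items()}

for corpus, total in totals.items():
    print(f"{corpus}: {total:.1f}% of variance across three factors")
```

Under this assumption the computed totals agree with the 48.2% and 53.6% reported for the e-mail and blog data to within rounding.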

We therefore note that the three factors derived here more closely match the first three factors of Pennebaker and King than the analysis using 15 variables. Their ‘Making Distinctions’ factor is identical to the first factor derived from the e-mail data, and with minor exceptions, the blog data also. For ‘Immediacy’ the only differences across the e-mail and blog data are that insight was not found to load, and in the case of blog data (as in the written data) discrepancies do load. The three-factor solution (with 13 variables) also provides a much clearer replication of Pennebaker and King’s ‘The Social Past’ factor. The past tense and social words are positively loaded across all studies, while positive emotion loads positively in both e-mail and blog data, rather than negatively as in the study of Pennebaker & King. Present tense words, which loaded on ‘The Social Past’ in the written study, are absent from both e-mail and blog analyses. By contrast, inclusive words did not load on this factor in the written study, but do in the CMC data: positively for blogs, and negatively for e-mail.

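The comparison of loading directions for ‘The Social Past’ can be expressed as a small lookup. The sketch below encodes the directions described in the paragraph above and classifies each variable as agreeing across the three genres only when it loads in all three with the same sign. The data structure and function name are illustrative assumptions, not part of the original analysis.

```python
# Direction of loading on 'The Social Past', as described in the text.
SOCIAL_PAST = {
    "e-mail": {"Past tense": "+", "Social": "+", "Pos emotions": "+", "Inclusive": "-"},
    "blog": {"Past tense": "+", "Social": "+", "Pos emotions": "+", "Inclusive": "+"},
    "written": {"Past tense": "+", "Social": "+", "Pos emotions": "-", "Present tense": "+"},
}

def classify(factor):
    """Label a variable 'agree' if it loads in every corpus with one sign."""
    variables = sorted({v for loadings in factor.values() for v in loadings})
    result = {}
    for variable in variables:
        signs = [loadings.get(variable) for loadings in factor.values()]
        same_sign = None not in signs and len(set(signs)) == 1
        result[variable] = "agree" if same_sign else "differ"
    return result

for variable, label in classify(SOCIAL_PAST).items():
    print(f"{variable}: {label}")
```

Only past tense and social words agree in direction across all three genres; positive emotion, inclusive and present tense words are the variables that separate them.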

6 Experiment 2: CMC factor structure and personality traits

Following Pennebaker & King’s procedure for investigating written language, this section studies correlations with personality. It uses the factors derived in the previous section, and the related variables, to help explore language differences within personality dimensions.

6.1 Analytic method

In the previous section, we noted that the factor analyses with causation and negative emotion words removed most closely resembled the original factors derived from written data (see Table 8). Therefore only the 13 variable solution for e-mail and blog data will be compared with the original study. This also means that there will not be a comparison of the fourth factor in any detail, since it is the factor on which there is least agreement (see footnote 8). It should be noted that due to vastly different sample sizes (the e-mail data’s 105 participants, and the 71 contributors to the blog corpus, contrast strongly with Pennebaker and King’s 841 subjects) it is difficult to compare correlation strengths. So aside from the factors, we focus upon the correlations that reach significance, as well as more general differences in direction of relationship.
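The effect of sample size on which correlations reach significance can be made concrete. The sketch below estimates the smallest absolute correlation that is significant at p < .05 (two tailed) for each sample size, using the Fisher z approximation; this is an assumption made for illustration, since the text does not state which significance test was used, and an exact t-based test would give slightly different thresholds.

```python
import math
from statistics import NormalDist

def critical_r(n, alpha=0.05):
    """Approximate two-tailed critical |r| via the Fisher z transformation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))

# Sample sizes quoted in the text.
SAMPLES = [("e-mail", 105), ("blog", 71), ("written", 841)]

for label, n in SAMPLES:
    print(f"{label} (N = {n}): |r| must exceed about {critical_r(n):.3f}")
```

With 841 subjects a correlation well below .10 can reach significance, whereas the e-mail and blog samples need correlations of roughly .19 and .23 respectively, which is why the comparison focuses on direction and significance rather than on correlation strength.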

6.2 Results

The correlation results of the 13 variable factor analyses with personality can be seen in Tables 9 and 10. We reproduce the findings of Pennebaker & King’s original study in Table ??.

6.3 Discussion

We compare each factor with each personality trait in turn, beginning with Neuroticism and Extraversion, since these were in all three studies, followed by Openness, Agreeableness and Conscientiousness, since these allow comparison with the original written study. We finally discuss the findings for Psychoticism, the third trait of the EPQ-R model, after Extraversion and Neuroticism.

8 SCOTT - INCLUDE REF? TO PERSONAL CORRESPONDENCE WITH PENNEBAKER RE THIS.


                  EPQ-R Dimension
                  Neuroticism   Extraversion   Psychoticism
Factor 1          −.11          −.03           .11
Exclusive         −.02          −.10           −.01
Negations         −.03          −.08           −.02
Discrepancies     .04           .09            .13
Tentative         −.14          .00            .13
– Inclusive       .26**         −.02           −.11
Factor 2          .12           −.08           −.11
Present tense     .14           −.10           −.06
– Articles        −.02          .11            .12
1st-psn-sng       .16           −.12           −.23*
– Words>6 lett    .04           −.05           −.01
Factor 3          −.24*         .11            .04
Past tense        −.19          .06            −.09
Social            −.05          .01            .02
Pos emotions      −.13          .15            .07

Note. N = 105. One variable is coded onto two factors: Inclusive is a part of Factors 1 and 3 (Making Distinctions and The Social Past). ‘–’ is used to indicate a negative factor loading. LIWC categories are ordered as they load onto their Factor. *p < .05. **p < .01, two tailed. Should we include and italicise duplicates?
Table 9. Correlation of LIWC factors (13 variables) with personality scores - e-mail data.

                  Five-factor Dimension
                  N         E         O         A         C
Factor 1          .245*     –.190     –.055     –.252*    .019
Exclusive         .133      –.079     .143      –.189     –.061
Discrepancies     .339**    –.251*    –.118     –.290*    –.035
Tentative         .140      –.144     –.107     –.198     .060
Negations         .163      .020      –.222     –.245*    .098
Present tense     .159      .200      .009      –.092     .102
Factor 2          .003      .166      –.180     –.139     .068
– Articles        –.072     .031      .136      .255*     –.054
1st-psn-sng       –.017     .175      –.098     –.081     –.060
Present tense     .159      .200      .009      –.092     .102
– Words>6 lett    .020      –.055     .290*     .262*     .034
Insight           –.111     .063      .106      .094      .075
Factor 3          –.079     .179      .295*     .133      –.154
Pos emotions      –.043     .162      .127      .069      –.060
Social            –.035     .238*     .195      .037      –.109
Inclusive         –.012     .015      .249*     .094      –.091
Past tense        .011      –.116     –.028     –.125     –.157

Table 10. Correlation of LIWC factors (13 variables) with personality scores - blog data
Note: n = 71, two tailed, *p<0.05, **p<0.01. Italics are used to indicate variables loading on a second factor. ‘–’ is used to indicate a negative factor loading.

6.3.1 Neuroticism

In the blog data, we find that Factor 1 (‘Making Distinctions’) correlates significantly with Neuroticism (p < .05), along with the constituent loading variable discrepancies (p < .01), reflecting the general positive trend. In contrast, the e-mail data shows a generally weak negative trend between Factor 1 and Neuroticism, with only the constituent loading variable inclusive words showing a strong (positive) correlation (p < .01; the strongest relationship found for the e-mail data). No notable relationships are present between the relevant factor ‘Making Distinctions’ in the written data and this personality trait.

Factor 2, or ‘Immediacy’, shows a generally positive relationship with Neuroticism in both e-mail and written studies (although this only reaches levels of significance in the latter study, for the factor itself, for first-person singular words and, negatively, for articles). No notable relationships exist in the blog data.

For Factor 3 (‘The Social Past’), there is a significant negative correlation with Neuroticism in the e-mail data (p < .05), with other variables loading on this factor also generally showing a negative trend; the same is true for the blog data, however no relationships reach significance. For the written data, although there is little overall relationship with Neuroticism, positive emotion words which load onto the factor show a strong negative relationship, similar to that of the e-mail data (although as noted above, not significant). Although not included in the present analysis, we additionally note the strong positive correlation with negative emotion words in the written data (loading on Factor 4, ‘Rationalization’).

6.3.2 Extraversion

With the exception of discrepancies in the blog corpus (p < .05), Factor 1 (‘Making Distinctions’) and loading variables do not show more than a negative trend in relationship to Extraversion (with this stronger for the blog data). In contrast, the written data of Pennebaker & King shows a much stronger significant negative correlation (with the exception of inclusive words, which correlate positively).

Factor 2 (‘Immediacy’) showed very few consistent or significant relationships with Extraversion across the three studies (blogs being positively related, writing slightly positive, and e-mails negative), with the exception being the variable articles in the written study, which showed a negative relationship.

Although for Factor 3 (‘The Social Past’) both e-mail and blog data show a general positive trend, this is not the case with the written study, although social and positive emotion constituent variables both show a strong relationship with Extraversion. In the case of the blog corpus, only social words related positively (p < .05).


6.3.3 Openness

Turning now to the remaining five-factor personality variables, we will solely compare the findings of the blog and written data. In both of these studies Factor 1, or ‘Making Distinctions’, is not significantly correlated with Openness, although in the written data both exclusive and tentative variables correlate positively.

For Factor 2 (‘Immediacy’), the blog data shows a negative trend with Openness; however, the most notable relationship is positive, with the words of more than 6 letters variable (p < .05). In contrast, this factor shows the strongest of all relationships in the written data of Pennebaker & King, which is negatively related to Openness, along with first-person singular and present tense words, while articles and words of more than 6 letters are similarly strongly related in the positive direction.

Both studies find a significant positive correlation between Factor 3, ‘The Social Past’, and Openness, and in fact for the blog data, this represents the strongest factor correlation (r = .295; p < .05). The blog study also shows a significant positive relationship between Openness and the inclusiveness variable (p < .05).
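As a rough check on the significance levels attached to individual correlations, the sketch below converts a correlation and a sample size into an approximate two-tailed p value with the Fisher z transformation. This is an illustrative approximation under the assumption of bivariate normality, not necessarily the test used to produce Table 10; the two correlations are taken from that table (n = 71).

```python
import math
from statistics import NormalDist

def two_tailed_p(r, n):
    """Approximate two-tailed p value for a correlation r from n cases."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

N_BLOG = 71
p_factor3_openness = two_tailed_p(0.295, N_BLOG)        # Factor 3 with Openness
p_discrepancies_neuro = two_tailed_p(0.339, N_BLOG)     # Discrepancies with Neuroticism

print(f"r = .295: p is approximately {p_factor3_openness:.3f}")
print(f"r = .339: p is approximately {p_discrepancies_neuro:.3f}")
```

The approximation places the first correlation between .01 and .05 and the second below .01, in line with the single and double asterisks shown for these values.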

6.3.4 Agreeableness

Factor 1 (‘Making Distinctions’) and Agreeableness demonstrate another strong relationship in the blog data, which in this case is negative (p < .05), with constituent variables discrepancies and negations also similarly showing a negative relationship (p < .05). No notable relationships are present in the written data.

Factor 2 (‘Immediacy’) in the written data shows a positive relationship with Agreeableness, along with constituent variable first-person singular, and a negative relation to articles. Although the factor does not correlate in the blog study, it conversely shows a strong trend towards a negative relationship with this trait, and appears to demonstrate the largest area of disagreement between the two studies. The constituent variables articles and words of more than 6 letters both correlate positively and significantly with Agreeableness (p < .05).

Factor 3 (‘The Social Past’) shows a very small positive relationship with Agreeableness in the written data, whereas the blog data apparently shows a larger (but still non-significant) correlation. Pennebaker & King do, however, note a positive relationship with positive emotion words.


6.3.5 Conscientiousness

Although the blog data does not appear to demonstrate much interaction between Factor 1 (‘Making Distinctions’) and Conscientiousness, in the written data there is a strong negative relationship both at the factor level, and also with the constituent variable negations.

Factor 2 (‘Immediacy’) bears little relationship with Conscientiousness in either the blog or the written corpus, with the exception of the variable discrepancies being negatively related in the written corpus.

Factor 3 (‘The Social Past’) shows a weak non-significant negative relationship with Conscientiousness in both blog and written data; in the case of the latter the constituent variable positive emotion words correlates positively with this trait.

6.3.6 Psychoticism

In order to relate the findings for Psychoticism to those of the five-factor model used in the blog corpus, and also by Pennebaker and King (1999), we take Psychoticism to be inversely related to Agreeableness and Conscientiousness. For Factor 1 (‘Making Distinctions’), we find that Psychoticism shows a trend towards a positive relationship with this factor, which is consistent (when inverted) with the negative relationships found in the written data across Agreeableness and Conscientiousness, and in the case of the blog data, consistent with Agreeableness.

For Factor 2 (‘Immediacy’), Psychoticism shows a non-significant negative relationship with this factor, and also relates significantly negatively to the first-person singular variable (p < .05), with this finding consistent (when inverted) with the significant positive relationship found with Agreeableness in the written data. However, we note that this is less consistent with the non-significant negative relationship found in the blog data.

Factor 3 (‘The Social Past’) of the e-mail data shows little relationship to Psychoticism, and although we cannot infer too much from this, we note that this is generally consistent with the blog and written data studies.
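The inversion used in this section can be sketched as a simple sign comparison. The example below takes the factor-level correlations for Psychoticism (e-mail data, Table 9) and Agreeableness (blog data, Table 10) and asks whether the inverted Psychoticism correlation points in the same direction as Agreeableness. It is a deliberately crude, sign-only check with illustrative names, and it ignores both magnitude and significance, so near-zero correlations such as those for Factor 3 should not be over-interpreted.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

# Factor-level correlations taken from Tables 9 and 10.
PSYCHOTICISM_EMAIL = {"Factor 1": 0.11, "Factor 2": -0.11, "Factor 3": 0.04}
AGREEABLENESS_BLOG = {"Factor 1": -0.252, "Factor 2": -0.139, "Factor 3": 0.133}

def consistent_when_inverted(psychoticism_r, agreeableness_r):
    """True if inverted Psychoticism has the same sign as Agreeableness."""
    return sign(-psychoticism_r) == sign(agreeableness_r)

consistency = {
    factor: consistent_when_inverted(PSYCHOTICISM_EMAIL[factor],
                                     AGREEABLENESS_BLOG[factor])
    for factor in PSYCHOTICISM_EMAIL
}

for factor, ok in consistency.items():
    print(f"{factor}: {'consistent' if ok else 'not consistent'} when inverted")
```

On this check Factor 1 is consistent with the blog Agreeableness result and Factor 2 is not, matching the discussion above.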

7 General discussion

In this study, we addressed two sets of research questions (Section 3). We now address these in turn, drawing together the material from each experiment.


7.1 E-mail versus Blog factors; CMC versus Non-CMC

The primary aim of this study has been to investigate how the different CMC genres of e-mail and blog relate, and to contrast them with non-CMC data from a previous study. What are very apparent, firstly, are the broad similarities in factor structure between these three language varieties, and this is especially notable with the more conservative three-factor CMC solution. Although there has been much speculation about the divergence in language varieties brought about by the advent of CMC, it should not be surprising that these factors, originally derived from writing, have been replicated here. Rather, it is a tribute to Pennebaker & King, the authors of the original study, and to their effective selection of LIWC variables, which have indeed demonstrated topic independence and apparent robustness across different communication genres. We do note, however, that consistent with Pennebaker’s own findings, the first two factors are replicated more closely than the third and fourth factors (PENNEBAKER REF? - CORRESPONDENCE).

This is not to say that there are no differences between these genres, as we previously noted in our discussion of the results (Sections 5.3 and 6.3). Examination of directions of variable loadings upon the factors across the three studies shows some minor variation. From this, it is interesting to note that e-mail appears to occupy the middle ground between the blog and written genres. Referring to our three-factor solutions, there is no case where variable loadings are shared by blog and written language factors, but not by e-mail, yet there are cases where e-mail and one of the other genres together differ from the other genre. Here we note that e-mail and written language are more similar in their use of inclusive words (like ‘and’ and ‘with’), which load on the ‘Making Distinctions’ factor, but that e-mail and blog language oppose the written finding for ‘The Social Past’ factor because they have positive, rather than negative, loadings for positive emotion words.

There are also distinguishing features in the way variables load onto the factors. Characteristic features of weblog language are the positive loading of present tense words on the ‘Making Distinctions’ factor, and insight onto the ‘Immediacy’ factor; conversely, written language is characterised by the positive loadings of discrepancies on the ‘Immediacy’ factor and by present tense on ‘The Social Past’ factor. All three genres are apparently distinguished by inclusive words in relation to ‘The Social Past’, with blogs showing positive loading, e-mails showing negative loading, and written language an absence of loading for this factor.

DO WE NEED MORE DISCUSSION OF WHY THESE FEATURES HAPPEN - CHARACTERISTIC OF CMC - COMPARE WITH PREVIOUS LIT?


In summary, we therefore observe there are broad similarities between these genres with differences in factor structure which distinguish between all three, although in terms of factor structure, e-mail appears intermediate between the blog and written texts. Like Pennebaker & King, we selected the variables in the current study in order to ensure comparability across genres. However, we also note that in order to identify linguistic features which distinguish these different genres, future research may focus on LIWC variables which were excluded from the original study for their lack of consistency across genres. An alternative approach which would highlight characteristic features is the data-driven technique which has already successfully been applied to e-mail and blog data (???).

7.2 Personality and factors in CMC; and five versus three factors

The second aim of this study was to investigate the features of personality across the three genres, in particular in terms of CMC in comparison to non-CMC language. Primarily we focus on the traits of Extraversion and Neuroticism, since these are common to the three studies.

Taking the findings from e-mail and blogs together, we note that Neuroticism shows a positive relationship with ‘Making Distinctions’, and a negative relationship with ‘The Social Past’, whereas in the non-CMC written data, it correlates positively with the ‘Immediacy’ factor. For Extraversion, although the CMC data does not show a significant relationship to any of the factors, it does show a tendency in the same direction as the negative relationship with ‘Making Distinctions’ found in the non-CMC data.

Turning now to the other factors, we note that Openness reveals agreement between blog CMC and non-CMC data in terms of a positive relationship with ‘The Social Past’, with the most significant non-CMC (negative) relationship with ‘Immediacy’ also mirrored (non-significantly) in the blog CMC data. For Agreeableness, the negative relationship with ‘Making Distinctions’ in blog CMC data contrasts with the positive relationship with ‘Immediacy’ in non-CMC data; the negative relationship between Conscientiousness and ‘Making Distinctions’ in the non-CMC study is not found here.

Although we expected that additional stronger relationships would emerge with the inclusion of Psychoticism, this was not the case, with no factors reaching significance in correlation. Psychoticism was generally compatible (when inverted) with the other studies for ‘Making Distinctions’ and ‘Immediacy’, and like the other studies showed little relationship to ‘The Social Past’.

We therefore note that although there are general consistencies between CMC and non-CMC data when they are related to personality measures, conclusive comparison is difficult due to fewer of the relationships reaching significance in the CMC data, apparently as a result of having many fewer participants in these studies (105 and 71 for the e-mail and blog data, respectively, compared to 841 in Pennebaker & King’s written study). Of the significant relationships observed, we note the different behaviour in the CMC data leading to a positive relationship between Neuroticism and ‘Making Distinctions’ and a negative one between it and ‘The Social Past’, compared with non-CMC’s relationship with ‘Immediacy’. For Agreeableness, CMC relates negatively to ‘Making Distinctions’, rather than positively to ‘Immediacy’. Check pos/neg in e/b/w cases!

We can speculate as to the reasons for these differences. First, it is rather hard to see why ‘The Social Past’ should vary more in CMC, but perhaps the differences are associated with the differing role which positive emotions play in this factor in CMC, as opposed to non-CMC. Secondly, however, ‘Immediacy’ seems to vary less in CMC, and this may be attributable to an audience design effect, particularly with blogs. Because both e-mail and blogs are written for potentially remote audiences, all authors rely less on linguistic devices involving immediacy. With less use of them overall, there is less variation to be found amongst individual authors. Thirdly, it seems that ‘Making Distinctions’ varies more in CMC, and the reason may be related to reduced social presence. According to (?), bloggers often state that they write in order to ‘vent’; equally, e-mail is generally held to encourage the freer expression of negative opinions. If this is so, then CMC authors may be more likely to indulge in fault-finding writing. With more fault-finding going on, there will be more variation to be found among individual authors. It seems plausible that the more anxious and less agreeable authors will be those most likely to draw attention to faults and discrepancies.

8 Conclusion

In this study we have replicated a previously derived factor structure for language in written (non-CMC) environments in two types of CMC: e-mail and blogs. The factors are largely similar, this being a merit of the original analytical design. However, we note the value of particular features for distinguishing between CMC and non-CMC communication. Furthermore, at a more detailed level, we observe that the patterning of such variation across genres indicates that e-mail shares the most similarity with both blog and written language, and as such mediates in style between the two, with blog and non-CMC written language more readily distinguished. We note the value of selecting different variables, or data-driven analyses, to further highlight distinguishing characteristics of these genres.


When factors derived from CMC and non-CMC communication are correlated with personality measures, we find different patterns of language behaviour for Neuroticism and Agreeableness in CMC versus non-CMC environments. Replacing Openness, Agreeableness and Conscientiousness of the five-factor model with Psychoticism of the EPQ-R does not uncover significant new behaviour, or uncover stronger relationships, although findings are generally consistent with the five-factor approach. We do however note that the smaller number of participants in the CMC studies, compared with those in the previous non-CMC written study, results in fewer relationships reaching significance.

References

Balter, O. (1998). Electronic Mail in a Working Context . Ph.D. thesis, RoyalInstitute of Technology, Stockholm.

Baron, N. (1998). Letters by phone or speech by other means: the linguisticsof email. Language and Communication, 18, 133–170.

Baron, N. (2001). Commas and canaries: the role of punctuation in speechand writing. Language Sciences , 23, 15–67.

Bell, A. (1984). Language as audience design. Language in Society , 13, 145–204.

Biber, D. (1988). Variation across speech and writing . Cambridge UniversityPress, Cambridge.

Biber, D. (1995). Dimensions of Register Variation. Cambridge UniversityPress, Cambridge.

Bloch, J. (2002). Student/teacher interaction via email: the social context ofinternet discourse. Journal of Second Language Writing , 11, 117–134.

Block, J. (1995). A contrarian view of the five-factor approach to personalitydescription. Psychological Bulletin, 117, 187–215.

Brown, G. and Yule, G. (1983). Discourse analysis . Cambridge UniversityPress, Cambridge.

Buchanan, T. (2001). Online implementation of an IPIP five factor personalityinventory. Technical report, University of Westminster, http://www.wmin.ac.uk/~buchant/wwwffi/introduction.html.

Buss, A. and Finn, S. (1987). Classification of personality traits. Personalityand Social Psychology , 52, 432–444.

Carment, D. W., Miles, C. G., and Cervin, V. B. (1965). Persuasiveness andpersuasibility as related to Intelligence and Extraversion. British Journalof Social and Clinical Psychology , 4, 1–7.

Colley, A. and Todd, Z. (2002). Gender-linked differences in the style andcontent of e-mails to friends. Journal of Language and Social Psychology ,21, 380–392.

Collot, M. and Belmore, N. (1996). Electronic language: A new variety of En-

26

glish. In S. Herring, editor, Computer-mediated communication: Linguistic,social and cross-cultural perspectives , pages 13–28. Benjamins, Amsterdam.

Costa, P. and McCrae, R. R. (1992). NEO PI-R Professional Manual . Psy-chological Assessment Resources, Odessa, Florida.

Deary, I. and Matthews, G. (1993). Personality traits are alive and well. ThePsychologist , 6, 299–311.

Dewaele, J.-M. (2001). Interpreting the maxim of quantity: interindividualand situational variation in discourse styles of non-native speakers. InE. Nemeth, editor, Cognition in Language Use: Selected Papers from the7th International Pragmatics Conference, volume 1, pages 85–99. Interna-tional Pragmatics Association, Antwerp.

Dewaele, J.-M. and Furnham, A. (1999). Extraversion: The unloved variablein applied linguistic research. Language Learning , 49, 509–544.

Digman, J. (1990). Personality structure: Emergence of the five-factor model.Annual Review of Psychology , 41, 417–440.

Eysenck, H. (1970). The Biological Basis of Personality . Thomas, Springfield,IL.

Eysenck, H. (1993). From DNA to social behaviour: conditions for a paradigm of personality research. In J. Hettema and I. Deary, editors, Foundations of personality. Kluwer, Dordrecht.

Eysenck, H. and Eysenck, S. B. G. (1991). The Eysenck Personality Questionnaire-Revised. Hodder and Stoughton, Sevenoaks.

Eysenck, S., Eysenck, H., and Barrett, P. (1985). A revised version of the psychoticism scale. Personality and Individual Differences, 6, 21–29.

Funder, D. (2001). Personality. Annual Review of Psychology, 52, 197–221.

Furnham, A. (1990). Language and personality. In H. Giles and W. Robinson, editors, Handbook of Language and Social Psychology, pages 73–95. Wiley, Chichester.

Gifford, R. and Hine, D. W. (1994). The role of verbal behaviour in the encoding and decoding of interpersonal dispositions. Journal of Research in Personality, 28, 115–132.

Gill, A. and Oberlander, J. (2002). Taking care of the linguistic features of extraversion. In Proceedings of the 24th Annual Conference of the Cognitive Science Society, pages 363–368.

Goldberg, L. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34.

Graybeal, A., Sexton, J., and Pennebaker, J. (2002). The role of story-making in disclosure writing: The psychometrics of narrative. Psychology and Health, 17, 571–581.

Hancock, J. and Dunham, P. (2001). Impression formation in computer-mediated communication. Communication Research, 28, 325–347.

Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6, 231–245.

Kline, P. (1993). The Handbook of Psychological Testing. Routledge, London.

Lippa, R. and Dietz, K. (2000). The relations of gender, personality, and intelligence to judges’ accuracy in judging strangers’ personality from brief video segments. Journal of Nonverbal Behavior, 24, 25–43.

McCrae, R. and Costa, P. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52, 81–90.

McCrae, R. and Costa, P. (1997). Personality trait structure as a human universal. American Psychologist, 52, 509–516.

Mehl, M. and Pennebaker, J. (2003). The sounds of social life: A psychometric analysis of student’s daily social interactions. Journal of Personality and Social Psychology, 84, 857–870.

Newman, M., Pennebaker, J., Berry, D., and Richards, J. (2003). Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29, 665–675.

Oxman, T., Rosenberg, S., Schnurr, P., and Tucker, G. (1988). Diagnostic classification through content analysis of patients’ speech. American Journal of Psychiatry, 145, 464–468.

Panteli, N. (2002). Richness, power cues and email text. Information & Management, 40, 75–86.

Pennebaker, J., Mehl, M., and Niederhoffer, K. (2003). Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54, 547–577.

Pennebaker, J. W. (1997). Writing about emotional experiences as a therapeutic process. Psychological Science, 8, 162–166.

Pennebaker, J. W. and Francis, M. (1999). Linguistic Inquiry and Word Count (LIWC). Lawrence Erlbaum Associates, Mahwah, NJ.

Pennebaker, J. W. and King, L. (1999). Linguistic styles: Language use as an individual difference. Journal of Personality and Social Psychology, 77, 1296–1312.

Pennebaker, J. W., Mayne, T., and Francis, M. (1997). Linguistic predictors of adaptive bereavement. Journal of Personality and Social Psychology, 72, 863–871.

Pennebaker, J. W., Francis, M. E., and Booth, R. J. (2001). Linguistic Inquiry and Word Count (LIWC2001). Lawrence Erlbaum Associates, Mahwah, NJ.

Pytlik Zillig, L., Hemmenover, S., and Dienstbier, R. (2002). What do we assess when we assess a Big 5 trait? A content analysis of the affective, behavioural, and cognitive processes represented in Big 5 personality inventories. Personality and Social Psychology Bulletin, 28, 847–858.

Rayson, P. (2003). Matrix: A statistical method and software tool for linguistic analysis through corpus comparison. Ph.D. thesis, Lancaster University.

Scherer, K. (1979). Personality markers in speech. In K. R. Scherer and H. Giles, editors, Social Markers in Speech, pages 147–209. Cambridge University Press, Cambridge.

Smith, C. (1992). Introduction: inferences from verbal material. In C. Smith, editor, Motivation and personality: Handbook of thematic content analysis, pages 1–17. Cambridge University Press, Cambridge.

Thorne, A. (1987). The press of personality: A study of conversations between introverts and extraverts. Journal of Personality and Social Psychology, 53, 718–726.

Werry, C. (1996). Linguistic and interactional features of internet relay chat. In S. Herring, editor, Computer-mediated communication: Linguistic, social and cross-cultural perspectives, pages 47–63. Benjamins, Amsterdam.

Wiggins, J. and Pincus, A. (1992). Personality: Structure and assessment. Annual Review of Psychology, 43, 473–504.

Yellen, R., Winniford, M., and Sanford, C. (1995). Extraversion and introversion in electronically-supported meetings. Information & Management, 28, 63–74.
