Gender in Shakespeare using feature selection via Text Mining

Gender in Shakespeare from Lexical, Syntactic and Lemma Features, DHCS 2006

Sobhan Hota, Shlomo Argamon, Rebecca Chung

Linguistic Cognition Laboratory, Department of Computer Science, Illinois Institute of Technology

Uploaded by sobhan-hota on 08-Jul-2015









Do playwrights follow a certain style in presenting their characters' gender in their speeches?

Research Questions:

Can the gender of Shakespeare's characters be determined from their word use?

Can we glean any features that discriminate character gender?

Are these features similar to or different from those obtained in previous research on gender classification?

Will this research change the understanding of Shakespeare's literature or of research on plays?

Outline


Recent works on Gender

Corpus Selection and Meta Data


Feature Sets, Vector Calculation, Classifier

Results, Feature Analysis

Conclusions, Future Works


Recent works on Gender

Research in Gender Classification based on text

Performing Gender: Automatic Stylistic Analysis of Shakespeare's Characters – 74% (Hota et al.: DH 2006)

Categorizing Written Text by Author Gender – 80% (Shlomo Argamon et al.: LLC 2004)

Gender Preferential Text Mining of E-mail Discourse – 70% (Olivier de Vel et al.: ACSAC 2002)

A Quantitative Analysis of Lexical Differences Between Genders in Telephone Conversations – 92% (Constantinos Boulis, Mari Ostendorf: ACL 2005)

Effects of Age and Gender on Blogging – 80% (Shlomo Argamon et al.: AAAI 2006)


Which side is Female?

Left:

I know thou canst; and therefore see thou do it. I am possess'd with an adulterate blot; My blood is mingled with the crime of lust: For if we two be one and thou play false, I do digest the poison of thy flesh, That never object pleasing in thine eye, That never touch well welcome to thy hand, That never meat sweet-savor'd in thy taste,

Right:

When, being not at your lodging to be found, The senate hath sent about three several quests to search you out. The riches of the ship is come on shore! Ye men of Cyprus, let her have your knees. I have promised to study three years with the duke. Impossible. I am ill at reckoning; it fitteth the spirit of a tapster. I confess both: they are both the varnish of a complete man.


Corpus Selection

35 plays from Nameless Shakespeare

The corpus is tagged with the gender of each literary character

Each word in the corpus is tagged with its part of speech (PoS) and lemma

All speeches of a character in a play were concatenated into a single document

Female characters with 200 or more words in each play were selected

From each play we then chose the same number of male characters as female characters, restricted to males whose speech is no longer than that of the longest selected female character in that play

In total, 101 male and 101 female characters were collected, with equal numbers of each gender from every play
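The selection rule above can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' code: the `Character` record and `select_balanced` function are illustrative names, speech length is assumed to be a word count, and the slides do not say which eligible males were chosen, so the sketch assumes the longest ones under the cap. It also assumes that if a play had fewer eligible males than females, only as many of each would be kept so the counts stay equal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Character:
    play: str
    name: str
    gender: str  # "M" or "F"
    words: list  # all speeches of this character in the play, concatenated


MIN_FEMALE_WORDS = 200


def select_balanced(characters):
    """Return (females, males) with equal numbers drawn from each play (hypothetical helper)."""
    by_play = defaultdict(list)
    for c in characters:
        by_play[c.play].append(c)

    females_out, males_out = [], []
    for chars in by_play.values():
        # Female characters with 200 or more words.
        females = [c for c in chars
                   if c.gender == "F" and len(c.words) >= MIN_FEMALE_WORDS]
        if not females:
            continue
        # Males may be no longer than the longest selected female in this play.
        cap = max(len(c.words) for c in females)
        males = [c for c in chars if c.gender == "M" and len(c.words) <= cap]
        # Assumption: prefer the longest eligible characters of each gender.
        females.sort(key=lambda c: len(c.words), reverse=True)
        males.sort(key=lambda c: len(c.words), reverse=True)
        # Assumption: keep counts equal even if eligible males run short.
        k = min(len(females), len(males))
        females_out.extend(females[:k])
        males_out.extend(males[:k])
    return females_out, males_out
```

Grouping by play before selecting is what guarantees the per-play balance the slide describes, rather than only a global 101/101 balance.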


Meta Data

Data of a Character

Name of the Character: HENRYV

Speech Length: 1024

Gender: Male

[Diagram: only the labels "Corpus of Texts" and "Processing and" survive extraction]

Feature Sets

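Number of features in each feature set ("-" means not applicable):

Feature Set                    PoS    Lexical   Lemma
Function Words                 -      645       -
Bag of Words                   -      2426      -
500 Most Frequent              -      500       500
Uni Grams                      31     2426      2001
Bi Grams                       2259   2620      2860
Tri Grams                      1634   356       571
Uni plus Bi Grams              2290   5046      4861
Bi plus Tri Grams              3893   2976      3431
Uni plus Tri Grams             1665   2782      2572
Uni plus Bi plus Tri Grams     3924   5402      5432

Each combined row is the union of its component n-gram sets, so its size is the sum of theirs (e.g., for PoS, 31 + 2259 = 2290 Uni plus Bi Grams)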
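Each n-gram feature set in the table can be built from one view of the tagged text: the word form (Lexical), the PoS tag (PoS), or the lemma (Lemma). The sketch below is a hypothetical illustration, not the authors' code. It assumes each token arrives as a (word, pos, lemma) triple, and it shows why the combined rows equal the sums of their parts: n-grams of different orders are tuples of different lengths, so their vocabularies never overlap. The `most_frequent` helper illustrates one plausible reading of the "500 Most Frequent" row (top features by corpus count); the slides do not say how ties were handled.

```python
from collections import Counter

# Assumption: tokens are (word, pos, lemma) triples; this maps a view name to its index.
VIEWS = {"Lexical": 0, "PoS": 1, "Lemma": 2}


def ngrams(tokens, view, n):
    """Count n-grams (as tuples) over one view of a tagged token sequence."""
    seq = [tok[VIEWS[view]] for tok in tokens]
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def feature_counts(tokens, view, orders):
    """Combine several n-gram orders, e.g. orders=(1, 2) for 'Uni plus Bi Grams'."""
    counts = Counter()
    for n in orders:
        counts.update(ngrams(tokens, view, n))
    return counts


def most_frequent(corpus_counts, k=500):
    """Keep the k features with the highest total count across the corpus."""
    return [feature for feature, _ in corpus_counts.most_common(k)]
```

Because unigrams are 1-tuples and bigrams are 2-tuples, `len(feature_counts(tokens, view, (1, 2)))` is always the number of distinct unigrams plus the number of distinct bigrams.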


Vector Calculation

Each feature value is the ratio of that feature's count to the total count of all features in the feature set

Computing this ratio for every feature in a feature set yields a vector that represents one character's document

The collection of vectors is given as input to a machine learning algorithm for classification
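In symbols, for a character document d and feature set F, the value for feature f is x_f(d) = c_f(d) / (sum over g in F of c_g(d)), where c_f(d) is the count of f in d. A minimal Python sketch, assuming the counts arrive as a mapping from feature to count (a `collections.Counter` works); the function name and the zero-total guard are our own additions, not from the slides.

```python
def relative_frequency_vector(counts, features):
    """Map one character's feature counts onto a fixed feature list as ratios.

    Each value is count(feature) / total count of all features in the set, so
    characters with very different speech lengths become comparable.
    """
    total = sum(counts.get(f, 0) for f in features)
    if total == 0:
        # Assumption: a document with no features of this set maps to all zeros.
        return [0.0] * len(features)
    return [counts.get(f, 0) / total for f in features]
```

Note that the denominator sums only over features in the chosen set, so counts of words outside the set (for example, words outside the 500 most frequent) do not dilute the values.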


Classifier - SMO

Classifier: SMO – Sequential Minimal Optimization

Learns a linear decision rule: a hyperplane that separates male from female characters

Testing option: 10-fold cross-validation (CV)

CV is used to estimate the true error
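To make the training and testing loop concrete, here is a compact, self-contained Python sketch. It is an illustration under stated assumptions, not the classifier configuration used in the study: it implements the simplified SMO variant often used for teaching (the second Lagrange multiplier is picked at random instead of by Platt's heuristics) with a linear kernel, then estimates accuracy with plain 10-fold cross-validation. The slides do not give SMO's settings, so `C`, `tol`, the seed, and all function names are hypothetical. Labels are +1/-1 (for example female = +1, male = -1, an arbitrary encoding).

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def smo_train(X, y, C=1.0, tol=1e-3, max_passes=10, max_sweeps=1000, seed=0):
    """Simplified SMO for a linear SVM; labels in y are +1 or -1. Returns (w, b)."""
    rng = random.Random(seed)
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]  # linear kernel
    alpha = [0.0] * n
    b = 0.0

    def error(i):
        # Decision value f(x_i) minus the true label.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(n):
            Ei = error(i)
            # Only optimize multipliers that violate the KKT conditions.
            if not ((y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0)):
                continue
            j = rng.randrange(n - 1)  # random partner j != i
            if j >= i:
                j += 1
            Ej = error(j)
            ai, aj = alpha[i], alpha[j]
            if y[i] != y[j]:
                L, H = max(0.0, aj - ai), min(C, C + aj - ai)
            else:
                L, H = max(0.0, ai + aj - C), min(C, ai + aj)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if L == H or eta >= 0:
                continue
            new_aj = min(H, max(L, aj - y[j] * (Ei - Ej) / eta))
            if abs(new_aj - aj) < 1e-5:
                continue
            new_ai = ai + y[i] * y[j] * (aj - new_aj)
            b1 = b - Ei - y[i] * (new_ai - ai) * K[i][i] - y[j] * (new_aj - aj) * K[i][j]
            b2 = b - Ej - y[i] * (new_ai - ai) * K[i][j] - y[j] * (new_aj - aj) * K[j][j]
            alpha[i], alpha[j] = new_ai, new_aj
            if 0 < new_ai < C:
                b = b1
            elif 0 < new_aj < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0

    # With a linear kernel the separating hyperplane is w . x + b = 0.
    w = [sum(alpha[i] * y[i] * X[i][d] for i in range(n)) for d in range(len(X[0]))]
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


def cross_val_accuracy(X, y, folds=10, seed=0):
    """Plain (non-stratified) k-fold CV: every item is tested exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w, b = smo_train([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, b, X[i]) == y[i] for i in test)
    return correct / len(X)
```

Because each character is scored only by a model that never saw it during training, the cross-validated accuracy is an estimate of the true error rather than of how well the hyperplane fits the training set.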

Gender Classification - PoS

[Chart: gender classification results for PoS Features across feature sets; surviving axis labels: Uni, Bi, Tri]
Gender Classification - Lexical

[Chart: gender classification results for Lexical Features across feature sets; surviving axis labels: BoWs, Bi]
Gender Classification - Lemma

[Chart: gender classification results for Lemma Features across feature sets; surviving axis labels: Uni, Bi, Tri]

PoS Features

All
Male: at, prp, fo, cjs, np, pnx, pnq, vm, pcl, ge, dt, chr, n, ajp, pnr, v
Female: itj, neg, aj, pnp, cjq, av, vp, cjc, dtq, avq, nu, it, pni, la, fr

Early
Male: at, fo, vm, prp, n, cjs, np, pnr, pnq, nu, la, chr, v, ajp, it, pcl, pnp, dt
Female: aj, av, neg, itj, dtq, cjq, vp, pni, fr, avq, cjc, pnx

Late
Male: pnx, at, prp, dt, ge, pcl, pnq, av, chr, ajp, cjs, dtq, np, n
Female: itj, pnp, neg, pni, cjq, fr, avq, it, cjc, la, vm, nu, aj, vp, v, pnr
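The slides do not state how the features in these lists were ranked. One common approach with a linear classifier such as SMO, shown here only as a hypothetical sketch, is to sort features by their learned weight: with female encoded as +1 and male as -1 (an assumed convention), the most positive weights point toward female and the most negative toward male. The function name and the sign convention are our assumptions.

```python
def top_features(weights, names, k=10):
    """Return the k strongest features per class from a linear model's weights.

    Assumption: label +1 = Female and -1 = Male, so positive weights favor Female.
    Features with weight exactly 0 are assigned to neither class.
    """
    ranked = sorted(zip(weights, names))  # most negative weight first
    male = [name for w, name in ranked[:k] if w < 0]
    female = [name for w, name in reversed(ranked[-k:]) if w > 0]
    return male, female
```

Under this scheme, running the ranking separately on models trained on all, early, and late plays would give three pairs of lists with the same All/Early/Late layout as the tables.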


Lexical - FWs Features

All
Male: being, well, already, thank, allow, many, doing, whom, there, three, the
Female: never, such, little, yet, only, wish, hence, take, comes, you

Early
Male: three, certain, ta’en, here, whom, whence, first, able, what’s, among
Female: never, changes, self, take, help, yet, ask, almost, merely, hers

Late
Male: the, already, of, immediate, doing, we, toward, some, well, very
Female: such, among, gone, am, selves, you, woo’t, here’s, come, might


Lexical - BoWs Features

All
Male: whom, three, her, fellow, degree, lying, knit, beat, avoid, to
Female: alas, o, gone, grieve, husband, heart, he, prithee, knife, sick

Early
Male: cat, her, three, whence, whom, knit, wherein, among, marry, wrought
Female: heart, husband, alas, knife, lack, never, grieve, quality, o, glory

Late
Male: the, of, loss, description, lying, virtue, them, whipped, pen, begin
Female: gone, alas, bestow, pray, little, am, such, prithee, o, kiss


Lemma – Uni gram Features

All
Male: begin, three, alight, solemn, to, noble, who, she, beat, savage
Female: alas, o, husband, prithee, mother, court, he, you, sick, merry

Early
Male: three, whence, alight, savage, cat, wherein, ship, who, knit, stay
Female: husband, heart, alas, never, merry, catch, compare, full, wicked, rain

Late
Male: the, of, begin, loss, beat, embrace, motion, fresh, to, description
Female: sharp, alas, such, hie, messenger, prithee, I, o, false, dear


Answer to "Which side is Female?": Left (Female), Right (Male)



Lexical - Bi gram Features

All
Male: to the, there be, i say, the great, for this, a most, on the, go you, if there, love her
Female: me how, is such, i hate, know i, to bring, know how, heart as, thing i, your heart, now i

Early
Male: to the, for this, on the, how much, to find, i say, my soul, beseech you, go you, for the
Female: fare you, my husband, more in, you make, to feed, love you, at his, was born, me how, is such

Late
Male: art a, prove a, at this, a most, to the, whom I, be no, but a, i came, in it
Female: his bed, i prithee, that i, me how, art not, use of, thou wast, i hate, hie thee, be gone


Lexical - Tri gram Features

All
Male: as much as, is to be, the name of, is a good, away with him, do you know, not in the, you are my, is but a, it was not
Female: i warrant you, what is it, he is not, i should be, i am a, you for your, is it not, by your leave, for me to, i am your

Early
Male: i have seen, you are my, i beseech you, my lord of, do beseech you, three thousand ducats, i thank you, is to be, with me to, i think i
Female: when they are, i am so, and yet i, i should be, in such a, i am glad, fare you well, thou shalt be, i warrant you, for such a

Late
Male: the name of, of all the, this is a, thou art a, to the king, there is no, with all my, is to be, but i am, but it is
Female: i thank you, i am your, do beseech you, i do beseech, i care not, it is no, of the house, if i do, it be so, how say you


Lemma - Tri gram Features

All
Male: but i be, be a very, be a ass, be to be, have no more, i be he, i say to, i have lose, the manner of, i go to
Female: if he have, thank you for, i see you, for i to, one of you, i know i, who be that, be he not, say i be, i be you

Early
Male: i go to, but i be, i be in, i beseech you, be a ass, i lord of, i have see, when i have, when i be, i to the
Female: when they be, be not to, i love you, be not yet, do not know, who be that, fare you well, i can tell, and yet i, i can speak

Late
Male: there be no, but it be, this be a, the name of, be not yet, of all the, thou be a, with all i, but i be
Female: do beseech you, i do beseech, you tell i, it be so, for i to, get thou go, one of you, will you be, i care not, there be a



The Shakespeare results are similar to those obtained for author gender discrimination in fiction and nonfiction (Argamon et al. 2004)

Male authors: articles and determiners (e.g., a, the, that); numbers (e.g., one); prepositions

Female authors: negation (e.g., not); pronouns and conjunctions (e.g., she, and); certain prepositions (e.g., for, with)


Literary Interpretation

Could the choice between blank verse and prose lead to gender discrimination?

Reading literary scholars' minds through elaborate methods of semantic analysis (New Criticism, Structuralism, Post-structuralism)

Conclusions


Style plays a role in discriminating the gender of literary characters

Trigram features are computationally effective and informative

Differences exist between early and late Shakespeare in classifying the gender of literary characters

This work extends previous research on classifying author gender in modern texts from the BNC (British National Corpus) (Argamon et al. 2004)


Future Work

A clearer methodology that gives meaningful results in differentiating character gender

Understanding how other playwrights present the gender of their literary characters

References


• Sobhan Raj Hota, Shlomo Argamon, Moshe Koppel, Iris Zigdon. Performing Gender: Automatic Stylistic Analysis of Shakespeare's Characters. ACH 2006.

• Sobhan Raj Hota, Shlomo Argamon. Gender in Shakespeare: Automatic Stylistic Analysis of Shakespeare's Characters. MCLC 2006.

• Shlomo Argamon, Casey Whitelaw, Paul Chase, Sobhan Raj Hota, Sushant Dhawle, Navendu Garg, Shlomo Levitan. Stylistic Text Classification using Functional Lexical Features. JASIST 2005.