

UNIVERSITY OF CINCINNATI

Date: ___________________

I, _________________________________________________________, hereby submit this work as part of the requirements for the degree of:

in:

It is entitled:

This work and its defense approved by:

Chair: _______________________________
_______________________________
_______________________________
_______________________________
_______________________________


Computational Recognition Of Humor In A Focused Domain

A thesis submitted to the

Division of Research and Advanced Studies of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

MASTER OF SCIENCE

in the Department of Electrical and Computer Engineering and Computer Science

of the College of Engineering

2004

by

Julia Taylor

B.S., University of Cincinnati, 1999 B.A., University of Cincinnati, 1999

Committee Chair: Dr. Lawrence Mazlack


Abstract. With advancing developments of artificial intelligence, humor researchers

have begun to look at approaches for computational humor. Although there appears to be

no complete computational model for recognizing verbally expressed humor, it may be

possible to recognize jokes based on statistical language recognition techniques. This is

an investigation into computational humor recognition. It considers a restricted set of all

possible jokes that have wordplay as a component and examines the limited domain of

“Knock Knock” jokes. The method uses Raskin's Theory of Humor for its theoretical

foundation. The original phrase and the complementary wordplay have two different

scripts that overlap in the setup of the joke. The algorithm deployed learns statistical

patterns of text in N-grams and provides a heuristic focus for locations where

wordplay may or may not occur. It uses a wordplay generator to produce an utterance that

is similar in pronunciation to a given word, and the wordplay recognizer determines if the

utterance is valid by using N-grams. Once a possible wordplay is discovered, a joke

recognizer determines if a found wordplay transforms the text into a joke.


Acknowledgments

I would like to express my sincere gratitude to Dr. Lawrence Mazlack, who not only

made this project possible, but also very enjoyable. His advice, patience, ideas, and

many late evenings of arguments and inventions are only a few reasons in a very long

list. Thank you!

I would like to thank the Thesis committee, Dr. John Schlipf, Dr. Michele Vialet and Dr.

Carla Purdy. This work has greatly benefited from your suggestions.

Thanks are due to the Electronic Text Center at the University of Virginia Library for the

permission to use their texts in the experiments. To Dr. Graeme Ritchie, thank you for

your comments in the initial stage of the project and for making your research available. I

would also like to thank Adam Hoffman for allowing the flexibility in time that made it

possible to complete this thesis. The list would not be complete without G.I. Putiy, who

has been an inspiration for many years.

I would like to thank my parents, Michael and Tatyana Slobodnik, and my brother Simon

for their love, encouragement, and support in too many ways to describe.

Last but not least, a sincere thank you to my husband, Matthew Taylor, without whose

love, help, understanding and support I would be completely lost.


Table of Contents

List of Tables
1 Introduction
2 Background
2.1 Theories of Humor
2.1.1 Incongruity-Resolution Theory
2.1.2 Script-based Semantic Theory of Humor
2.1.3 General Theory of Verbal Humor
2.1.4 Veatch’s Theory of Humor
2.2 Wordplay Jokes
2.3 Structure of Jokes
2.3.1 Structural Ambiguity in Jokes
2.3.1.1 Plural and Non-Count Nouns as Ambiguity Enablers
2.3.1.2 Conjunctions as Ambiguity Enablers
2.3.1.3 Construction “A Little” as Ambiguity Enabler
2.3.1.4 Can, Could, Will, Should as Ambiguity Enablers
2.3.2 The Structure of Punchline
2.4 Computational Humor
2.4.1 LIBJOG
2.4.2 JAPE
2.4.3 Elmo
2.4.4 WISCRAIC
2.4.5 Ynperfect Pun Selector
2.4.6 HAHAcronym
2.4.7 MSG
2.4.8 Tom Swifties
2.4.9 Jester
2.4.10 Applications in Japanese
3 Statistical Measures in Language Processing
3.1 N-grams
3.2 Distant N-grams
4 Possible Methods for Joke Recognition
4.1 Simple Statistical Method
4.2 Punchline Detector
4.3 Restricted Context
5 Experimental Design
6 Generation of Wordplay Sequences
7 Wordplay Recognition
8 Joke Recognition
8.1 Wordplay in the Beginning of a Punchline
8.2 Wordplay at the End of a Punchline
8.3 Wordplay in the Middle of a Punchline
9 Training Text
9.1 First Approach
9.2 Second Approach
9.3 Third Approach
9.4 Fourth Approach
9.5 Fifth Approach
10 Experimentation and Analysis
10.1 Training Set
10.2 Alternative Training Set Data Test
10.3 General Joke Testing
10.3.1 Jokes in the Test Set with Wordplay in the Beginning of a Punchline
10.3.2 Jokes in the Test Set with Wordplay in the Middle of a Punchline
10.4 Testing Non-Jokes
11 Summary
12 Possible Extensions
13 Conclusion
Bibliography
Appendix A: Training Texts
Appendix B: Jokes Used in the Training Set
Appendix C: Jokes Used in the Test Set
Appendix D: KK Recognizer Algorithm Description
Appendix E: A Table of “Similarity of English Consonant Pairs Using the Natural Classes Model,” Developed by Stefan Frisch
Appendix F: Cost Table Developed by Christian Hempelmann


List of Tables

Table1: The three-level scale
Table2: Subset of entries of the Similarity Table, showing similarity of sounds in words between different letters
Table3: Examples of strings received after replacing one letter from the word “water” and their similarity value to “water”
Table4: Training jokes results
Table5: Unrecognized jokes in the training set
Table6: Results of the joke test set
Table7: Non-joke results


1 Introduction

Thinkers from the ancient time of Aristotle and Plato to the present day have strived to

discover and define the origins of humor. Most commonly, early definitions of humor

relied on laughter: what makes people laugh is humorous. Recent works on humor

separate laughter and make it its own distinct category of response. Today there are

almost as many definitions of humor as theories of humor since, in many cases, definitions

are derived from theories [Latta, 1999]. Still, “we are unsure of complete dimensions of

the concept” [Keith-Spiegel, 1972]. Some researchers say not only that there is no

definition that covers all aspects of humor, but also that humor is impossible to define

[Attardo, 1994].

Humor is an interesting subject to study not only because it is difficult to define, but also

because sense of humor varies from person to person. Not only does it vary from person

to person, but the same person may find something funny one day and not the next,

depending on what mood this person is in, or what has happened to him or her recently.

These factors, among many others, make humor recognition challenging.

Although most people are unaware of the complex steps involved in humor recognition, a

computational humor recognizer has to consider all these steps in order to approach the

same ability as a human being.


A common form of humor is verbal, or verbally expressed, humor. Verbally expressed

humor can be defined as humor “conveyed in language, as opposed to physical or visual

humor, but not necessarily playing on the form of the language” [Ritchie, 2000]. Verbally

expressed humor is easier to analyze computationally, as it involves reading and

understanding texts. While understanding the meaning of a text may be difficult for a

computer, reading it is not an issue.

One of the subclasses of verbally expressed humor is the joke. Hetzron [1991] defines a

joke as “a short humorous piece of literature in which the funniness culminates in the

final sentence.” Most researchers agree that jokes can be broken into two parts, a setup

and a punchline. The setup is the first part of the joke, usually consisting of most of the

text, which establishes certain expectations. The punchline is a much shorter portion of

the joke, and it causes some form of conflict. It can force another interpretation on the

text, violate an expectation, or both [Ritchie, 1998]. As most jokes are relatively short,

and, therefore, do not carry a lot of information, it should be possible to recognize them

computationally.

Computational recognition of jokes seems to be possible, but it is not easy. An

“intelligent” joke recognizer requires world knowledge to “understand” most jokes.

Computational work in natural language has a long history. Areas of interest have

included: translation, understanding, database queries, summarization, indexing, and


retrieval. There has been very limited success in achieving true computational

understanding.

A focused area within natural language understanding is verbally expressed humor. As

Ritchie [1998] states, “It will probably be some time before we develop a sufficient

understanding of humour, and of human behaviour, to permit even limited form of jokes

to lubricate the human-computer interface. The goal of creating a robot that is

sufficiently ‘human’ to use humour in a way that makes sense or appears amusing … is a

long term one.”

2 Background

Joke examples in all sections are taken from the papers discussed, unless specifically

noted. The examples are unmodified from their appearance in the papers.

2.1 Theories of Humor

Just as there are many definitions of humor, there are many theories that cover the

different aspects of humor. The three major classes of these theories are: incongruity-

based, disparagement-based (sometimes called the superiority theory), and release-based.


Incongruity-based theories suggest that humor arises from something that violates an

expectation. Many supporters of incongruity in humor have emphasized the importance

of surprise in a joke [Raskin, 1985].

Disparagement, or Superiority, theories are based on the observation that people laugh at

other people’s infirmities, especially if they are enemies [Suls, 1976]. This class of

theories of humor goes back to Plato (Philebus) and Aristotle (Poetics), who maintained

that people laugh at the misfortunes of others for joy that they do not share them [Raskin,

1985] [Attardo, 1994].

Release/relief theories explain the link between humor and laughter. The principle for

release-based theory is that laughter “provides relief for mental, nervous and psychic

energy, and this ensures homeostasis after a struggle, tension, and strain” [Raskin, 1985].

“The most influential proponent of a release theory is certainly Freud” [Attardo, 1994,

p.50].

2.1.1 Incongruity-Resolution Theory

There are many incongruity theories. All theories state in one way or another that humor

consists of the juxtaposition of the incongruous. There is a debate whether incongruity

alone is sufficient for laughter. Some researchers argue that incongruity is the first step of

a multi-stage process, and that a retrieval of information resulting in satisfactory

resolution of incongruity is a necessary step for a humorous response [Suls, 1976]


[Ritchie, 1999]. This theory is called Incongruity-Resolution (IR) theory, since it requires

an extra step, the resolution of the incongruity: to “get the joke” one must make an indirect

connection between the incongruity and the resolution.

There are different ways to create and resolve incongruity. Ritchie [1999] addresses two

different models for incongruity resolution: the two-stage model of Suls [1972] and the

surprise disambiguation (SD) model.

The surprise disambiguation model [Ritchie, 1999] states that the setup part of a joke has

two interpretations, one of which is obvious, the other more vague. Once the punchline is

reached, the audience becomes aware of the second, hidden meaning of the setup. The

meaning of the punchline conflicts with the first obvious interpretation but is compatible

with the second meaning, so the audience is forced into adopting the second meaning.

Joke1: “Postmaster: Here’s your five-cent stamp

Shopper (with arms full of bundles): Do I have to stick it on myself?

Postmaster: Nope. On the envelope.”

The first more obvious meaning of the setup of Joke1 is that the shopper needs help

putting the stamp on the envelope. When the punchline is read, the second, hidden

meaning is evoked: putting a stamp on the shopper, as opposed to putting it on the

envelope.


Ritchie argues that to process a joke similar in format to Joke1, the processor must be able

to analyze the setup. It should then predict the meaning of a likely continuation of the

text. It must also detect the punchline, no matter how long it is. After detecting the

punchline, it must process it and find the hidden meaning of the setup. Based on the

predicted meaning, the hidden meaning, and the punchline, the processor determines

whether the text is humorous.

The SD model makes use of ambiguity. Suls’ [1972] two-stage model does not demand

any ambiguity to be present in the setup. Instead, “the punch line creates incongruity, and

then a cognitive rule must be found which enables the content of the punch line to follow

naturally from the information established in the setup” [Ritchie, 1999]. The following

algorithm is used to process a joke using the two-stage model [Ritchie, 1999]:

• “As a text is read, make predictions

• While no conflict with prediction, keep going

• If input conflicts with prediction:

o If not ending – PUZZLEMENT

o If is ending, try to resolve:

No rules found – PUZZLEMENT

Cognitive rules found –HUMOR”
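
Read as pseudocode, the quoted algorithm translates almost directly into a small control loop. The following Python sketch is only an illustration of that loop; predict, conflicts, and find_cognitive_rule are hypothetical stand-ins for the machinery (prediction, conflict detection, cognitive rules) that the model leaves unspecified.

    def two_stage_response(segments, predict, conflicts, find_cognitive_rule):
        # Classify a text per the two-stage model: read segment by segment
        # and react only when the input conflicts with the prediction.
        # The three callables are hypothetical stand-ins.
        expectation = None
        for i, segment in enumerate(segments):
            is_ending = (i == len(segments) - 1)
            if expectation is not None and conflicts(segment, expectation):
                if not is_ending:
                    return "PUZZLEMENT"   # conflict before the ending
                if find_cognitive_rule(segment, expectation) is None:
                    return "PUZZLEMENT"   # no rule resolves the punchline
                return "HUMOR"            # a cognitive rule resolves it
            expectation = predict(segment, expectation)
        return "NO CONFLICT"              # the text never conflicted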

Since no ambiguity has to exist in the setup, the two-stage model can be used to analyze

Joke2. The SD model cannot be used here, due to the absence of ambiguity in the joke.


Joke2: “Fat Ethel sat down at the lunch counter and ordered a whole fruit

cake. ‘Shall I cut it into four or eight pieces?’ asked the waitress.

‘Four’, said Ethel, ‘I’m on a diet.’”

Even though it seems like the absence of the necessary ambiguity is a plus, the two-stage

model has some problems. It is unclear how much of a conflict a text should contain to

make it count as a punchline. There is no clear definition of a “cognitive rule.” Finally,

what type of cognitive rule governs the creation of a humorous resolution rather than just

a misunderstanding? This question, slightly modified, can be directed to most IR theories:

how can one tell if a stated ambiguity created a joke or just a misunderstanding [Ritchie,

1999]?

Some researchers [Rothbart, 1976] suggest that incongruity and resolution should have a

further classification system. It is suggested that there should be at least two categories of

incongruity, possible and impossible incongruity; and, two categories of resolution,

complete and incomplete resolution. In Rothbart [1976] the terms are defined as follows:

• “Impossible Incongruity: elements that are unexpected and also impossible

given one’s current knowledge of the world.

• Possible Incongruity: elements that are unexpected or improbable but

possible


• Complete Resolution: the initial incongruity follows completely from

resolution information

• Incomplete Resolution: the initial incongruity follows from resolution

information in some way, but is not made completely meaningful because

the situation remains impossible.”

To illustrate this idea, consider Joke3:

Joke3: “‘Why did the cookie cry?’

‘Because its mother had been a wafer so long’”

The authors provide this explanation:

“There are two elements of incongruity, the fact that cookies don’t cry and

the initial incongruity or surprisingness of the answer to the riddle. The

answer contains its own resolution – the phonological ambiguity of ‘a

wafer’ (i.e. away for), but also adds the additional incongruity of a cookie

having a mother.”

2.1.2 Script-based Semantic Theory of Humor

One of the leading theories of verbal humor is Victor Raskin’s Script-based Semantic

Theory of Humor (SSTH) [Raskin, 1985]. It is designed to be neutral with respect to


three groups of humor theories. SSTH is easily compatible with most theories from the

three groups.

SSTH is a script-based theory. A script “is an enriched, structured chunk of semantic

information, associated with word meaning and evoked by specific words” [Raskin,

1985, p.99]. Attardo [1994] cites Raskin when saying that the main hypothesis of the

theory is:

“A text can be characterized as a single-joke-carrying text if both of the

[following] conditions are satisfied.

(i) The text is compatible, fully or in part, with two different

scripts.

(ii) The two scripts with which the text is compatible are

opposite.

… The two scripts with which some text is compatible are said to overlap

fully or in part on this text. The set of two conditions [above] is proposed

as necessary and sufficient conditions for a text to be funny.”

From the hypothesis above, it is clear that verbal humor is based on ambiguity that is

deliberately created. However, ambiguity itself is not enough: the scripts must not only

be opposed, the opposition must also arise unexpectedly. Some examples of different oppositions are:

good : bad, real : unreal, money : no money, life : death, etc. [Raskin, 1985].


To show how SSTH works, Raskin [1985] analyzes Joke4.

Joke4: “‘Is the doctor at home?’ the patient asked in his bronchial whisper.

‘No,’ the doctor’s young and pretty wife whispered in reply.

‘Come right in.’”

The first step in analyzing the joke is listing the scripts evoked by the text of the joke

clause by clause [Attardo, 1994] [Raskin, 1985]:

Word:             Scripts:

(i)   IS = BE v.     1. EQUAL OR BELONG TO A SET  2. EXIST  3. SPATIAL  4. MUST
(ii)  THE det        1. DEFINITE  2. UNIQUE  3. GENERIC
(iii) DOCTOR n       1. ACADEMIC  2. MEDICAL  3. MATERIAL  4. MECHANICAL  5. INSECT
(iv)  AT prep        1. SPATIAL  2. TARGET  3. OCCUPATION  4. STATE  5. CAUSE
(v)   HOME n         1. RESIDENCE  2. SOCIAL  3. HABITAT  4. ORIGIN  5. DISABLED  6. OBJECTIVE

The second step is the launch of combinatorial rules [Raskin, 1985] [Attardo, 1994]. It

means that each clause is looked at for scripts that are evoked by two or more words. In

this example, the script SPATIAL is evoked by the words IS and AT. SPATIAL is the only script

that is evoked more than once by the five words above. After combinatorial rules are

defined, the inferences they trigger are looked at. Since AT is SPATIAL, HOME can be

either RESIDENCE or HABITAT, but nothing else. Combinatorial rules and


inferences continue to be applied until possible meanings of the first sentence are found. In the example, the

two possible meanings are [Attardo, 1994]:

(i) Question: The unique proprietor of a family residence who is a

physician is physically present in the residence.

(ii) Question: The unique proprietor of a family residence who has the

doctoral degree is physically present in the residence.

Since there is more than one meaning to the first sentence, the combinatorial rules will

register the ambiguity.

The second step is repeated until the entire text is analyzed, applying combinatorial rules

and inferences. In this example, a script for DOCTOR will be found. The analysis of

combinatorial rules will also produce a question: Why does the doctor’s wife want the

patient to come in? To answer this, combinatorial rules will have to be used to go back

and search for another possible script. If the second script is found, and the second script

is opposite in meaning to the first one, there is a scripts opposition, and the text is a joke.

If the second script is not found or it does not oppose in any way the first one, the text is

not a joke. In this example, the second script LOVER will be found. Since LOVER is

opposite in meaning to DOCTOR, the text is a joke [Attardo, 1994] [Raskin, 1985].
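
The combinatorial step that registers ambiguity can be illustrated concretely. The sketch below, with the script lists copied from the table above, finds the scripts evoked by two or more words of the first clause; the inferences and the search for a script opposition, which do the real work of SSTH, are deliberately left out.

    SCRIPTS = {
        "IS":     {"EQUAL OR BELONG TO A SET", "EXIST", "SPATIAL", "MUST"},
        "THE":    {"DEFINITE", "UNIQUE", "GENERIC"},
        "DOCTOR": {"ACADEMIC", "MEDICAL", "MATERIAL", "MECHANICAL", "INSECT"},
        "AT":     {"SPATIAL", "TARGET", "OCCUPATION", "STATE", "CAUSE"},
        "HOME":   {"RESIDENCE", "SOCIAL", "HABITAT", "ORIGIN", "DISABLED",
                   "OBJECTIVE"},
    }

    def shared_scripts(words):
        # Return the scripts evoked by two or more of the given words.
        counts = {}
        for word in words:
            for script in SCRIPTS.get(word, set()):
                counts[script] = counts.get(script, 0) + 1
        return {s for s, n in counts.items() if n >= 2}

    print(shared_scripts(["IS", "THE", "DOCTOR", "AT", "HOME"]))  # {'SPATIAL'}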

SSTH is the first formal theory [Ruch, 2001]. A formal theory [Simpson, 2001] is a

structure of primitives, axioms, and theorems. Primitives are a small collection of


predicates, which are regarded as basic for a given field of study. Primitives delimit the

scope of the theory. Axioms are basic or self-evident formulas. Theorems are formulas

that are logical consequences of the axioms.

According to Attardo [1994, p.208], SSTH has some drawbacks. They become evident

when an attempt is made to apply it to texts other than jokes. The other limitation is that

SSTH is a semantic theory only. While it can determine if a short text is a joke, it cannot

tell how similar two jokes are. The General Theory of Verbal Humor answers these

questions.

2.1.3 General Theory of Verbal Humor

Unlike Raskin’s humor theory, which is semantic, the General Theory of Verbal Humor

(GTVH) is a linguistic theory.

The General Theory of Verbal Humor [Attardo, 1991] is a combination of the Script-

based Semantic Theory of Humor and Attardo’s Five-Level joke representation model.

The five levels in Five-Level Model are: Surface, Language, Target + Situation,

Template, Basic script opposition and Logical mechanism.

GTVH describes each joke in terms of six Knowledge Resources [Attardo, 1994]:

• Script Opposition (SO): deals with script opposition presented in SSTH.


• Logical Mechanism (LM): accounts for the way in which the two senses

(scripts, etc.) in the joke are brought together; it corresponds to the resolution

phase of the incongruity/resolution model.

• Situation (SI): the “props” of the joke, the textual materials evoked by the scripts of the

joke that are not necessarily funny.

• Target (TA): any individual or group from whom humorous behavior is

expected. Target is the only optional parameter among the six KRs.

• Narrative Strategy (NS): The “genre” of the joke, such as riddle, 1-2-3

structure, question and answer, etc.; it is the rhetorical structure of the text.

• Language (LA): The actual lexical, syntactic, phonological, etc., choices at the

linguistic level that instantiate all the other choices. LA is responsible for the

position of the punchline.

Having all six Knowledge Resources defined, a joke, according to Attardo [1994, p.226],

can be looked at as a “6-tuple, specifying the instantiation of each parameter.”

“Joke: {LA, SI, NS, TA, SO, LM}”

Two jokes are different if at least one parameter of the six above is different in the jokes.

The other very important aspect of GTVH is the ordering of knowledge resources. The

Knowledge Resources are ordered in the following manner:


Script Opposition

Logical Mechanism

Situation

Target

Narrative Strategy

Language

The ordering was experimentally tested in a human study described in Ruch [1993]. The

study concluded that there is a linear increase in similarity between pairs of jokes selected

along the Knowledge Resources hierarchy, with the exception of Logical Mechanism. This

means that jokes that have all the parameters the same but Script Opposition are less

similar than jokes that have all the parameters the same but Situation, which are less similar

than jokes that have all the parameters the same but Target, which are less similar than jokes

that have all the parameters the same but Narrative Strategy, which are less similar than jokes

that have all the parameters the same but Language.

In addition, the larger the number of parameters that the jokes have in common, the more

similar they are.
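
The 6-tuple view, combined with the hierarchy above, suggests a simple way to compare jokes. The sketch below is an assumption made for illustration: it counts shared Knowledge Resources and, as a tie-breaker, treats a difference lower in the hierarchy as less damaging to similarity, which reproduces the Joke5/Joke6/Joke7 ranking discussed below.

    # KRs ordered from most to least similarity-destroying when different.
    KR_ORDER = ["SO", "LM", "SI", "TA", "NS", "LA"]

    def kr_similarity(a, b):
        # Jokes are dicts over the six KRs. Returns (shared KR count,
        # depth of the first differing KR); larger tuples mean more
        # similar jokes. Illustrative scoring only.
        shared = sum(a[kr] == b[kr] for kr in KR_ORDER)
        first_diff = next((i for i, kr in enumerate(KR_ORDER)
                           if a[kr] != b[kr]), len(KR_ORDER))
        return (shared, first_diff)

    joke5 = dict(SO="dumb/smart", LM="figure-ground reversal",
                 SI="light bulb", TA="Poles", NS="riddle", LA="standard")
    joke6 = dict(joke5, SI="car wash")   # differs in Situation
    joke7 = dict(joke5, LA="variant")    # differs in Language only

    # Joke5/Joke7 differ lower in the hierarchy than Joke5/Joke6,
    # so they rank as more similar, matching the discussion.
    assert kr_similarity(joke5, joke7) > kr_similarity(joke5, joke6)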


The Knowledge Resource ordering affects not only the similarity of jokes, but also the choice of

parameters in the hierarchy: Script Opposition limits the choice of Logical Mechanism,

which limits the choice of Situation, which limits the choice of Target, which limits the

choice of Narrative Strategy, which limits the choice of Language. An example that

Attardo [1994] uses is: the choice of the Script Opposition of Dumb/Smart will determine

the choice of the Target (in North America, Poles, etc.).

To illustrate GTVH Joke5 and Joke6 are examined:

Joke5: “How many Poles does it take to screw in a light bulb? Five. One to

hold the light bulb and four to turn the table he’s standing on.”

Joke6: “How many Poles does it take to wash a car? Two. One to hold the

sponge and one to move the car back and forth.”

Both jokes have the same Script Opposition (Dumb/Smart), same Logical Mechanism

(figure-ground reversal), same Target (Poles), same Narrative Strategy (riddle), same

Language, but different Situation: in Joke5 the Situation is light bulb, but in Joke6 it is car

wash. Conclusion: as the two jokes only differ in one Knowledge Resource, they are

considered very similar.

Consider another joke:


Joke7: “The number of Polacks needed to screw in a light bulb? – Five – One

holds the bulb and four turn the table.”

This joke has the same parameters as Joke5 but Language. Conclusion: Joke5 and Joke7

are very similar since they only differ in one Knowledge Resource. However, since

Language comes after Situation in the hierarchy, Joke5 and Joke6 are less similar than

Joke5 and Joke7. On the other hand, Joke6 and Joke7 have two different parameters,

Situation and Language. They have less similarity than Joke5 and Joke7 or Joke5 and Joke6.

2.1.4 Veatch’s Theory of Humor

Veatch’s Theory of Humor [Veatch, 1998] is based on the concept that humor is a form

of pain that does not hurt. It requires three conditions that are, individually, necessary

and, jointly, sufficient for humor to occur. “The conditions of this theory describe a

subjective state of apparent emotional absurdity, where the perceived situation is seen as

normal, and where, simultaneously, some affective commitment of the perceiver to the

way something in the situation ought to be is violated.” The three conditions are violation

(V), normal (N), and simultaneity of V and N. They are defined as:

“V: The perceiver has in mind a view of the situation as constituting a

violation of some affective commitment of the perceiver to the way


something in the situation ought to be. That is, a ‘subjective moral

principle’ of the perceiver is violated.

N: The perceiver has in mind a predominating view of the situation as

being normal.

Simultaneity: The N and V understandings are present in the mind of the

perceiver at the same instant in time.”

Veatch [1998] defines a subjective moral principle as a principle that an individual

both has an affective commitment to and believes ought to hold. According to the theory,

a person laughs at something if he finds that there is something “wrong” or if there is a

violation of a norm or taboo, yet this “wrong” is perceived as being ok. If there is no

violation, or if the violation creates a conflict that a person is not comfortable with, a

situation is not humorous. An example that Veatch provides is a person that has recently

suffered a car accident will not think that traffic violation jokes are funny. However,

somebody who takes them lightly will. This explains why a joke can be funny to one

group and not funny to another or to the same person at different times. The theory offers

two predictions:

“Prediction 1: If X finds a situation funny where some principle is

violated, and Y instead finds it to be offensive, frightening, or threatening,


then we should find that Y is more attached to the principle violated than

X, not vice versa.

Prediction 2: If on the other hand, some perceiver Z finds the

aforementioned situation unremarkable, then we should find that Z has no

personal moral attachment to principles violated; we should not find, for

example, that Z is more attached to them than the X is who finds it funny.”

It is possible to tell something about one’s sense of humor, and about one’s values, using

the above predictions. Having some information about one’s “moral principles”, this

theory should provide an answer to whether a person will find a joke funny. Veatch

[1998] constructs a table of perceiver reactions to the joke, given his commitment to

the violation of some of the principles.

Perceiver Level   Logic          Commitment   Gets   Is Offended   Sees Humor
Level 1           Not-V          None         No     No            No
Level 2           V and N        Weak         Yes    No            Yes
Level 3           V and not-N    Strong       Yes    Yes           No

Table1: The three-level scale*

Table1 shows that if there is no violation of principles in a joke, a perceiver does not find

the joke humorous. A joke is also not humorous if a commitment to a violated principle is

strong, and a perceiver does not find the situation normal; although the perceiver

* Taken from Veatch [1998]


understands that structurally it is a joke, he is offended by it and does not see any humor. The

only situation when a perceiver sees humor is when there is a violation, yet the

perceiver’s commitment to it is weak; therefore, he is not offended by it and sees humor.

Although this theory is difficult to use as a basis for computational humor, it can be used

to predict whether a joke is funny to someone whose moral principles are known.
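
Table1 is small enough to be stated directly as a predictive function. The sketch below is a literal transcription of the three-level scale; anything outside the three listed combinations is left undefined, as it is in the table.

    def perceiver_reaction(violation, normal, commitment):
        # Transcribe Table1: given whether a violation is perceived (V),
        # whether the situation still reads as normal (N), and the
        # perceiver's commitment to the violated principle.
        if not violation:                          # Level 1: Not-V
            return {"gets": False, "offended": False, "humor": False}
        if normal and commitment == "weak":        # Level 2: V and N
            return {"gets": True, "offended": False, "humor": True}
        if not normal and commitment == "strong":  # Level 3: V and not-N
            return {"gets": True, "offended": True, "humor": False}
        return None  # combination not covered by the three-level scale

    print(perceiver_reaction(violation=True, normal=True, commitment="weak"))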

2.2 Wordplay Jokes

Wordplay jokes, or jokes involving verbal play, are a class of jokes depending on words

that are similar in sound but are used in two different meanings. The difference

between the two meanings creates conflict or breaks expectation, and is humorous. The

wordplay can be created between two words with the same pronunciation and spelling,

between two words with different spelling but the same pronunciation, or between two words

with different spelling and similar pronunciation. For example, in Joke8 the same word is

the focus of the joke. This means that the conflict is created because the word has two

meanings, while the pronunciation and the spelling stay the same. In Joke9 and Joke10 the

wordplay is between words that sound nearly alike.

Joke8: “Clifford: The Postmaster General will be making the TOAST.

Woody: Wow, imagine a person like that helping out in the kitchen!”

Joke9: “Diane: I want to go to Tibet on our honeymoon.


Sam: Of course, we will go to bed.”

Joke10: “Emil is not something you name your kid; it is something that you

eat”†

Sometimes it takes world knowledge to recognize which word is subject to wordplay.

For example, in Joke9, there is a wordplay between “Tibet” and “to bed.” However, to

understand the joke, the wordplay by itself is not enough; world knowledge is required

to “link” honeymoon with “Tibet” and “to bed.”

Some jokes require more world knowledge than others. For example, Joke11 requires an

understanding that “From Concentrate” is written on most of the orange juice containers.

Joke12 requires knowledge that patients frequently wait long periods of time before

being admitted to see a doctor:

Joke11: “Butch: Hey, Stupid Steve, why are you staring at that carton of

orange juice?

Stupid Steve: Because it says ‘Concentrate’”‡

Joke12: “Nurse: I need to get your weight today.

Impatient patient: Three hours and twelve minutes.”

† Joke8, Joke9, Joke10 are taken from the TV show “Cheers”
‡ Joke11 and Joke12 are taken from “The Original 365 Jokes, Puns & Riddles Calendar,” 2004


A focused form of wordplay jokes is Knock Knock jokes. In Knock Knock jokes,

wordplay is what leads to the humor. The structure of the Knock Knock jokes provides

pointers to the wordplay.

2.3 Structure of Jokes

After looking at numerous theories, an observation can be made that many jokes, but not

all, are based on ambiguity. Sometimes ambiguity is based on one word; sometimes an

entire sentence is ambiguous. For a joke to exist, there has to be a secondary meaning to

it. In script-based theory, the overlap of two scripts is the ambiguity; in the IR theory

ambiguity has to exist in the setup.

2.3.1 Structural Ambiguity in Jokes

Oaks [1994] discusses some devices for creating structural ambiguity in jokes. All jokes

in this section are examples used by Oaks, taken from Clark [1968].

2.3.1.1 Plural and Non-Count Nouns as Ambiguity Enablers

One of the ambiguity enablers is the set of plural and non-count nouns. Since these nouns do

not require indefinite articles, it is easy to confuse them with other parts of speech, such

as verbs and adverbs. An example that Oaks [1994] provides is the headline “British Left

Waffles On Falkland Island.” It is unclear whether “the political left in England can't


make up its mind on the Falklands or the mess sergeants didn't clean up after breakfast.”§

Depending on whether the word “left” is a noun or a verb, the word “waffles” becomes

either a noun or a verb, resulting in different meanings of the sentence.

The second reason to use plural nouns is that it is difficult to delineate between Verb +

Noun and Adjective + Noun. Consider this example:

Joke13: “Question: What’s worse than raining cats and dogs?

Answer: Hailing taxis.”

If a singular noun were used (taxi), it would be possible to distinguish between “hailing a

taxi” (Verb + Noun) and “a hailing taxi” (Adjective + Noun).

Non-count nouns are as powerful in creating ambiguity:

Joke14: “Diner: This coffee is like mud.

Waiter: Well, it was ground this morning!”

In Joke14 the non-count noun “ground” can be either a verb or a noun.

§ http://www.yourdictionary.com/library/ling004.html


2.3.1.2 Conjunctions as Ambiguity Enablers

Conjunctions may also play the role of ambiguity enablers. In Joke15 it is unclear if

the word “flies” is used as a verb or a noun. Notice that “flies” is in plural form.

Joke15: “Question: What has four wheels and flies?

Answer: A garbage truck.”

2.3.1.3 Construction “A Little” as Ambiguity Enabler

The next ambiguity enabler discussed by Oaks [1994] is the construction “a little”.

Joke16: “Romeo (as he threw stones into the stream): I am merely a pebble in

your life.

Juliet: Why don’t you try being a little boulder [bolder]?”

In Joke16 “a little” together with the use of the suffix “er” creates an ambiguity as to the

form of “boulder”: it can be used as a noun (“boulder”) or an adjective (“bolder”).

2.3.1.4 Can, Could, Will, Should as Ambiguity Enablers

Modals such as can, could, will, should, etc. are also important enablers. To quote Oaks

[1994], they “are not only important because they do not themselves carry an inflectional


ending even when used with a third person singular, but because they must also be

followed by a verb in its bare infinitival form.” Tense shifting is another good way to

create ambiguity, since there is no difference between past tense singular and past tense

plural, except for the forms of “be;” there is also no difference between future tense

singular and plural.

It should be possible to recognize structural ambiguity in texts, using the enablers

discussed in this section. However, very few jokes are based on structural ambiguity.

While structural ambiguity may be a sufficient condition for a joke, it is not a necessary

condition. Therefore, if structural ambiguity is not found in a text, a conclusion cannot

be drawn that the text is not a joke. On the other hand, if structural ambiguity is present,

the text contains a setup and a punchline, and ambiguity is found in the setup, the given

text is humorous.

2.3.2 The Structure of Punchline

Hetzron [1991] offers an analysis of punchlines in jokes.

Joke17: “Russian officers are served beer in an Eastern European tavern. They

order beer. The waiter places coasters on the table and serves the beer.

Later they order another round. The waiter returning with the beer finds

no coasters. ‘OK,’ he tells himself, ‘these are collectors,’ and puts

down another set of coasters. When the third round is ordered and


brought out, there are again no coasters. Angry, the waiter puts the

beer down on the table, but places no more coasters. One of the

Russian officers protests: ‘What is it? No more crackers?’”

According to Hetzron [1991], Joke17 works because at the level of a narrative, the

audience does not know why the coasters disappeared. Until the very end of the text the

audience is led to believe that the Russian officers are collectors; and, what is more

important, until the very end, there is no hint that somebody may find coasters edible.

Hetzron argues that what makes a text a joke is “that the tension should reach its highest

level at the very end. No continuation relieving this tension should be added” [Hetzron,

1991 p.66]. He defines a joke as “a short humorous piece of literature in which the

funniness culminates in the final sentence, called the punchline.”

Jokes can be divided into three categories: straight-line, dual, and rhythmic. A straight-

line joke is a joke where “one successive episode” culminates in a punchline [Hetzron,

1991].

Joke18: “A woman goes to the rabbi: ‘Rabbi, what shall I do so that I

wouldn’t become pregnant again?’ The rabbi says: ‘Drink a glass of

cold water!’ ‘Before or after?’ The rabbi replies: ‘Instead!’”

The answer of the rabbi in Joke18 is unexpected. The audience expects to hear some

advice about protection; but he suggests water. Water is not something that prevents


pregnancies; however, until the very last word of the joke is heard, the audience does not

think that the given advice is about abstinence. In this joke the expectation grows in

a linear fashion: there are no quick drops right before or after the high point of the joke.

The second type of joke is the dual joke: “it contains two pulses, often in contrast”

[Hetzron, 1991].

Joke19: “The Parisian Little Moritz is asked in school: ‘How many deciliters

are there in a liter of milk?’ He replies: ‘One deciliter of milk and nine

deciliters of water’ – In France, this is a good joke; in Hungary, this is

good milk.”

This joke has a punchline (the first pulse) and a commentary afterwards (the second

pulse), which turns out to be funnier than the punchline. The other kind of dual joke is the

“Mixing Domain” [Hetzron, 1991] joke. Hetzron gives an example of the “good news-

bad news” joke:

Joke20: “Mr. Rabinowitz’s son studies at Berkeley. When his neighbor goes

for a visit in the area, Rabinowitz asks him to look up his son. The

neighbor comes back telling: ‘I have good news and bad news. Which

one do you want to hear first?’ Mr. R. first wants the bad news. The

neighbor says: ‘I am sorry to say, your son has become gay. This is the


bad news. The good news is: he has found for himself such a bekhoved

[respectable] Jewish doctor.’”

Joke20 works because of the contrast that occurs between the two pulses, which can also occur in

other shapes.

The third type of joke is a rhythmic joke [Hetzron, 1991]. Rhythmic jokes contain at

least three pulses. The number three is very popular in jokes. Very often three-pulse jokes

contain repetitions. The first two pulses set a pattern for the third pulse, while the third

pulse breaks the pattern. There are jokes “where a sensation of automation is produced”

[Hetzron, 1991]. For example, the same answer to different questions creates an even

rhythm, yet the answers may be contradictory. Sometimes automation is produced by

“trap-questions,” where the answers come from a real audience, and all correct answers

but the last sound similar. The audience gets into a pattern of answering in rhythm and gives

a rhythmic but incorrect answer to the last question. Sometimes pulses in jokes are less

and less expected as the joke progresses: each pulse is funny, but the next one is even

funnier. Sometimes “pulses are not separate episode, but part of enumeration” [Hetzron,

1991]. An example of a joke with separate pulses is Joke21:

Joke21: “A newspaper reporter goes around the world with his/her

investigation. He/she stops people on the street and asks them:

“Excuse me Sir/Madam, what is your opinion of the meat shortage?”

An American asks: “What is ‘shortage’?” The Russian asks: “What is


‘opinion’?” The Pole asks: “What is ‘meat’?” The New York taxi-

driver asks: “What is ‘excuse me’?””

The other topic discussed in Hetzron’s [1991] paper is devices that make punchlines

work. Each joke can have one or more of such devices. Jokes can also be divided into

two classes: “one where an intended absolute meaning is taken to be relative and one

where a relative meaning … is viewed as if it were absolute” [Hetzron, 1991]. Joke22

illustrates an “expected absolute identity replaced by a relative one” [Hetzron, 1991].

Joke22: “Upon seeing a funeral procession: ‘Who died?’ – ‘I believe the one

in the hearse.’”

The next joke illustrates “inherently relative used as absolute” [Hetzron, 1991].

Joke23: “A man approached a policeman walking his beat in the street: ‘Could

you tell me please, where is the other side of the street?’ The

policeman points to it over there. The man says: ‘It can’t be there, they

told me it was over here.’”

Joke2 is an example of “expected relative identity used as absolute” [Hetzron, 1991].

Besides being divided into classes, jokes can be divided into different categories

[Hetzron, 1991]:


• “Reassignment in time

• Reassignment to a universe of fulfilled frustration

• Recombination

• Dependency reversal

• Independent existence claimed for dependent element

• Knowledge of attachment preceded knowledge of base

• Co-dependent > sub-dependent

• Retro-communication

• Foiled expectation

• Indirect communication, information left unsaid

• Mixing domains

• Internal contradiction”

Some of the devices are used in wordplay jokes. One of these devices is Recombination.

Recombination can be described as “change your partner,” or AB-CD > AD-CB [Hetzron,

1991]. An example that Hetzron provides is “Right wing vs. White ring.”

Retro-Communication is another example where wordplay can be used:

Joke24: “An American asks a Chinese man: ‘How often do you have

elections?’ The latter answers: ‘Evely day.’”


The devices can be combined in some jokes. For example, here is Hetzron’s [1991]

analysis of Joke17: “First of all, this is a rhythmic joke. … Then we have Retro-

Communication concerning the anomalous fact that the officers mistake coasters for

crackers. Indirect Communication behind this explains that mysterious disappearance of

the coasters: they must have been eaten. The communication is also Retroactive.”

2.4 Computational Humor

There is no single formalized theory that can be used by a computer program

to recognize or generate jokes. Several attempts have been made to write programs that

can recognize [Yokogawa, 2002] [Takizawa, 1996] or generate [Attardo, 1993] [Binsted,

1996a] [McKay, 2002] [Stock, 2002] [McDonough, 2001] [Lessard, 1992] [Binsted,

1998] humorous text. These programs are still a long way from a computer making its mark as a

standup comedian.

2.4.1 LIBJOG

Attardo and Raskin [Attardo, 1993] created a simple Light Bulb Joke Generator, LIBJOG.

The computation of LIBJOG is theory-free. The program uses entries for commonly

stereotyped groups and Template1:

Template1: How many [lexical entry head] does it take to change a light bulb?

[number1]. [number1 – number2] to [activity1] and [number2] to

[activity2].


Entry1 is an illustration of the algorithm:

Entry1: (Poles (activity1 hold the light bulb)

(number1 five)

(activity2 turn the table he’s standing on)

(number2 four))

This entry produces Joke25.

Joke25: “How many Poles does it take to change a light bulb? Five. One to

hold the light bulb and four to turn the table he’s standing on.”

The joke-generating mechanism is very limited: while this joke is technically computer

generated, it does not assemble or analyze any features of the joke.
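
Template-filling of this kind is easy to reproduce. The sketch below re-implements the scheme with Entry1; the field layout and the number-word table are assumptions made for illustration, but the output reproduces Joke25.

    NUMBER_WORDS = {1: "One", 4: "four", 5: "Five"}  # enough for Entry1

    ENTRY1 = {"group": "Poles",
              "activity1": "hold the light bulb",
              "activity2": "turn the table he's standing on",
              "number1": 5, "number2": 4}

    def libjog(entry):
        # Fill Template1 with a lexical entry.
        n1, n2 = entry["number1"], entry["number2"]
        return ("How many {0} does it take to change a light bulb? "
                "{1}. {2} to {3} and {4} to {5}.").format(
                    entry["group"], NUMBER_WORDS[n1], NUMBER_WORDS[n1 - n2],
                    entry["activity1"], NUMBER_WORDS[n2], entry["activity2"])

    print(libjog(ENTRY1))  # reproduces Joke25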

2.4.2 JAPE

JAPE is a computer program created by Binsted [1996a] that generates simple punning

riddles. JAPE uses word substitution, syllable substitution and metathesis to create

phonological ambiguity. One of the main features of JAPE is its humor-independent

lexicon. Each produced joke has a setup and a punchline. The components that make up

JAPE are: the lexicon, consisting of fifty-nine words and twenty-one NPs, a homophone


base, six schemata, fourteen templates, and a “post-production checker” [Ritchie, 2003].

Some of the examples of the jokes produced by JAPE are:

Joke26: “What do you call a quirky quantifier?

An odd number”

Joke27: “What’s the difference between money and a bottom?

One you spare and bank, the other you bare and spank.”

Binsted and Ritchie [1994] note that there is not a lot of humor theory behind JAPE.

However, Attardo argues that the program is largely congruent with GTVH [Attardo,

2002b].

2.4.3 Elmo

Elmo [Loehr, 1996] is a natural language robot. Dan Loehr integrated the JAPE pun

generator with Elmo. The two are integrated on four different levels [Loehr, 1996]:

• “Upon a request to tell a joke or a riddle, Elmo issues a query for a pun from

JAPE

• An attempt to make Elmo produce humor relevant to arbitrary user input.

• Make Elmo produce humor relevant to user input, using a pre-chosen pun


• Make Elmo return a carefully scripted reply to achieve a ‘smoother’

response.”

An example of the third level of integration is given below [Loehr, 1996]:

“No more help for me?

You say, ‘No more help for me?’

Elmo says, ‘I don’t know what else to say, Dan.’

Elmo says, ‘What do you call a useless assistant?’

Elmo says, ‘a lemon aide.’”

While the integration of the pun generator with the natural language interface was successful,

it had difficulties producing relevant humor on arbitrary input.

2.4.4 WISCRAIC

Witty Idiomatic Sentence Creation Revealing Ambiguity In Context (WISCRAIC)

[McKay, 2002] is a joke generator that focuses on witticisms based around idioms. This

program produces jokes and explanations for the created jokes, making it possible for the

program to be used as an aid for teaching English idioms to non-native speakers [McKay,

2002].

The program consists of three modules [McKay, 2002]:


• “Joke Constructor – the module that contains information about elements of a

joke. This module uses a dictionary of idioms, a dictionary of professions, a general

dictionary, and a lexicon.

• Surface-form Generator – the module that uses grammar to convert an input

from Joke Constructor into a joke.

• Explanation Generator – this module takes the elements provided by Joke

Constructor and uses grammar to generate an explanation of relations between

the elements.”

An example of a WISCRAIC joke and explanation, given below, is taken from McKay

[2002]:

“The friendly gardener had thyme for the woman!

The word time, which is part of the idiom [have, time, for, someone] is a

homonym of the word thyme.

A HOMONYM is a word that sounds like another word.

-----

| LINK | between thyme and gardener:

---------------------------------

| thyme is a type of plant

| a gardener works with plants


‘friendly’, which is associated with the idiom [have, time, for, someone]

was selected from other adjectives as it has the highest imagability score:

439”

The program was tested with an 84% success rate on the WISCRAIC-produced jokes.
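
The three-module pipeline can be caricatured in a few lines. The toy data below comes from the published thyme/gardener example; the module interfaces themselves are hypothetical, since McKay [2002] does not specify them.

    # Toy resources in the spirit of WISCRAIC's dictionaries; the data
    # mirrors the published example.
    IDIOMS = [("have time for someone", "time")]
    HOMOPHONES = {"time": "thyme"}
    PROFESSION_LINKS = {"thyme": ("gardener",
                                  "thyme is a type of plant; "
                                  "a gardener works with plants")}

    def construct():
        # Joke Constructor: pick an idiom, a homophone of its key word,
        # and a profession linked to the homophone.
        idiom, key = IDIOMS[0]
        pun = HOMOPHONES[key]
        profession, link = PROFESSION_LINKS[pun]
        return {"key": key, "pun": pun, "profession": profession, "link": link}

    def surface_form(e):
        # Surface-form Generator: render the elements as a sentence.
        return "The friendly %s had %s for the woman!" % (e["profession"], e["pun"])

    def explanation(e):
        # Explanation Generator: explain the homonym and the link.
        return "The word %s is a homonym of %s. Link: %s." % (
            e["key"], e["pun"], e["link"])

    e = construct()
    print(surface_form(e))
    print(explanation(e))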

2.4.5 Ynperfect Pun Selector

Hempelmann [2003a] proposes “The Ynperfect Pun Selector” (YPS) as a complement to a

general pun generator based on the General Theory of Verbal Humor. YPS would use

heterophonic puns: puns that use a similar sound sequence. It would take any English

word as its input and generate a set of words similar in sound, ordered by their

phonological similarity. This output could then be entered into a general pun generator

for evaluation of the semantic possibilities of the choices produced by YPS.

“For example, ‘dime’ to denote not just a 10¢ coin [daym] but paradigmatically also the

meaning of [dæm] as in the slogan ‘Public transportation: It’s a dime good deal.’ YPS’s

purpose here is to generate a range of phonologically possible puns given a target word,

for example, how we could use not only dam (‘barrier across waterway’) as a

homophonic pun to target damn, but also the heterophonic candidates dime (as in the

example above), but also damn, dome, dumb, damp, tame, etc. In addition, the selector

will evaluate the possible puns in a certain order of phonological proximity to their

target” [Hempelmann, 2003a].


The phonological comparison of a homophonic pun to target is the only part of YPS that

has been implemented.
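
YPS’s intended input/output behavior (a word in, a ranked list of sound-alike words out) can be approximated crudely at the spelling level, much like the one-letter replacements behind Table3 later in this thesis. The sketch below substitutes one letter at a time against a toy lexicon and ranks survivors by the number of letters kept; real YPS would rank by phonological, not orthographic, similarity.

    import string

    LEXICON = {"dam", "dime", "dome", "dumb", "damp", "tame", "damn"}

    def sound_alike_candidates(word, lexicon=LEXICON):
        # Generate lexicon words one letter-substitution away from `word`,
        # ranked by how many letters they keep. A spelling-level stand-in
        # for YPS's phonological ranking.
        candidates = set()
        for i in range(len(word)):
            for ch in string.ascii_lowercase:
                variant = word[:i] + ch + word[i + 1:]
                if variant != word and variant in lexicon:
                    candidates.add(variant)
        return sorted(candidates,
                      key=lambda w: sum(a == b for a, b in zip(w, word)),
                      reverse=True)

    print(sound_alike_candidates("damn"))  # ['damp'] in this toy lexicon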

2.4.6 HAHAcronym

HAHAcronym: Humorous Agents for Humorous Acronyms is the first computational

humor project sponsored by the European Commission. “One of the purposes of the

project is to show that using standard resources …, and suitable linguistic theories of

humor …, it is possible to implement a working prototype” [Stock, 2002]. The main tool

used is an incongruity detector/generator. To create a successful output, it is important to

detect any variances in meanings between the acronym and its context. A basic resource

for the incongruity generator is an independent structure of domain oppositions, such as

Religion vs. Technology, Sex vs. Religion, etc. An ANT-based parsing technique was

used to parse word sequences.

The project inputs an existing acronym and, after comparing the actual meaning and

context, comes up with a humorous parody of it, using the algorithm below [Stock, 2002]:

• “Acronym parsing and construction of a logical form

• Choice of what to keep unchanged (typically the head of the highest ranking

NP) and what to modify (i.e. adjectives)

• Look for possible substitutions:


o Using semantic field opposition

o Keeping the initial letter, rhyme, and rhythm (the modified acronym

should sound similar to the original as much as possible)

o For adjectives, basing reasoning mainly on WordNet antonymy clustering

o Using the a-semantic dictionary.”

Some of the examples of the acronym reanalysis are [Stock, 2002]:

“MIT (Massachusetts Institute of Technology) Mythical Institute of

Theology

ACM (Association for Computing Machinery) Association for Confusing

Machinery”

At the time the paper was published, there was no evaluation of the completed prototype.

2.4.7 MSG

The Mnemonic Sentence Generator (MSG) [McDonough, 2001] is a program built by

Craig McDonough that converts any alphanumeric password into a humorous sentence.

One of the reasons for creating the program is that passwords are now an integral part of

everyday life. The simpler a password is, the easier it is to guess. Good passwords consisting of both alphabetic and numeric characters are difficult to remember. The system


takes an eight character alphanumeric string and turns it into a sentence possibly making

it easier to remember than the password itself, which is one of the main requirements.

The sentence template consists of two clauses of four words each [McDonough, 2001]:

Template2: (W1 = Person-Name) + (W2 = Positive-Verb) +
           (W3 = Person-Name + "s") + (W4 = Common-Noun) +
           ", while" + (W5 = Person-Name) + (W6 = Negative-Verb) +
           (W7 = Person-Name + "s") + (W8 = Common-Noun)

The program combines opposite scripts by using a positive verb in the first clause and a

negative verb in the second clause. An example of what the program can do is the

following sentence, generated from password “AjQA3Jtv”: “Arafat joined Quayle’s Ant,

while TARAR Jeopardized thurmond’s vase” [McDonough, 2001].

2.4.8 Tom Swifties

Lessard and Levison [Lessard, 1992] have attempted to model a particular type of linguistic humor, Tom Swifties, by means of a sentence generator. "Tom Swifties are

pun-like utterances ascribed to the character Tom, in which a manner adverb enters into a

formal and semantic relation with the other elements in the sentence.”

Joke28: “‘I hate seafood’, Tom said crabbily.”


Everything produced by this generator is in the form of Template3:

Template3: “SENTENCE” said Tom ADV[manner]

The adverb in Template3 must have a phonetic link to a meaning of at least one word

in the SENTENCE, and be semantically related to it. Thus, in Joke28, there is a semantic

relationship between the words “seafood” and “crab” and a phonetic link between the

words “crab” and “crabbily” [Lessard, 1992].

2.4.9 Jester

"Jester" [Goldberg, 2001] is an online joke-recommending system, based on

collaborative filtering. A user is given fifteen jokes to rate. After reading and evaluating

the jokes, the system uses statistical techniques to recommend jokes based on the user’s

rating of the sample. “To rate items, users are asked to click their mouse on a horizontal

‘ratings bar’ which returns scalar values” [Goldberg, 2001]. The system then finds a

"nearest neighbor" to the user's ratings and recommends the next joke from the list of the nearest neighbor. The program does not use any linguistic theories of humor, but does

take into account a user’s sense of humor.

2.4.10 Applications in Japanese

There has been some work in computational humor in Japanese.


Yokogawa [2002] proposes a Japanese pun analyzer, based on the hypothesis that similarity of articulation matches similarity of sounds. The system has four steps:

• Morphological analysis

• Connection check

• Generation of similar expressions

• Pun candidate check

Experimental results show that the system is able to recognize about 50% of

ungrammatical pun sentences.

Binsted and Takizawa [Binsted, 1998] implemented a simple model of puns in a program

BOKE, which generates puns in Japanese. BOKE is similar to JAPE: the programs differ

in the lexicon and the templates that are used to generate the text, but the punning

mechanisms are the same.

Takizawa has also implemented a pun-detecting program for Japanese, which accepts a

sequence of phonemic symbols and produces possible analyses of this in terms of

sequences of Japanese words, rating each word-sequence with the likelihood that it is a

pun, based on various heuristics [Ritchie, 1998].


3 Statistical Measures in Language Processing

A joke generator has to have the ability to construct meaningful sentences, while a joke recognizer has to recognize them. While joke generation involves limited world knowledge, joke recognition requires much more extensive world knowledge.

To be able to recognize or generate jokes, a computer should be able to “process”

sequences of words. A tool for this activity is the N-gram, “one of the oldest and most

broadly useful practical tools in language processing” [Jurafsky, 2000]. An N-gram is a

model that uses conditional probability to predict the Nth word based on the N-1 previous words.

N-grams can be used to store sequences of words for a joke generator or a recognizer.

3.1 N-grams

N-grams are typically constructed from statistics obtained from a large corpus of text

using the co-occurrences of words in the corpus to determine word sequence probabilities

[Brown, 2001]. As a text is processed, the probability of the next word N is calculated,

taking sentence boundaries into account if one occurs before the word N.

“The probabilities in a statistical model like an N-gram come from the corpus it is trained

on. This training corpus needs to be carefully designed. If the training corpus is too

specific to the task or domain, the probabilities may be too narrow and not generalize


well to new sentences. If the training corpus is too general, the probabilities may not do a

sufficient job of reflecting the task or domain” [Jurafsky, 2000].

A bigram is an N-gram with N=2, a trigram is an N-gram with N=3, etc. A bigram model

will use one previous word to predict the next word, and a trigram will use two previous

words to predict the word.

As bigram probability is conditional, the formula for bigram probability is:

p(A|B) = p(A and B)/p(B)**

To calculate p(B), the following formula can be used:

p(B) = (number of occurrences of B in the text)/(number of words in the text)

Similarly,

p(A and B) = (number of occurrences of A and B in the text)/(number of words in the text)

This means that p(A|B) is:

p(A|B) = (number of occurrences of A and B in the text)/(number of occurrences of B in the text)

** For more information on conditional probability, see http://www.mathgoodies.com/lessons/vol6/conditional.html


To show how N-grams work, consider Joke18 and Joke21 as a training corpus. To

simplify the example, the characters ".", "!", and "?" are replaced with the "‡" tag. Quotes,

commas, and colons are dropped. The corpus becomes:

A newspaper reporter goes around the world with his her investigation ‡

He she stops people on the street and asks them Excuse me Sir Madam

what is your opinion of the meat shortage ‡ An American asks What is

shortage ‡ The Russian asks What is opinion ‡ The Pole asks What is

meat ‡ The New York taxi-driver asks What is excuse me ‡ A woman

goes to the rabbi Rabbi what shall I do so that I wouldn’t become pregnant

again ‡ The rabbi says Drink a glass of cold water ‡ Before or after ‡ The

rabbi replies Instead ‡

Suppose the word with the highest probability of following the word "what" is to be found. To find this word, bigrams can be used.

After examining the training corpus, the following sequences are discovered: "what is," "what is," "what is," "what is," "what is," "what shall." This means that "what is" occurs 5

times, “what shall” occurs once, and “what” occurs 6 times. Plugging it into the formula

for p(A|B), the probabilities are: p(is|what) = 5/6; p(shall|what) = 1/6. Therefore, using

the corpus of Joke18 and Joke21, the word “is” should follow the word “what”.
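As an illustration (not part of the thesis implementation), the bigram computation can be written compactly in Python. The corpus below is a contrived stand-in in which "what is" occurs five times and "what shall" once, mirroring the counts above:

    from collections import Counter

    def bigram_probability(words, prev, nxt):
        # p(nxt|prev) = count(prev nxt) / count(prev)
        pair_counts = Counter(zip(words, words[1:]))
        prev_count = Counter(words)[prev]
        return pair_counts[(prev, nxt)] / prev_count if prev_count else 0.0

    corpus = ("what is " * 5 + "what shall").split()
    print(bigram_probability(corpus, "what", "is"))     # 5/6
    print(bigram_probability(corpus, "what", "shall"))  # 1/6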


If the above example only used the Joke18 as its training corpus, the answer, of course,

would be different as “what is” would not be present in the corpus. This shows that the

choice of a training corpus is crucial to the results of experiments.

3.2 Distant N-grams

The N-gram model is used to predict a word based on the preceding N-1 words because

most of the relevant syntactic information can be expected to lie in the immediate past.

However, some relevant words may be within a greater distance than a regular N-gram

model covers [Huang, 1993]. “Distant or skip N-Grams are used to cover long-range

dependencies with N-Gram models with a small N. This is done by introducing a gap of a

certain length between a word and its history” [Brown, 2001].

Consider a word sequence from Joke4: “doctor’s young and pretty wife.” Unlike regular

N-grams, distant N-grams with a gap of two will be able to capture not only “pretty wife”

dependency, but also “young wife.”

To illustrate how distant N-grams work, consider the training corpus of Joke18 and Joke21.

The task is to find the most probable word after the word “what,” as it was done with

regular N-grams. This time distant bigram with a gap of one will be used.

In addition to “what is” and “what shall,” the following sequences will be taken into

account: “what your,” “what shortage,” “what opinion,” “what meat,” “what excuse,”


“what I.” This means that p(is|what) = 5/12, p(shall|what) = 1/12, p(your|what) = 1/12,

p(shortage|what) = 1/12, p(opinion|what) = 1/12, p(meat|what) = 1/12, p(excuse|what) =

1/12, p(I|what) = 1/12. In this example, the result stays the same: the word “is” should

follow “what.” However, it may not always be the case.
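As a further illustrative sketch (Python, not from the thesis), distant bigram counting with a configurable gap can be written as follows; with a gap of two it captures the "young wife" dependency from Joke4 discussed above:

    from collections import Counter

    def distant_bigram_counts(words, max_gap=0):
        # max_gap=0 gives regular bigrams; max_gap=2 also pairs a word
        # with the words one or two positions further ahead.
        counts = Counter()
        for i, w in enumerate(words):
            for j in range(i + 1, min(i + 2 + max_gap, len(words))):
                counts[(w, words[j])] += 1
        return counts

    words = "doctor's young and pretty wife".split()
    counts = distant_bigram_counts(words, max_gap=2)
    print(counts[("pretty", "wife")])  # 1 (a regular bigram)
    print(counts[("young", "wife")])   # 1 (captured only with the gap)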

4 Possible Methods for Joke Recognition

It was shown in Section 2.4 that there are very few approaches for computational humor

generation, and almost none for computational recognition of humor. This may be partly

due to the absence of a well-described algorithm.

4.1 Simple Statistical Method

One approach to computational joke recognition is building a simple statistical joke

recognizer. Using Suls’ two-stage model [Suls, 1972], a joke recognizer may be able to

tell if texts are jokes by checking for the conflict with prediction. To check if there is a

conflict with the prediction, N-grams can be used. For example, in Joke29, the expected

reply is the room number, not “the lobby.” The N-grams may be able to predict the room

number, whereas “the lobby” may have a low probability of occurrence.

Joke29: “Hotel clerk: Welcome to our hotel.

Max: Thank you. Can you tell me what room I’m in?


Clerk: The lobby.”††

The first step is to determine which N will give the most accurate results. This can be done by inputting a sample of jokes into a learner. Once N is determined, the N-gram model can be used to check the probability of the next word or the next phrase. If the probability is

low enough, assume that there is a conflict with prediction, and call this text a joke.

4.2 Punchline Detector

A second possible approach is a punchline detector. Once again, Suls' [1972] two-stage

model is used. However, while the first approach starts with a text, and concludes if the

text is or is not a joke, this approach starts with a known joke, and finds a punchline in it.

A human expert checks if the punchline is correctly identified. In this case, a punchline

is defined as a sentence that breaks the prediction. Once again, N-gram model can be

used. The punchline of a joke will then be a sentence, containing the utterance with the

lowest probability.

Results can be tested by comparing the output of a punchline recognizer with results from

a program that randomly chooses the second or third sentence of a joke to be a punchline. A

recognizer is considered successful if the number of correctly identified punchlines by

recognizer exceeds the number of punchlines drawn correctly in a random sample.

†† Taken from “The Original 365 Jokes, Puns & Riddles Calendar,” 2004
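A minimal sketch of this proposed detector is given below (Python, illustrative only). It assumes a previously trained model exposing a hypothetical bigram_prob(prev, nxt) function, and it selects the sentence whose least probable word pair is lowest, i.e., the sentence that most strongly breaks the prediction:

    def find_punchline(sentences, bigram_prob):
        # sentences: a list of word lists; bigram_prob: an assumed trained
        # model returning P(nxt | prev).
        def lowest_pair_prob(sentence):
            pairs = zip(sentence, sentence[1:])
            return min((bigram_prob(a, b) for a, b in pairs), default=1.0)
        return min(sentences, key=lowest_pair_prob)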


4.3 Restricted Context

For an initial investigation, the first two approaches would be overly broad. To restrict

the problem, a restricted context approach was used, and the limited domain of “Knock

Knock jokes” was examined.

A typical Knock Knock (KK) joke is a dialog between two people that uses wordplay in

the punchline. Recognizing humor in a KK joke arises from recognizing the wordplay.

A KK joke can be summarized using the following structure:

Line1: “Knock, Knock”

Line2: “Who is there?”

Line3: any phrase

Line4: Line3 followed by “who?”

Line5: One or several sentences containing one of the following:

Type1: Line3

Type2: a wordplay on Line3

Type3: a meaningful response to Line3.

Joke30 is an example of Type1, Joke31 is an example of Type2, and Joke32 is an example of

Type3.

Joke30: Knock, Knock


Who is there?

Water

Water who?

Water you doing tonight?

Joke31: Knock, Knock

Who is there?

Ashley

Ashley who?

Actually, I don’t know.

Joke32: Knock, Knock

Who is there?

Tank

Tank who?

You are welcome.‡‡

From a theoretical point of view, both Raskin's [1985] and Suls' [1972] approaches can explain why Joke30 is a joke. From Raskin's approach, the explanation is: the two readings, "water" and "what are," belong to different scripts that overlap in the phonetic representation of "water," but also oppose each other. From Suls' approach, the explanation is: "what are" conflicts with the prediction. In this approach, a cognitive rule can be described as a function that finds a

‡‡ http://www.azkidsnet.com/JSknockjoke.htm


phrase that is similar in sound to the word "water," and that fits correctly at the beginning of the final sentence's structure. This phrase is "what are" for Joke30.

5 Experimental Design

A further tightening of the focus was to attempt to recognize only one type of KK jokes.

In this case, not all Knock Knock jokes were expected to be recognized. The program

was only expected to recognize Type1 jokes. This means that it did not recognize a joke

unless Line5 contained Line3 when the joke was read, and a wordplay of Line3 made

sense in Line5. In the context of this program, a wordplay is defined as a meaningful

phrase that sounds similar to the original phrase, but has different spelling. The original

phrase, in this case Line3, is referred to as the keyword. For example, only Joke30 from

the set of KK jokes in the Section 4.3 was expected to be recognized.

There are at least three ways of determining “sound alike” short utterances:

• Use a dictionary of “sound alike” utterances

• Dynamically access a pronouncing dictionary such as The American Heritage

dictionary and search it

• Computationally build up “sounds like” utterances as needed

The only feasible method for this project was building up utterances as needed.


The joke recognition process has four steps:

Step1: joke format validation

Step2: generation of wordplay sequences

Step3: wordplay sequence validation

Step4: last sentence validation

Once Step1 is completed, the wordplay generator generates utterances, similar in

pronunciation to Line3. Step3 only checks if the wordplay makes sense without touching

the rest of the punchline. It uses a bigram table for validation. Only meaningful

wordplays are passed to Step4 from Step3.

If the wordplay is not at the end of the punchline, Step4 takes the last two words of the

wordplay, and checks if they make sense with the first two words of text following the

wordplay in the punchline, using two trigram sequences. If the wordplay occurs at the end of the sentence, the last two words before the wordplay and the first two words of the wordplay are used for joke validation. If Step4 fails, the process goes back to Step3 or Step2 and continues the search for another meaningful wordplay.

It is possible that the first three steps return valid results, but Step4 fails; in that case, the text is not considered a joke.
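The four steps can be summarized in the control-flow sketch below (Python, illustrative only; parse_kk_format, generate_wordplays, wordplay_is_valid, and punchline_is_valid are hypothetical stand-ins for the components described in Sections 6 through 8):

    def recognize_kk_joke(text, parse_kk_format, generate_wordplays,
                          wordplay_is_valid, punchline_is_valid):
        parsed = parse_kk_format(text)                    # Step1: format validation
        if parsed is None:
            return False
        keyword, punchline = parsed
        for wordplay in generate_wordplays(keyword):      # Step2: generation
            if not wordplay_is_valid(wordplay):           # Step3: bigram validation
                continue                                  # back to Step2
            if punchline_is_valid(wordplay, punchline):   # Step4: trigram validation
                return True
        return False  # Step4 failed for every candidate: not considered a joke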


The punchline recognizer is designed so that it does not have to validate the grammatical

structure of the punchline. Moreover, it is assumed that Line5 is meaningful when the

expected wordplay is found, if it is a joke; and, that Line5 is meaningful as is, if the text is

not a joke. In other words, a human expert should be able to either find a wordplay so that

the last sentence makes sense, or conclude that the last sentence is meaningful without

any wordplay. It is assumed that the last sentence is not a combination of words without

any meaning.

The joke recognizer is to be trained on a number of jokes and tested on twice that number of jokes. The jokes in the test set are previously "unseen" by the

computer. This means that any joke, identical to the joke in the set of training jokes, is

not included in the test set.

6 Generation of Wordplay Sequences

Given a spoken utterance A, it is possible to find an utterance B that is similar in

pronunciation by changing letters from A to form B. Sometimes, the corresponding

utterances have different meanings. Sometimes, in some contexts, the differing meanings

might be humorous if the words were interchanged.

A repetitive replacement process is used for generation of wordplay sequences. Suppose,

a letter a1 from A is replaced with b1 to form B. For example, in Joke33 if a letter “i” in a

word “kip” is replaced with “ee,” the new word, “keep,” sounds similar to “kip”.


Joke33:

--Knock, Knock!

--Who’s there?

--Kip

--Kip who?

--Kip your hands off me.§§

A table containing combinations of letters that sound similar in some words, together with their similarity values, was used. In this paper, this table will be referred to as the Similarity Table. Table2 shows a subset of its entries.

The Similarity Table was derived from a table developed by Frisch [1996]. Frisch’s

table, shown in Appendix E, contained cross-referenced English consonant pairs along

with a similarity of the pairs based on the natural classes model. Frisch’s table was

heuristically modified and extended to the Similarity Table by “translating” phonemes to

letters, and adding pairs of vowels that are close in sound. Other phonemes, translated to

combinations of letters, were added to the table as needed to recognize wordplay from a

set of training jokes, shown in Appendix B.

The resulting Similarity Table approximately shows the similarity of sounds between

different letters or between letters and combination of letters. A heuristic metric

indicating how closely they sound to each other was either taken from Frisch’s table or

assigned a value close to the average of Frisch's similarity values. The purpose of the Similarity Table is to help computationally develop "sound alike" utterances that have different meanings.*** The Similarity Table should be taken as a collection of heuristic satisficing values that might be refined through additional iteration.

§§ http://www.azkidsnet.com/JSknockjoke.htm

Letters to be replaced   Letters to replace with   Similarity
a                        e                         0.23
a                        o                         0.23
e                        a                         0.23
e                        i                         0.23
e                        o                         0.23
en                       e                         0.23
k                        sh                        0.11
l                        r                         0.56
r                        m                         0.44
r                        re                        0.23
t                        d                         0.39
t                        th                        0.32
t                        z                         0.17
w                        m                         0.44
w                        r                         0.42
w                        wh                        0.23

Table2: Subset of entries of the Similarity Table, showing similarity of sounds in words between different letters

When an utterance A is “read” by the wordplay generator, each letter in A is replaced

with the corresponding replacement letter from the Similarity Table. Each new string is

assigned its similarity with the original word A.

All new words are inserted into a heap, ordered according to their similarity value,

greatest on top. When only one letter in a word is replaced, its similarity value is taken from the Similarity Table. The similarity value of the strings is calculated using the following heuristic formula:

similarity of string = number of unchanged letters +
                       sum of similarities of each replaced entry from the table

*** The complete table can be seen at homepages.uc.edu/~slobodjm/thesis/sim.table.pdf

Note that the similarity values of letters are taken from the Similarity Table. These values

differ from the similarity values of strings.

Once all possible one-letter replacement strings are found and inserted into the heap according to their string similarity, the first step is complete.

The next step is to remove the top element of the heap. This element has the highest similarity with the original word. If this element can be decomposed into an utterance that makes sense, this step is complete. If the element cannot be decomposed, each letter of the string, except for the letter that was replaced originally, is replaced again.

Once all possible replacements of a second letter are done, and all newly constructed strings are inserted according to their similarity, the top element is removed. Just as in step one, if the string from the top element can be decomposed into a meaningful phrase, the step is complete. If it cannot, the unchanged letters of the top element are replaced. If all letters have already been replaced, the next top element is removed.
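The replacement-and-heap process just described can be sketched as follows (Python, illustrative only, not the thesis implementation). SIMILARITY is a toy subset of the Similarity Table, decomposes() stands in for the wordplay recognizer of Section 7, and the position bookkeeping is simplified when a replacement changes the string length:

    import heapq

    # Toy subset of the Similarity Table: (letters, replacement, similarity).
    SIMILARITY = [("a", "e", 0.23), ("e", "a", 0.23), ("r", "re", 0.23),
                  ("t", "d", 0.39), ("w", "wh", 0.23)]

    def generate_wordplays(word, decomposes):
        # similarity of string = unchanged letters + sum of replacement similarities
        def string_similarity(replaced):
            return (len(word) - len(replaced)) + sum(s for _, _, s in replaced)

        heap, seen = [], set()

        def expand(s, positions, replaced):
            for i in range(len(s)):
                if i in positions:              # a letter is replaced at most once
                    continue
                for src, dst, sim in SIMILARITY:
                    if s[i:i + len(src)] == src:
                        new = s[:i] + dst + s[i + len(src):]
                        if new not in seen:
                            seen.add(new)
                            entry = replaced + [(src, dst, sim)]
                            heapq.heappush(heap, (-string_similarity(entry),
                                                  new, positions | {i}, entry))

        expand(word, frozenset(), [])
        while heap:                             # the most similar string is on top
            _, s, positions, replaced = heapq.heappop(heap)
            if decomposes(s):                   # meaningful decomposition found
                yield s
            else:
                expand(s, positions, replaced)

    # "water" -> "whater" -> "whatar" -> "whatare", decomposed as "what are"
    print(next(generate_wordplays("water", lambda s: s == "whatare")))

On "water," the sketch eventually yields "whatare," matching the worked example traced in the following paragraphs.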


Consider Joke30 as an example. The joke fits a typical KK joke pattern. The next step is to

generate utterances similar in pronunciation to “water.”

Table3 shows some of the strings received after one-letter replacements of "water" in Joke30. The second column shows the similarity of each string to the original word.

New String   String Similarity to "Water"
watel        4.56
mater        4.44
watem        4.44
rater        4.42
wader        4.39
wather       4.32
watar        4.23
watir        4.23
wator        4.23
weter        4.23
whater       4.23
woter        4.23
wazer        4.17
…            …

Table3: Examples of strings received after replacing one letter from the word "water" and their similarity value to "water"

Suppose the top element of the heap is "watel," with the similarity value of 4.56. "Watel" cannot be decomposed into a meaningful utterance. This means that each letter of "watel" except for "l" will be replaced again. The newly formed strings will be inserted into the heap in the order of their similarity value. The letter "l" will not be replaced, as it is not an "original" letter from "water." The string similarity of the newly constructed strings will most likely be less than 4. (The only way a similarity of a newly constructed string is


greater than 4 is if the similarity of the replaced letter is above 0.44, which is unlikely.)

This means that they will be placed below “wazer”. The next top string, “mater,” is

removed. “Mater” is a word. However, it does not work in the sentence “Mater you

doing.” (See Sections 7 and 8 for further discussion.) The process continues until

“whater” is the top string. The replacement of “e” in “whater” with “a” will result in

“whatar”. Eventually, “whatar” will become the top string, at which point “r” will be

replaced with “re” to produce “whatare”. “Whatare” can be decomposed into “what are”

by inserting a space between “t” and “a”. The next step will be to check if “what are” is

a valid word sequence.

Generated wordplays that were successfully recognized by the wordplay recognizer, and

their corresponding keywords are stored for future use by the program. When the

wordplay generator receives a new request, it first checks if wordplays have been

previously found for the requested keyword. The new wordplays will be generated only

if there is no wordplay match for the requested keyword, or the already found wordplays

do not make sense in the new joke.

7 Wordplay Recognition

A wordplay sequence is generated by replacing letters in the keyword (as defined in Section 4.3). The keyword is examined because, if there is a joke based on wordplay, the phrase that the wordplay is based on will be found in Line3; Line3 is the keyword. A wordplay generator generates a string that is similar in pronunciation to the keyword.


This string, however, may contain real words that do not make sense together. A

wordplay recognizer determines if the output of the wordplay generator is meaningful.

A database with a bigram table is used to store every discovered two-word sequence along with the number of its occurrences, also referred to as its count. Any sequence of two words will be referred to as a word-pair. Another table in the database, the trigram table, contains each three-word sequence and its count.

The wordplay recognizer queries the bigram table. The joke recognizer, discussed in Section 8, Joke Recognition, queries the trigram table.

To construct the database, several large, focused texts were used. The focus was at the

core of the training process. Each selected text contained a wordplay on the keyword

(Line3) and two words from the punchline that follow the keyword from at least one joke

from the set of training jokes. If more than one text containing a given wordplay was

found, the text with the closest overall meaning to the punchline was selected. Arbitrary

texts were not used, as they did not contain a desired combination of wordplay and part

of punchline.

The bigram table was constructed such that every pair of words occurring in the selected

text was entered into the table. If the table did not contain the newly input pair, it was

inserted with “1” for the count. If the table already contained the pair, the count was

incremented.
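This construction amounts to counting adjacent word pairs; a minimal sketch (Python, illustrative only) is:

    from collections import Counter

    def build_bigram_table(training_texts):
        # Every pair of adjacent words is inserted; a repeated pair
        # increments the count, as described above.
        table = Counter()
        for text in training_texts:
            words = text.lower().split()
            table.update(zip(words, words[1:]))
        return table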


The concept of this wordplay recognizer is similar to an N-gram. An N-gram is a model

where the next word depends on a number of previous words. For a wordplay

recognizer, the bigram model is used (an N-gram with N equal to two).

The output from the wordplay generator was used as input for the wordplay recognizer.

An utterance produced by the wordplay generator is decomposed into a string of words.

Each word, together with the following word, is checked against the database. Suppose a wordplay to be checked is an m-word string. The first two words, w1 and w2, are checked first. If the sequence <w1, w2> occurs in the database, the second and third words are checked. The wordplay is invalid if <wi-1, wi> does not occur in the database for some i with 1 < i ≤ m.

An N-gram determines for each string the probability of that string in relation to all other

strings of the same length. As a text is examined, the probability of the next word is

calculated. The wordplay recognizer keeps the number of occurrences of word sequence,

which can be used to calculate the probability. A sequence of words is considered valid

if there is at least one occurrence of the sequence anywhere in the text. The count and the

probability are used if there is more than possible wordplay. In this case, the wordplay

with the highest probability will be considered first.

For example, in Joke30 “what are” is a valid combination if “are” occurs immediately

after “what” somewhere in the text.
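Under these definitions, the wordplay recognizer reduces to pair-wise lookups. A minimal sketch (Python, illustrative only), assuming a bigram table built as sketched earlier in this section:

    def wordplay_is_valid(words, bigram_table):
        # An m-word decomposition is valid only if every adjacent pair
        # <w(i-1), w(i)> occurs at least once in the bigram table.
        return all(pair in bigram_table for pair in zip(words, words[1:]))

    # e.g. "whatare" decomposed as ["what", "are"] is valid whenever the
    # pair ("what", "are") was seen anywhere in the training texts.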


8 Joke Recognition

A text with valid wordplay is not a joke if the rest of the punchline does not make sense.

For example, if the punchline of Joke30 is replaced with “Water a text with valid

wordplay,” the resulting text is not a joke, even though the wordplay is still valid.

Therefore, there has to be a mechanism that can validate that the found wordplay is

“compatible” with the rest of the punchline and makes it a meaningful sentence.

A concept similar to a trigram was used to validate the last sentence. A trigram is an N-gram with N equal to three. All three-word sequences are stored in the trigram table.

The same training set was used for both the wordplay and joke recognizers. The

difference between the wordplay recognizer and joke recognizer was that the wordplay

recognizer used pairs of words for its validation while the joke recognizer used three

words at a time. As the training text was read, the newly read word and the two

following words were inserted into the trigram table. If the newly read combination was

in the table already, the count was incremented.

As the wordplay recognizer had already determined that the wordplay sequences existed,

there was no reason to revalidate the wordplay. The wordplay could occur in the

beginning of the punchline, in the middle of the punchline, or in the end of the punchline.


Depending on the locations of the wordplay, different steps were taken to validate the

punchline.

8.1 Wordplay in the Beginning of a Punchline

To check if wordplay makes sense in the beginning of a punchline, the last two words of the wordplay, wwp1 and wwp2, are used, provided the wordplay is at least two words long. If

the punchline is valid, the sequence of wwp1, wwp2, and the first word of the remainder of

the sentence, ws, should be found in the training text. If the sequence <wwp1 wwp2 ws>

occurs in the trigram table, this combination is found in the training set, and the three

words together make sense. If the sequence is not in the table, either the training set is not

accurate, or the wordplay does not make sense in the punchline. In either case, the

computer does not recognize the joke. If the previous check was successful, or if the

wordplay has only one word, the last check can be performed. The last step involves the

last word of the word play, wwp, and the first two words of the remainder of the sentence,

ws1 and ws2. If the sequence <wwp ws1 ws2> occurs in the trigram table, the punchline is

valid, and the wordplay fits with the rest of the final sentences.
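These two checks can be written directly as trigram-table lookups. In the sketch below (Python, illustrative only), trigrams is assumed to be the set of three-word sequences collected from the training texts, and the remainder of the punchline is assumed to contain at least two words:

    def punchline_valid_at_start(wordplay, rest, trigrams):
        # wordplay: words of the found wordplay; rest: remaining punchline words.
        if len(wordplay) >= 2 and \
           (wordplay[-2], wordplay[-1], rest[0]) not in trigrams:
            return False
        return (wordplay[-1], rest[0], rest[1]) in trigrams

    # Joke30: wordplay ["what", "are"], rest ["you", "doing", "tonight"]
    # checks <what are you> and then <are you doing>.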

For example, Joke30 has a wordplay in the beginning of the punchline. To examine if the

joke is valid, the sequences <what are you> and <are you doing> are checked. If both

sequences occur in the trigram table, the joke is valid. It may seem that both sequences

are common. However, it is possible that no matter how common the sequences are, they


may not be in the text. Then, the joke is considered invalid, regardless of the correctly

identified wordplay.

Suppose, the wordplay recognizer returned the word “waiter” as the wordplay for

“water”. The joke recognizer would examine <waiter you doing> and conclude that it

would not produce a meaningful sentence. In this case, the wordplay recognizer would

have to search for another wordplay.

If the wordplay recognizer found more than one wordplay that "produced" a joke, the

wordplay resulting in the highest trigram sequence probability was used.

8.2 Wordplay at the End of a Punchline

To check if wordplay makes sense at the end of a punchline, the last two words of the

punchline, ws1, ws2, and the first word of the wordplay, wwp, are used. If the sequence

<wwp ws1 ws2> occurs in the trigram table, the combination is valid. If the found

wordplay is at least two words long, the first two words of the wordplay, wwp1, wwp2, and the last

word of the punchline, ws, are used to determine if the entire punchline makes sense. If

the sequence <wwp1 wwp2 ws> is in the trigram table, this combination was found in the

training set, and the punchline is valid. If either the first or the second sequence is not in

the table, the training text did not have the combination that was needed. The computer

will not recognize the joke.


8.3 Wordplay in the Middle of a Punchline

To check if a punchline with wordplay in the middle is valid, the checks for wordplay in

the end and in the beginning of the punchline are performed. If any of the checks above

fail, the computer does not recognize the joke.

9 Training Text

A collection of texts is used to populate the database with bigram and trigram tables.

As a text is read, every word and the following word, or two following words, are

inserted into the database. The number of words in a sequence depends on whether the

text is used for the wordplay or the punchline recognizer. If the newly read combination

already exists in the database, the count is incremented. If the combination is not in the

database yet, it is entered.

Simple statistical sentence parsing was used. It was decided to ignore the punctuation

marks that normally do not terminate sentences.

9.1 First Approach


If a new word is followed by one of the following non-sentence-terminating characters: ",", "-", ":", "--", the word is inserted into the database without the character. If a word is followed by one of the following terminating characters: ".", "!", "?", ";", "…", the sentence is considered to be terminated. The next sequence to be inserted will

start from the beginning of the next sentence. For example:

A B, C D E. F G H I!

The punchline recognizer uses three-word sequences. The above sentences will result in

the following database insertions for the use by punchline recognizer:

(A B C)

(B C D)

(C D E)

(F G H)

(G H I)

9.2 Second Approach

An alternative to the above is the following insertions:

(A B “,”)


(B “,” C)

(C D E)

(D E “.”)

(E “.” F)

(F G H)

(G H I)

(H I “!”)

The difference between the first and the second approaches is in entering punctuation

marks into the database. The first does not enter any punctuation marks into the

database: it ignores them in the middle of the sentence and stops when the end of the

sentence is reached. The drawback of the first approach is the absence of a relationship

between E and F. The second approach enters punctuation marks into the database as

valid "words" if they occur in the second or third position. While it catches the end of a sentence, and in some cases intonation (when "!" is entered), it does not enter some of the combinations entered by the first model. (B C D) is an example of a combination not

inserted into the database by the second approach.

9.3 Third Approach

A third alternative is the first and the second combined. It contains the largest number of

entries to the database:

(A B C)


(A B “,”)

(B C D)

(B “,” C)

(C D E)

(D E “.”)

(E “.” F)

(F G H)

(G H I)

(H I “!”)

As commas are sometimes optional, they can be dropped, resulting in the following

insertions:

(A B C)

(B C D)

(C D E)

(D E “.”)

(E “.” F)

(F G H)

(G H I)

(H I “!”)

The larger number of combinations makes it more likely that valid word combinations are identified,

but it also slows down the program.


9.4 Fourth Approach

Entering punctuation marks into the database has another drawback. Sometimes,

different texts used for bigram and trigram table generation do not follow standard punctuation rules. Punctuation rules are not necessarily followed in Knock Knock jokes. A sequence "P Q. R S" in one joke or text may become "P Q; R S" in another joke or text. For this reason, the fourth approach is used. This approach replaces all punctuation marks with an arbitrarily chosen symbol "x". It also removes entries with "x" in the middle of a sequence, as such an entry shows a dependence that may not hold. There may be sentences Y, Z in a text T that are interchangeable. This means that the relative position of Y and Z will not change the meaning of T. Adding (Ylast word "x" Zfirst word) does not reflect an actual dependency, just as adding (Zlast word "x" Yfirst word) does not.

After removing sequences with “x” in the second position, the fourth approach results in

the following insertions:

(A B C)

(B C D)

(C D E)

(D E “x”)

(F G H)

(G H I)

(H I “x”)
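A sketch of this fourth approach is given below (Python, illustrative only; the punctuation classes are simplified). Trigrams that would begin with "x" are skipped as well, since a new sequence starts with the next sentence, which matches the insertions listed above:

    import re

    def trigrams_fourth_approach(text):
        text = text.replace(",", "")            # optional commas are dropped
        tokens = re.sub(r"[.!?;]+", " x ", text).split()  # terminators become "x"
        return [(a, b, c)
                for a, b, c in zip(tokens, tokens[1:], tokens[2:])
                if "x" not in (a, b)]           # no "x" in first or second position

    print(trigrams_fourth_approach("A B, C D E. F G H I!"))
    # [('A','B','C'), ('B','C','D'), ('C','D','E'), ('D','E','x'),
    #  ('F','G','H'), ('G','H','I'), ('H','I','x')]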


The joke recognizer uses the fourth approach for the N-gram database, as it seems to be the most flexible and the most reasonable of the four discussed.

9.5 Fifth Approach

The fifth approach is to use distant N-grams. This approach was used only for the distant

N-gram database. The distant N-grams are stored in a different database, which will also

have bigram and trigram tables. The distant N-grams used in this program have a gap of

two. This means that as the text is read, the sequences are inserted into the database with

a possible distance of up to two positions between words of the sequence in the sentence.

For example, if the sentence “J K, L M.” is read, the following will be inserted into the

database.

(J K L)

(J K M)

(J K “x”)

(J L M)

(J L “x”)

(J M “x”)

In addition, if one word from one of the groups below is part of the sequence to be

inserted into any of the four tables (for distant and regular N-grams), the sequences with

all “members of the group” will be inserted. The groups are:


• is, was, are, were, will, shall

• do, did, shall, will

• I, she, he, it, we, you, they

• my, his, her, its, your, our, their

• a, an, the

For example, if the sequence (N M a) is read from the text, the sequences (N M a), (N M

an), (N M the) will be inserted into the database.
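A sketch of the distant-trigram insertion with a gap of two, together with the group expansion just described, is given below (Python, illustrative only; only two of the groups are shown):

    GROUPS = [{"a", "an", "the"},
              {"is", "was", "are", "were", "will", "shall"}]

    def distant_trigrams(tokens, gap=2):
        # Three-word sequences whose consecutive elements are at most
        # `gap` positions apart in the sentence.
        out = []
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + 2 + gap, len(tokens))):
                for k in range(j + 1, min(j + 2 + gap, len(tokens))):
                    out.append((tokens[i], tokens[j], tokens[k]))
        return out

    def expand_groups(trigram):
        # (N, M, "a") also yields (N, M, "an") and (N, M, "the").
        options = [next((g for g in GROUPS if w in g), {w}) for w in trigram]
        return [(a, b, c) for a in options[0]
                          for b in options[1]
                          for c in options[2]]

    # The sequences starting at "J" for the sentence "J K, L M." :
    print([t for t in distant_trigrams(["J", "K", "L", "M", "x"]) if t[0] == "J"])
    # [('J','K','L'), ('J','K','M'), ('J','K','x'),
    #  ('J','L','M'), ('J','L','x'), ('J','M','x')]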

10 Experimentation and Analysis

10.1 Training Set

A set of 65 jokes from the “111 Knock Knock Jokes” website††† and one joke taken from

“The Original 365 Jokes, Puns & Riddles Calendar” was used as a training set. (See

Appendix B: Training Jokes.) The Similarity Table, discussed in Section 6,

Generation of Wordplay Sequences, was modified with new entries until correct

wordplay sequences could be generated for all 66 jokes. The training texts inserted into

the bigram and trigram tables were chosen based on the punchlines of jokes from the set

of training jokes. The training texts were inserted into the database according to the rules

††† http://www.azkidsnet.com/JSknockjoke.htm


described in Section 9, Training Text. The jokes were run against the standard N-grams and distant N-grams with a gap of two.

The maximum length of the keywords in the training jokes was calculated by taking the longest keyword in the sample, not counting whitespace characters or punctuation marks; the resulting maximum length was eight characters.

The results are described in Table4:

                     Regular N-grams         Distant N-grams
Jokes                Number   Percentage     Number   Percentage
Recognized Jokes     59       89.39%         59       89.39%
Unrecognized Jokes   7        10.61%         7        10.61%

Table4: Training Jokes results

While Table4 contains results for both regular and distant N-grams, it was unnecessary to

run the latter. The distant N-grams are usually used to find two or three-word sequences

that may be meaningful, but that have words that are not adjacent to each other in the

text. This method makes sense for the set of test jokes, as the joke texts are unknown

ahead of time. The text of the training jokes, on the other hand, was known. Moreover, the texts used to populate the database were chosen so that they contained the sequences of words in the first part of the punchlines of all jokes. This means that the distant N-grams will not recognize more jokes when the expected wordplay is found.


The program was unable to recognize KKjoke9 from Appendix B: Training Jokes as there

was no text in the database to recognize the "bugs pray" sequence. This is the only joke from

the training set that did not have a punchline sequence inserted into the database. The

sequence was not inserted because a search on the "bugs pray that" sequence did not return

any texts that were non-jokes. However, when “bugs pray that snakes” was inserted into

the database (and removed immediately after), the program recognized the joke.

The other six jokes, shown in Table5, were not recognized, as the wordplay recognizer

was not able to find wordplays based on their keywords. The possible cause is the length

of the Line3 combined with the number of replacements required to find wordplays for

these jokes. The first column of Table5 reflects the joke’s number from Appendix B:

Training Jokes that was not recognized. The second column shows the length of the

keyword. The third column shows the number of letter replacements needed to find a

wordplay. (See Generation of Wordplay Sequences for explanation of letter

replacements.)

Joke Number   Length of Line3   Number of Replacements for Wordplay
17            6                 3
23            6                 3
28            7                 5
51            6                 5
57            7                 4
66            6                 4

Table5: Unrecognized jokes in the training set

If all jokes in Table5 were to be recognized, the Similarity Table would have to be

modified so that each wordplay would require at most two replacements of letters in the


keyword. If such modifications were made, the joke recognizer would recognize 65

jokes from the training set.

10.2 Alternative Training Set Data Test

A sample of training jokes was also run using Hempelmann’s [2003b] cost table. The

table, shown in Appendix F, reflects the cost of going from a letter in the first column of the

table to a letter in the first row. Hempelmann’s cost table was modified to include all

combinations of letters used in the Similarity Table. Heuristically, the cost values of the

newly inserted pairs were adjusted to the average of the cost table values.

The sample set consisted of the first eleven jokes from the Training Set. Using the cost

table, eight jokes of the sample set were recognized as jokes, while using the Similarity

Table ten jokes were recognized. Neither approach recognized the "bugs pray" joke, as the needed word sequence is not in the database.

10.3 General Joke Testing

The program was run against a test set of 130 KK jokes, and a set of 65 non-jokes that

have a similar structure to the KK jokes. The non-jokes are discussed in Section 10.4.


The test jokes were taken from “3650 Jokes, Puns & Riddles” [Kostick, 1998]. These

jokes had the punchlines corresponding to any of the three KK joke structures discussed

in Section 4.3. Namely,

Type1: Punchlines containing Line3 -- To recognize the joke, Line3 will have

to be substituted with its wordplay in the punchline.

Type2: Punchline containing wordplay on Line3 -- To recognize the joke, the

generated wordplay sequence will have to match the punchline version of

the wordplay.

Type3: Punchline containing a meaningful response to Line3 -- The program is

not expected to recognize these jokes.

To test if the program finds the expected wordplay, each joke had an additional line,

Line6, added after Line5. Line6 is not a part of any joke. It only existed so that the

wordplay found by the joke recognizer could be compared against the expected

wordplay. Line6 consists of the punchline with the expected wordplay in place of Line3.

The results are categorized as follows:

• Jokes identified as jokes

o found wordplay matches the expected wordplay

o found wordplay does not match the expected wordplay


- the punchline is meaningful

- the punchline does not make sense

• Jokes identified as non-jokes

o The structure of the joke is unrecognizable by the program

o The structure of the joke should have been recognized

- correct wordplay found, but the punchline was not recognized

- correct wordplay not found

As the training set contained only jokes where the length of Line3 did not exceed eight

characters, the test set only had jokes with Line3 at most eight characters long.

As discussed in Section 5, the jokes in the test set were previously "unseen" by

the computer. This means that if the book contained a joke, identical to the joke in the set

of training jokes, this joke was not included in the test set.

Some jokes, however, were very similar to the jokes in the training set, but not identical.

These jokes were included in the test set, as they were not the same. As it turned out,

some jokes may look very similar to jokes in the training set to a human, but are treated as completely different jokes by the computer.

An example of this is KKjoke63 from the training set and KKjoke60 from the test set. Both jokes are based on the line "She'll be coming round the mountain." The joke in the training set has "Shelby" as its keyword, while the joke from the test set has "Sheila." The


joke recognizer treats the two jokes differently as the keywords differ. Another example

is KKjoke99 from the test jokes and KKjoke53 from the training jokes. Both jokes are

based on the keyword "cargo", and both use "car go" as their wordplay. But the punchline of one is "Car go beep, beep", and the punchline of the other is "Car go honk, honk." Once again, the jokes are treated as different jokes by the computer, as it has to find different trigram sequences for them.

The set of test jokes was run using “regular” N-grams and distant N-grams. The results

are shown in Table6:

                                                        Normal N-grams    Distant N-grams
Jokes Identified As                                     Number   %        Number   %
Jokes     Expected wordplay found                       12       9.23     16       12.31
          Unexpected wordplay found,
            punchline is meaningful                     2        1.54     2        1.54
          Unexpected wordplay found,
            punchline is not meaningful                 3        2.31     10       7.69
Non-Jokes Wrong structure                               8        6.15     8        6.15
          Correct structure, wordplay found             68       52.31    62       47.69
          Correct structure, no wordplay                37       28.46    32       24.62

Table6: Results of the Joke Test Set

Table6 shows that, out of the 130 previously unseen jokes, the program was not expected to recognize eight. These jokes are not expected to be recognized because their structure is not one the program is designed to handle.

10.3.1 Jokes in the Test Set with Wordplay in the Beginning of Punchline


Using regular N-grams, the program recognized only seventeen jokes as jokes out of 122

that it could potentially recognize. Twelve of these jokes have punchlines that

matched the expected punchlines. Two jokes have meaningful punchlines that were not

expected. Three jokes were identified as jokes by the computer, but their punchlines do

not make sense to the investigator.

When the program used distant N-grams, twenty-eight jokes were recognized. In

sixteen jokes the expected punchline was found. Note that regular N-grams recognized

only 75% of the jokes with the expected punchline that were found by the distant N-

grams. The recognized jokes with the meaningful unexpected punchline found by distant

N-grams match the jokes found by regular N-grams. Ten jokes were recognized as jokes

by distant N-grams while their punchlines did not make sense to an investigator.

One of the incorrectly identified jokes is KKjoke108. The joke is based on keyword

“Oscar” and the expected wordplay “ask her.” The wordplay generator failed to find the

expected wordplay, but it returned “offer” instead. The punchline of the joke should read

“Ask her for a date.” However, the joke recognizer uses only two words that follow the

keyword for its validation. The two words are “for a.” The sequence <offer for a> was

found in the trigram table; therefore, the joke was identified as a joke. Not only was the sequence found in the table, it is actually a meaningful sequence. Without the word "date," it would be difficult for a human to decide between "ask her for a" and "offer for a." This suggests that the trigram model does not carry enough information for the task.


The program was able to find wordplay in 85 jokes using regular N-grams, and 90 jokes

using distant N-grams. Some of the jokes with found wordplay were not recognized as

jokes because the database did not contain the needed sequences. When a wordplay was

found, but the needed sequences were not in the database, the program did not recognize

the jokes as jokes, as indicated in Section 5. For example, the wordplay,

“when he,” was found in KKjoke1, based on the keyword “Winnie.” However, the

sequences <when he finally> and <he finally shows> were not in the trigram table of

either regular N-grams or distant N-grams. Therefore, this joke was not recognized as a joke, even though the expected wordplay was found.

In many cases, the found wordplay matched the intended wordplay. This suggests that

the rate of successful joke recognition would be much higher if the database contained all

the needed word sequences.

10.3.2 Jokes in the Test Set with Wordplay in the Middle of a Punchline

Out of 130 jokes used for testing only one joke had a wordplay in the middle of the

punchline. The joke recognizer was unable to recognize this joke, as the wordplay

recognizer did not find an acceptable wordplay. The joke is based on the wordplay

between “Watson” and “What’s new.”

If the entry <on, new, 0.23> was inserted into the Similarity Table, the wordplay

generator and the wordplay recognizer would be able to generate and recognize the


expected wordplay. If sequences needed for joke recognition, but missing in the trigram

table, were inserted, the joke recognizer would be able to recognize this joke as a joke.

10.4 Testing Non-Jokes

The program was run with 65 non-jokes. The only difference between jokes and non-

jokes was the punchline. The punchlines of non-jokes were intended to make sense with

Line3, but not with the wordplay of Line3. The non-jokes were generated from the

training joke set. The punchline in each joke was substituted with a meaningful sentence

that starts with Line3.

If the keyword was a name, the rest of the sentence was taken from the texts in the

training set. For example, Joke34 became Text1 by replacing “time for dinner” with

“awoke in the middle of the night.”

Joke34: Knock, Knock

Who is there?

Justin

Justin who?

Justin time for dinner.

Text1: Knock, Knock

Who is there?


Justin

Justin who?

Justin awoke in the middle of the night.

A segment “awoke in the middle of the night” was taken from one of the training texts

that was inserted into the bigram and trigram tables. Part of the text is:

“Eric awoke in the middle of the night. He had been feeling increasingly

better every day and had even started walking around and doing little

chores. His broken ribs still bothered him a bit, but his burns had basically

reduced themselves to faint pink scars.‡‡‡”

The name “Eric” in the first sentence was replaced by “Justin,” producing Line5 of Text1.

By using a sentence that was already inserted into the database, the possibility is decreased that a text is identified as a non-joke merely because a word sequence is unknown to the trigram table.

The non-jokes results were categorized as follows:

• Non-jokes identified as non-jokes

• Non-jokes identified as jokes

o punchline with wordplay is meaningful

o punchline does not make sense with wordplay

‡‡‡ http://tisa.stdragon.com/chapters10.htm


The non-joke set was run using N-grams and distant N-grams. The results are shown in

Table7:

                                                       Regular N-grams   Distant N-grams
Text Recognized As                                     Number   %        Number   %
Non-Jokes                                              62       93.94%   58       87.88%
Jokes   Punchline makes sense with wordplay            1        1.52%    3        4.55%
        Punchline does not make sense with wordplay    3        4.55%    5        7.58%

Table7: Non-joke results

The non-joke texts incorrectly identified as jokes using regular N-grams were based on jokes 20, 39, 49, and 50 from Appendix B: Training Jokes.

The text based on KKjoke39 was identified as a joke, and its punchline made sense with

the found wordplay. However, the wordplay did not sound similar to Line3. Line3 of

KKjoke39 is “Ken,” and the found wordplay was “she.” This happened because of two

entries in the Similarity Table, used during wordplay generation. Recall that the

Similarity Table has three columns: (letters to be replaced, letters to be replaced with,

similarity between them). The Similarity Table contains these entries: <k, sh, 0.11> and

<en, e, 0.23>. (See Table2.) In some cases, the replacement of “k” with “sh” results in

valid utterances that sound similar to the original one. This is the reason why <k, sh,

0.11> was entered into the Similarity Table. For the same reason, <en, e, 0.23> is in the Similarity Table. If one of the two entries were removed from the Similarity Table, the wordplay generator would not generate "she" as a wordplay on "Ken," and the text based on KKjoke39 might not be recognized as a joke with this wordplay. It should be noted that it might then be recognized as a joke with a different wordplay.


The text based on KKjoke20 was incorrectly identified as a joke. In this case, the

wordplay, “said I” was correctly found as it can be argued that it sounds similar to Line3,

“Sadie.” However, the punchline: “Said I did not want to hear anything” makes no sense.

The program counted the punchline as valid as it was able to find the sequences “said I

did” and “I did not” in the database. Removing at least one of the sequences from the

database would result in the text not being identified as a joke using the wordplay "said I." Notice that "I said I did not want to hear anything" does make sense. However, if one or

both sequences were removed, the program would not be able to tell that “I said I did not

want to hear anything” is a meaningful sentence.

The text based on KKjoke50 was identified as a joke by the joke recognizer as well. However, the wordplay found by the program does not sound similar to Line3, and the punchline does not make sense with the found wordplay. The reasons for incorrectly

identifying this text are similar to the ones discussed in the two preceding paragraphs.

It can be argued that the text based on KKjoke49 is a joke with a meaningful punchline

and correctly found wordplay. Line3 of KKjoke49 is “Amanda,” and the wordplay found

by the program is “a man to.” This results in the following punchline: “A man to put her

hand up to shield her eyes.” While the sentence is meaningful, it is not of a typical form

for a KK joke punchline.

In addition to the texts that were identified as jokes using regular N-grams, four texts

were identified as jokes using distant N-grams. The texts were based on the jokes with


the following words in Line3: “Wade,” “Winnie,” “Ammonia,” and “Europe.” It can be

argued that two of the texts identified as jokes ("Wade" and "Winnie") have meaningful

last sentences. As in the case of KKjoke49, while the sentences are meaningful, they are

not of a typical format for a KK joke punchline.

11 Summary

Computational work in natural language has a long history. Areas of interest have

included: translation, understanding, database queries, summarization, indexing, and

retrieval. There has been very limited success in achieving true computational

understanding.

A focused area within natural language is verbally expressed humor. Some work has

been achieved in computational generation of humor. Little has been accomplished in

understanding. There are many linguistic descriptive tools such as formal grammars.

But, so far, there are no robust understanding tools and methodologies. True success

will probably first come in a narrowly focused area.

The KK joke recognizer is a first step towards computational recognition of jokes. It is

intended to recognize Knock Knock jokes that are based on wordplay. The recognizer’s

theoretical foundation is based on Raskin’s Script-based Semantic Theory of Verbal

Humor that states that each joke is compatible with two scripts that oppose each other.

The Line3 and the wordplay of Line3 are the two scripts. The scripts overlap in

pronunciation, but differ in meaning.


The joke recognition process can be summarized in four steps:

Step1: joke format validation

Step2: generation of wordplay sequences

Step3: wordplay sequence validation

Step4: last sentence validation

The results of the KK joke recognizer heavily depend on:

• the choice of appropriate letter-pairs for the Similarity Table

• the selection of training texts.

The KK joke recognizer “learns” from the previously recognized wordplays when it

considers the next joke. Unfortunately, unless the needed (keyword, wordplay) pair is an

exact match with one of the found (keyword, wordplay) pairs, the previously found

wordplays will not be used for the joke. Moreover, if one of the previously recognized

jokes contains a (keyword, wordplay) pair that is needed for the new joke, but the two

words that follow or precede the keyword in the punchline differ, the new joke may not

be recognized regardless of how close the new joke and the previously recognized jokes

are.
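The exact-match behavior can be pictured as a plain dictionary lookup keyed on the keyword string. The sketch below is illustrative, not the actual storage format; the stored pair is borrowed from test joke 46:

    # Wordplays learned from previously recognized jokes (illustrative contents).
    known_wordplays = {"water": ["what are"]}

    def wordplays_for(keyword):
        # Earlier discoveries are reused only on an exact keyword match.
        return known_wordplays.get(keyword.lower(), [])

    print(wordplays_for("Water"))   # ['what are']  exact match, reused
    print(wordplays_for("Waiter"))  # []            near miss, generated from scratch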

The program successfully found and recognized wordplay in most of the jokes. It also

successfully recognized texts that are not jokes, but have the format of a KK joke. It was


not successful in recognizing most punchlines in jokes. The failure to recognize punchlines is due to the limited size of the texts used to build the trigram table of the N-gram database.

While the program checks the format of the first four lines of a joke, it assumes that all

jokes that are entered have a grammatically correct punchline, or at least that the

punchline is meaningful. It is unable to discard jokes with a poorly formed punchline. It may accept a joke with a poorly formed punchline as meaningful because it checks at most four words of the punchline that differ from Line3.

12 Possible Extensions

The results suggest that most jokes were not recognized either because the texts entered did not contain the necessary information for the jokes to work, or because N-grams are not suitable for true “understanding” of text. One of the simpler experiments would be to test whether more jokes are recognized when the databases contain more sequences. This would require inserting much larger texts into the trigram table. A larger text may contain more word sequences, giving the N-grams more data with which to recognize some jokes.
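Building such a table is mechanical; the short Python sketch below, with a made-up input sentence, shows why inserting a larger text directly enlarges the set of known sequences:

    from collections import Counter

    def build_trigram_table(text):
        # Count every consecutive three-word sequence in a training text.
        words = text.lower().split()
        return Counter(zip(words, words[1:], words[2:]))

    table = build_trigram_table("i said i did not want to hear anything")
    print(table[("i", "did", "not")])   # 1: the sequence is "known"
    print(table[("said", "i", "am")])   # 0: unknown until a larger text supplies it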

It is possible that no matter how large the inserted texts are, the simple N-grams will not

be able to “understand” jokes. The simple N-grams were used to understand or to

analyze the punchline. Most jokes were not recognized due to failures in sentence


understanding. A more sophisticated tool for analyzing a sentence may be needed to improve the joke recognizer. Some of the options for the sentence analyzer are:

• an N-gram with stemming

• a sentence parser.

A simple parser that can recognize, for example, nouns and verbs, and analyze the sentence based on parts of speech rather than exact spelling, may significantly improve the results. On the other hand, giving N-grams a stemming ability would make them treat, for example, “color” and “colors” as one entity, which may also help significantly.
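As a sketch of the stemming idea: the one-line stemmer below is deliberately crude (a real system would use a full stemmer such as Porter's), but it shows how “color” and “colors” collapse into one N-gram entity:

    def crude_stem(word):
        # Strip a trailing "s" from longer words; crude, for illustration only.
        return word[:-1] if word.endswith("s") and len(word) > 3 else word

    def stemmed_trigrams(text):
        words = [crude_stem(w) for w in text.lower().split()]
        return set(zip(words, words[1:], words[2:]))

    # Both sentences now index the same trigram rows:
    assert stemmed_trigrams("he likes colors") == stemmed_trigrams("he likes color")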

The wordplay generator produced the desired wordplay in most jokes, but not all. After the steps are taken to improve the sentence understander, the next improvement should be a more sophisticated wordplay generator. The existing wordplay generator is unable to find wordplay that is based on a word longer than six characters or that requires more than three substitutions. A better alternative to letter substitution is phoneme comparison and substitution. Using phonemes, the wordplay generator would be able to find more accurate matches.
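The letter-substitution search itself (compare GenerateWordplay in Appendix D) amounts to a best-first search over a heap. In the sketch below, the similarity table holds made-up values standing in for the Frisch and Hempelmann tables of Appendices E and F:

    import heapq
    from itertools import count

    SIMILARITY = [("t", "d", 0.8), ("c", "k", 0.9), ("a", "e", 0.6)]  # illustrative
    _tie = count()  # tie-breaker so the heap never has to compare phrases' sets

    def generate_wordplays(keyword, max_results=10):
        # Pop the most similar-sounding candidate first; each popped phrase
        # spawns further single-letter substitutions at not-yet-used positions.
        heap = [(0.0, next(_tie), keyword.lower(), frozenset())]
        results = []
        while heap and len(results) < max_results:
            penalty, _, phrase, used = heapq.heappop(heap)
            if phrase != keyword.lower():
                results.append((phrase, penalty))
            for i, letter in enumerate(phrase):
                if i in used:
                    continue  # each position is replaced at most once
                for src, dst, cost in SIMILARITY:
                    if letter == src:
                        candidate = phrase[:i] + dst + phrase[i + 1:]
                        heapq.heappush(
                            heap, (penalty + 1 - cost, next(_tie), candidate, used | {i}))
        return results

    print(generate_wordplays("cat", max_results=3))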

The joke recognizer may be able to recognize jokes other than KK jokes if the new jokes are based on wordplay and their structure can be defined. However, it is unclear whether recognizing jokes with other structures will be successful with N-grams.


13 Conclusion

An effort was made to computationally understand humor in a tightly focused domain.

The domain was a particular aspect of wordplay. The experimental joke recognizer was

designed to recognize jokes based on wordplay in the focused domain of Knock Knock

jokes.

The investigation needed to do three things:

• generate wordplay based on a phrase

• recognize meaningful wordplay

• recognize wordplay that made sense in the punchline

A statistics-based method was used. Strings were acquired from the corpora, counted, and placed into databases.

The joke recognizer was trained on 66 Knock Knock jokes and tested on 130 Knock Knock jokes and 66 non-jokes with a structure similar to Knock Knock jokes.

The results show that the program was successful in correctly recognizing non-jokes and

wordplay. However, it was not successful in recognizing most jokes after the wordplay

was correctly found.


In conclusion, the method was reasonably successful in recognizing wordplay. However,

it was less successful in recognizing when an utterance might be valid.


Bibliography

Salvatore Attardo and Victor Raskin [1991] “Script Theory Revis(it)ed: Joke Similarity And Joke Representation Model,” HUMOR: International Journal of Humor Research, 4:3-4, pp. 293-347

Salvatore Attardo and Victor Raskin [1993] “Nonliteralness And Non-bona-fide In Language: Approaches To Formal And Computational Treatments Of Humour And Irony,” unpublished paper

Salvatore Attardo [1994] Linguistic Theories of Humor, Mouton de Gruyter, Berlin

Salvatore Attardo, Donalee Attardo, Paul Baltes and Marnie Petray [1994] “The Linear Organization Of Jokes: Statistical Analysis Of Two Thousand Texts,” HUMOR: International Journal of Humor Research, 7:1, pp. 27-54

Salvatore Attardo [1997] “The Semantic Foundations Of Cognitive Theories Of Humor,” HUMOR: International Journal of Humor Research, 10:4, pp. 395-420

Salvatore Attardo [1998] “The Analysis Of Humorous Narratives,” HUMOR: International Journal of Humor Research, 11:3, pp. 231-260

Salvatore Attardo [2002a] “Cognitive Stylistics Of Humorous Texts,” in Elena Semino and Jonathan Culpeper (Eds.) Cognitive Stylistics: Language And Cognition In Text Analysis, Benjamins, Amsterdam, pp. 231-250

Salvatore Attardo [2002b] “Formalizing Humor Theory,” Proceedings of Twente Workshop on Language Technology 20, Enschede, University of Twente, The Netherlands, pp. 1-10

Salvatore Attardo, Christian F. Hempelmann and Sara DiMaio [2002c] “Script Oppositions And Logical Mechanisms: Modeling Incongruities And Their Resolutions,” HUMOR: International Journal of Humor Research, 15:1, pp. 3-46

Kim Binsted and Graeme Ritchie [1994] “An Implemented Model Of Punning Riddles,” Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA, pp. 633-638

Kim Binsted [1996a] Machine Humour: An Implemented Model Of Puns, doctoral dissertation, University of Edinburgh, Scotland, UK

Kim Binsted and Graeme Ritchie [1996b] “Speculations On Story Puns,” Proceedings of Twente Workshop on Language Technology 12, Enschede, University of Twente, The Netherlands, pp. 151-160


Kim Binsted and Graeme Ritchie [1997] “Computational Rules For Punning Riddles,” HUMOR: International Journal of Humor Research, 10:1, pp. 25-76

Kim Binsted and Osamu Takizawa [1998] “BOKE: A Japanese Punning Riddle Generator,” The Journal of the Japanese Society for Artificial Intelligence, 13(6) (in Japanese)

Michael K. Brown, Andreas Kellner and Dave Raggett [2001] “Stochastic Language Models (N-Gram) Specification,” W3C Working Draft 3, http://www.w3.org/TR/ngram-spec/

David Allen Clark [1968] Jokes, Puns and Riddles, Doubleday, New York

Delia Chiaro [1992] The Language Of Jokes: Analyzing Verbal Play, Routledge, London

Paul De Palma and Judith Weiner [1992] “Riddles: Accessibility And Knowledge Representation,” Proceedings of the 15th International Conference on Computational Linguistics, pp. 1121-1125

Matt Freedman and Paul Hofman [1980] How Many Zen Buddhists Does It Take To Screw In A Light Bulb?, St Martin’s Press, New York

Sigmund Freud [1905] Der Witz Und Seine Beziehung Zum Unbewussten, Deuticke, Leipzig and Vienna; translated by James Strachey and reprinted as Jokes And Their Relation To The Unconscious, 1960, W. W. Norton, New York

Stefan Frisch [1996] Similarity And Frequency In Phonology, doctoral dissertation, Northwestern University

Ken Goldberg, Theresa Roeder, Dhruv Gupta and Chris Perkins [2001] “Eigentaste: A Constant Time Collaborative Filtering Algorithm,” Information Retrieval Journal, pp. 133-151

Christian F. Hempelmann [2003a] “The Ynperfect Pun Selector for Computational Humor,” Workshop at CHI 2003, Fort Lauderdale, Florida

Christian F. Hempelmann [2003b] Paronomasic Puns: Target Recoverability Towards Automatic Generation, doctoral dissertation, Purdue University, Indiana

Robert Hetzron [1991] “On The Structure Of Punchlines,” HUMOR: International Journal of Humor Research, 4:1, pp. 61-108

Xuedong Huang, Fileno Alleva, Hsiao-Wuen Hon, Mei-Yuh Hwang and Ronald Rosenfeld [1992] “The SPHINX-II Speech Recognition System: An Overview,” Computer Speech and Language, 7:2, pp. 137-148


Daniel Jurafsky and James Martin [2000] Speech and Language Processing, Prentice-Hall, New Jersey

Patricia Keith-Spiegel [1972] “Early Concepts Of Humor: Varieties And Issues,” in Jeffrey H. Goldstein and Paul E. McGhee (Eds.) The Psychology Of Humor: Theoretical Perspectives and Empirical Issues, Academic Press, New York and London, pp. 4-39

Anne Kostick, Charles Foxgrover and Michael Pellowski [1998] 3650 Jokes, Puns & Riddles, Black Dog & Leventhal Publishers, New York

Robert Latta [1999] The Basic Humor Process, Mouton de Gruyter, Berlin

Greg Lessard and Michael Levison [1992] “Computational Modelling Of Linguistic Humour: Tom Swifties,” ALLC/ACH Joint Annual Conference, Oxford

Dan Loehr [1996] “An Integration Of A Pun Generator With A Natural Language Robot,” Proceedings of Twente Workshop on Language Technology 12, Enschede, University of Twente, The Netherlands, pp. 161-172

Craig J. McDonough [2001] “Mnemonic String Generator: Software To Aid Memory Of Random Passwords,” CERIAS Technical Report, West Lafayette, IN, 9 pp.

Justin McKay [2002] “Generation Of Idiom-based Witticisms To Aid Second Language Learning,” Proceedings of Twente Workshop on Language Technology 20, Enschede, University of Twente, The Netherlands, pp. 77-88

Dallin D. Oaks [1994] “Creating Structural Ambiguities In Humor: Getting English Grammar To Cooperate,” HUMOR: International Journal of Humor Research, 7:4, pp. 377-401

Daniel Perlmutter [2000] “Tracing The Origin Of Humor,” HUMOR: International Journal of Humor Research, 13:4, pp. 457-468

Daniel Perlmutter [2002] “On Incongruities And Logical Inconsistencies In Humor: The Delicate Balance,” HUMOR: International Journal of Humor Research, 15:2, pp. 155-168

Victor Raskin [1985] The Semantic Mechanisms Of Humour, Reidel, Dordrecht, The Netherlands

Victor Raskin [1996] “Computer Implementation Of The General Theory Of Verbal Humor,” Proceedings of Twente Workshop on Language Technology 12, Enschede, University of Twente, The Netherlands, pp. 9-19


Victor Raskin [1999] “Laughing At And Laughing With: The Linguistics Of Humor And Humor In Literature,” in Rebecca S. Wheeler (Ed.) The Workings Of Language: From Prescriptions To Perspectives, Praeger, Westport, CT

Willibald Ruch, Salvatore Attardo and Victor Raskin [1993] “Towards An Empirical Verification Of The General Theory Of Verbal Humor,” HUMOR: International Journal of Humor Research, 6:2, pp. 123-136

Willibald Ruch [2001] “The Perception Of Humor,” in A. W. Kaszniak (Ed.) Emotion, Qualia, And Consciousness, World Scientific Publisher, Tokyo, pp. 410-425

Graeme Ritchie [1998] “Prospects For Computational Humour,” Proceedings of the 7th IEEE International Workshop on Robot and Human Communication, Takamatsu, Japan, pp. 283-291

Graeme Ritchie [1999] “Developing The Incongruity-Resolution Theory,” Proceedings of the AISB 99 Symposium on Creative Language: Humour and Stories, Edinburgh, Scotland, pp. 78-85

Graeme Ritchie [2000] “Describing Verbally Expressed Humour,” Proceedings of the AISB Symposium on Creative and Cultural Aspects and Applications of AI and Cognitive Science, Birmingham, pp. 71-78

Graeme Ritchie [2001] “Current Directions In Computational Humour,” Artificial Intelligence Review, 16:2, pp. 119-135

Graeme Ritchie [2003] “The JAPE Riddle Generator: Technical Specification,” Technical Report EDI-INF-RR-0158

Mary K. Rothbart and Diana Pien [1976] “Elephants And Marshmallows: A Theoretical Synthesis Of Incongruity-Resolution And Arousal Theories Of Humour,” in Anthony J. Chapman and Hugh C. Foot (Eds.) It’s A Funny Thing, Humour, Pergamon Press, Oxford and New York

Oliviero Stock and Carlo Strapparava [2002] “Humorous Agent For Humorous Acronyms: The HAHAcronym Project,” Proceedings of Twente Workshop on Language Technology 20, Enschede, University of Twente, The Netherlands, pp. 125-136

Jerry M. Suls [1972] “A Two-Stage Model For The Appreciation Of Jokes And Cartoons: An Information-Processing Analysis,” in Jeffrey H. Goldstein and Paul E. McGhee (Eds.) The Psychology Of Humor, Academic Press, New York, pp. 81-100

Jerry M. Suls [1976] “Cognitive And Disparagement Theories Of Humour: A Theoretical And Empirical Synthesis,” in Anthony J. Chapman and Hugh C. Foot (Eds.) It’s A Funny Thing, Humour, Pergamon Press, Oxford and New York


Osamu Takizawa, Masuzo Yanagida, Akira Ito and Hitoshi Isahara [1996] “On Computational Processing Of Rhetorical Expressions: Puns, Ironies And Tautologies,” Proceedings of Twente Workshop on Language Technology 12, Enschede, University of Twente, The Netherlands, pp. 39-52

Thomas C. Veatch [1998] “A Theory Of Humor,” HUMOR: International Journal of Humor Research, 11:2, pp. 161-175

T. Yokogawa [2002] “Japanese Pun Analyzer Using Articulation Similarities,” Proceedings of FUZZ-IEEE, Honolulu, HI


Appendix A: Training texts

Joke Number: Text Used

1: From the “Tropical Belize Tours” http://www.tropicalbelize.com/adv_cayo.asp

2: Jerusalem Quarterly File, Issue 5, 1999. http://www.jqf-jerusalem.org/1999/jqf5/sarsar.html

3: Mihoko Takahashi Mathis: Exploring Gender Issues in the Foreign Language Classroom: An Ethnographic Approach. http://members.at.infoseek.co.jp/gender_lang_ed/articles/mathis.html ; http://www.mste.uiuc.edu/courses/ci332fa03/folders/cohort5/eyanaki/math%20lesson%201.htm

4: Letters of Anton Chekhov. http://etext.library.adelaide.edu.au/c/c51lt/chap72.html

5: Ben Myatt “How Can I Go On Without You?” http://mooseofdoom.freewebpage.org/hgwy_2.htm

6: Case Log: Dr. Kip Redford. Patient: Ralph Fitzpatrick. http://andrewsfantasy.homestead.com/files/orpheus3.htm

7: Kyle Meador “Mirrors”, in “Reflections of Christ”, June 2003. http://reflectionsofchrist.org/Articles/Mirrors%20-%20June%202003.htm

8: Voices Of Youth: Ismael’s Story. http://www.hrwcalifornia.org/Student%20Task%20Force/BeahSTORY.htm

9: Not found

10: http://www.angelfire.com/boybands/nyckyshosted/chrissymoffatts1-5.html

11: Fish Griwkowsky “No pants? Even funnier” http://www.canoe.ca/JamEdmontonFringe/review99_bushed.html

12: http://www.angelfire.com/oh/Acie/page7.html

13: http://www.guidelines.org/commentaries/mar07_03.asp

14: Theria “Kamikaze Slayer Luna” http://slayers.aaanime.net/~linazel/fanfics/fanfic.asp?fanfic=kamikaze&part=1

15: Bronte, Emily: Wuthering Heights. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=25704290&textreg=1&query=I+advised+her+to+value+him+the+more+for+his+affection&id=BroWuth ; http://www.angelfire.com/oh/Acie/page7.html

16: Tisa’s Online Stories. http://tisa.stdragon.com/chapters10.htm

17: Nanaki “The Aftermath Of Mount Woe, Chapter 31: The Courtship Of Princess Schala” www.icybrian.com/fanfic/nanaki/omw31.html

18: http://www.kididdles.com/mouseum/r003.html

19: http://www.tatianart.com/prose_sketches.htm

20: Linda Larsen “Say the Magic Words” http://www.lindalarsen.com/2003/html/articles/pt61.html

21: SVSY Books. http://www.springnet.cc/~cinemakatie/price10.html

22: Tim Wildmon “Call 'Em Like You See 'Em” http://www.family.org/fofmag/pf/a0026173.cfm

23: Mihoko Takahashi Mathis: Exploring Gender Issues in the Foreign Language Classroom: An Ethnographic Approach. http://members.at.infoseek.co.jp/gender_lang_ed/articles/mathis.html

24: http://www.tatianart.com/prose_sketches.htm

25: Eating Contests, from “Tales Of Old Winster”. http://users2.ev1.net/~earthlings/Tales_of_Old_Winster_2.htm

26: http://www.recmusic.org/lieder/get_text.html?TextId=12516

27: http://www.greenmanreview.com/tradruss.html

28: Karen Bliss “Seinfeld stand-up not just for laughs” http://www.canoe.ca/TheatreReviewsI/imtellingyouforthelast.html

29: http://www.adequacy.net/reviews/t/texlahoma.shtml

30: http://www.epinions.com/content_103569329796

31: Albany, George: The Queen's Musketeer; or, Thisbe, the Princess Palmist. Dime Library No. 76, Chapter I. http://www.niulib.niu.edu/badndp/albany_george.html

32: Western New York Railroad Archive. http://wnyrails.railfan.net/news/c0000037.htm

33: http://www.mystery-pages.de/takethat/ttrev4.htm

34: Mark L. Warner “Locks and Wandering” http://www.ec-online.net/Knowledge/Articles/wandering1.html ; Amy Argetsinger “Loss of Innocence On U.S. Campuses”, Washington Post, February 5, 2001. http://gutches.net/deafhotnews/murder2/washpost_feb5a.pdf

35: Julie Ovenell-Carter “Hurry up and slow down” http://virtual-u.sfu.ca/mediapr/sfu_news/archives/sfunews03040402.htm

36: http://www.mystery-pages.de/takethat/ttrev4.htm ; Hermie Harmsen “Microbes are everywhere and they will follow us into space” http://www.desc.med.vu.nl/NL-taxi/SAMPLE/SAM-page1.htm

37: Bill Simmons “Stop me before I get nostalgic” http://espn.go.com/page2/s/simmons/021113.html

38: http://www.angolotesti.it/testi/sum41/viewlyrics.asp?id=5 ; http://www.sing365.com/music/lyric.nsf/Since-The-Last-Time-I-Saw-You-lyrics-Yolanda-Adams/5C6351D7D817994948256C6B000ECF0E

39: http://curious.astro.cornell.edu/question.php?number=5

40: http://www.adequacy.net/reviews/t/texlahoma.shtml

41: Freeman, Mary Eleanor Wilkins: The Copy-Cat. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=surround&offset=108933057&tag=+Freeman,+Mary+Eleanor+Wilkins,+1852-1930+:+The+Copy-Cat,+&+Other+Stories+/+Mary+E.+Wilkins+Freeman++1910+&query=Lucy+took+refuge+in+her+little+harbor+of+ignorance&id=FreCopy

42: Jerusalem Quarterly File, Issue 5, 1999. http://www.jqf-jerusalem.org/1999/jqf5/sarsar.html

43: Twain, Mark: Innocents Abroad. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=272460371&textreg=1&query=The+island+in+sight+was+Flores&id=TwaInno

44: Delany, Martin: Blake; or the Huts of America, Part I. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=78100673&textreg=1&query=Fastened+by+the+unyielding+links+of+the+iron+cable+of+despotism&id=DelBlak

45: Twain, Mark: Innocents Abroad. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=surround&offset=272815237&tag=+Twain,+Mark,+1835-1910+:+Innocents+Abroad++&query=At+Pisa+we+climbed+up+to+the+top+of+the+strangest+structure+the+world+has+any+knowledge&id=TwaInno ; Pete Harrison “Diving’s for losers – who says otherwise?” http://www.divernet.com/safety/talkph0100.htm

46: Verne, Jules: Off on a Comet. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=279365028&textreg=1&query=Servadac+listened+attentively&id=VerOffo ; SVSY Books http://www.springnet.cc/~cinemakatie/price10.html ; Alcott, Louisa May: Little Women. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=2132056&textreg=1&query=Like+bees+swarming+after+their+queen&id=AlcLitt

47: Twain, Mark: A Connecticut Yankee in King Arthur's Court. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=276551787&textreg=1&query=seeing+me+pacific+and+unresentful,+no+doubt+judged+that&id=TwaYank

48: FanFics: Harry Potter and The Queen of Ice – Jenn. http://www.fronskiefeint.com/queenoficefic4.html ; Alcott, Louisa May: Little Women. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=1842138&textreg=1&query=Gondola+after+gondola+swept+up&id=AlcLitt

49: Doyle, Arthur Conan: The White Company. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=92581864&textreg=1&query=The+highway+had+lain+through+the+swelling+vineyard&id=DoyWhit ; Austen, Jane: Pride and Prejudice. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=11400964&textreg=1&query=Elizabeth,+as+they+drove+along,+watched+for+the+first+appearance&id=AusPrid

50: Potter, Beatrix: The Tale of Mr. Jeremy Fisher. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=211976476&textreg=1&query=boat+was+round&id=PotFish

51: http://music.hyperreal.org/artists/brian_eno/HCTWJlyrics.html ; Dreiser, Theodore: Sister Carrie. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=94098182&textreg=1&query=Drouet+took+on+a+slightly+more+serious+tone&id=DreSist

52: Trollope, Anthony: Can You Forgive Her? http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=266515793&textreg=1&query=said+the+old+man+to+her+when&id=TroForg ; http://judy.jteers.net/koalition/change.html ; Dickens, Charles: Oliver Twist. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=82641735&textreg=1&query=Another+morning.+The+sun+shone+brightly:+as+brightly+as+if+it+looked&id=DicOliv

53: http://www.bananacafe.ca/0312/fr-life-1c-0312.htm

54: Austen, Jane: Mansfield Park. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=9716482&textreg=1&query=Sir+Thomas,+meanwhile,+went+on+with+his+own+hopes&id=AusMans ; Trollope, Anthony: Can You Forgive Her? http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=266515793&textreg=1&query=said+the+old+man+to+her+when&id=TroForg

55: http://goanna.info/tall_women's_clothing.html

56: Doyle, Arthur Conan: The Captain of the Polestar and Other Tales. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=86297933&textreg=1&query=It+was+nine+o'clock+on+a+Wednesday+morning&id=DoyCapt ; www.blackmask.com/books80c/silfox.htm ; Shakespeare, William: The First Part of King Henry IV. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=185061198&textreg=1&query=Nay,+then+I+cannot+blame+his+cousin+king&id=Mob1He4

57: www.is1.org/dreaming.html

58: Davis, Richard Harding: The Red Cross Girl. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=75849285&textreg=1&query=Latimer+went+on+his+way+without+asking+any+sympathy&id=DavRedC

59: Sahar Huneidi “The Holiday Season: Ready Or Not, Here It Comes!” http://www.psychicsahar.com/artman/publish/article_102.shtml ; Burnett, Frances Hodgson: The Lost Prince. http://etext.virginia.edu/etcbin/ot2www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w&act=text&offset=31608129&textreg=1&query=A+note+of+terror+broke+into+his+voice&id=BurLoPr

60: www.anandamarga.or.id/contents.asp?cntn=360

61: www.sibs.org.uk/sians_story.html

62: Jane Austen: Northanger Abbey, Chapter 11. http://etext.library.adelaide.edu.au/a/a93n/chap11.html

63: http://www.hendersonville-pd.org/nurseryround.html

64: Martin Overby “What's on the Tube?” http://www.worldinvisible.com/newsltr/yr1998/05-07/9805over.htm

65: http://gibby16.diaryland.com/030614_47.html

66: Willa Sibert Cather “The Professor's Commencement” http://etext.lib.virginia.edu/etcbin/browse-mixed-new?id=CatComm&tag=public&images=images/modeng&data=/texts/english/modeng/parsed ; www.unl.edu/Cather/works/short/professor.htm


Appendix B: Jokes Used in the Training Set

1. Knock, Knock. Who is there? Justin. Justin who? Justin time for dinner.
2. Knock, Knock. Who is there? Jess. Jess who? Jess me, open the door.
3. Knock, Knock. Who is there? Howie. Howie who? Howie going to figure this out?
4. Knock, Knock. Who is there? Ima. Ima who? Ima coming in, so open up.
5. Knock, Knock. Who is there? Juana. Juana who? Juana come out and play?
6. Knock, Knock. Who is there? Thumping. Thumping who? Thumping slimy is on your leg.
7. Knock, Knock. Who is there? Disease. Disease who? Disease pants fit you?
8. Knock, Knock. Who is there? Oliver. Oliver who? Oliver you there are bugs.
9. Knock, Knock. Who is there? Bugspray. Bugspray who? Bugs pray that snakes won't eat them.
10. Knock, Knock. Who is there? Holden. Holden who? Holden I'll go see.
11. Knock, Knock. Who is there? Panther. Panther who? Panther no pants, I am going swimming.
12. Knock, Knock. Who is there? Butcher. Butcher who? Butcher arms around me.
13. Knock, Knock. Who is there? Jamaica. Jamaica who? Jamaica good grade on your test?
14. Knock, Knock. Who is there? Olive. Olive who? Olive right next door to you.
15. Knock, Knock. Who is there? Olive. Olive who? Olive you darling.
16. Knock, Knock. Who is there? Irish. Irish who? Irish you would let me in.
17. Knock, Knock. Who is there? Atunna. Atunna who? Atunna trouble if you don't let me in.
18. Knock, Knock. Who is there? Wayne. Wayne who? Wayne, wayne, go away, come again another day.


19. Knock, Knock. Who is there? Canary. Canary who? Canary come out and play?
20. Knock, Knock. Who is there? Sadie. Sadie who? Sadie magic words, and I'll tell you.
21. Knock, Knock. Who is there? Alltell. Alltell who? Alltell mom if you don't let me in.
22. Knock, Knock. Who is there? Butter. Butter who? Butter up! I am throwing a fast ball.
23. Knock, Knock. Who is there? Arthur. Arthur who? Arthur more jokes here?
24. Knock, Knock. Who is there? Candy. Candy who? Candy come out and play?
25. Knock, Knock. Who is there? Butter. Butter who? Butter not tell you.
26. Knock, Knock. Who is there? Keith. Keith who? Keith me sweetheart.
27. Knock, Knock. Who is there? Samoa. Samoa who? Samoa of these bad jokes and I am gone.
28. Knock, Knock. Who is there? Italian. Italian who? Italian you for the last time: open up.
29. Knock, Knock. Who is there? Avenue. Avenue who? Avenue heard this joke before?
30. Knock, Knock. Who is there? Dwayne. Dwayne who? Dwayne the bathtub, I am dwowning.
31. Knock, Knock. Who is there? Stan. Stan who? Stan back, I think I am going to sneeze.
32. Knock, Knock. Who is there? Freeze. Freeze who? Freeze a jolly good fellow.
33. Knock, Knock. Who is there? Lettuce. Lettuce who? Lettuce in, we are freezing.
34. Knock, Knock. Who is there? Doris. Doris who? Doris locked, that's why I am knocking.
35. Knock, Knock. Who is there? Harry. Harry who? Harry up and answer the door.
36. Knock, Knock. Who is there? Beef. Beef who? Beef-ore I get mad you better let me in.


37. Knock, Knock. Who is there? Max. Max who? Max no difference to you, just let me in.
38. Knock, Knock. Who is there? Celeste. Celeste who? Celeste time I am telling you open up.
39. Knock, Knock. Who is there? Ken. Ken who? Ken you tell me some good jokes?
40. Knock, Knock. Who is there? Heaven. Heaven who? Heaven you heard enough Knock Knock jokes?
41. Knock, Knock. Who is there? Police. Police who? Police tell me some Knock Knock jokes.
42. Knock, Knock. Who is there? Luke. Luke who? Luke out! Here's another one!
43. Knock, Knock. Who is there? Dawn. Dawn who? Dawn by the station, early in the morning.
44. Knock, Knock. Who is there? Wade. Wade who? Wade down upon the Swanee River.
45. Knock, Knock. Who is there? Isabel. Isabel who? Isabel out of order? I had to knock.
46. Knock, Knock. Who is there? Alison. Alison who? Alison to you after you listen to me.
47. Knock, Knock. Who is there? Dismay. Dismay who? Dismay not be a funny joke.
48. Knock, Knock. Who is there? Waiter. Waiter who? Waiter dress is unbutton.
49. Knock, Knock. Who is there? Amanda. Amanda who? Amanda fix the refrigerator is here.
50. Knock, Knock. Who is there? Noah. Noah who? Noah good place to find more jokes?
51. Knock, Knock. Who is there? Althea. Althea who? Althea later, Alligator.
52. Knock, Knock. Who is there? Winnie. Winnie who? Winnie is good, he is very, very good.
53. Knock, Knock. Who is there? Cargo. Cargo who? Cargo beep beep, varoom.
54. Knock, Knock. Who is there? Wendy. Wendy who? Wendy last time you took a bath?


55. Knock, Knock. Who is there? Anna. Anna who? Anna body knows some more jokes?
56. Knock, Knock. Who is there? Statue. Statue who? Statue that laughed a minute ago?
57. Knock, Knock. Who is there? Mary Lee. Mary Lee who? Mary Lee, Mary Lee, life is but a dream. Row…
58. Knock, Knock. Who is there? Ammonia. Ammonia who? Ammonia trying to be funny.
59. Knock, Knock. Who is there? Radio. Radio who? Radio not, here I come.
60. Knock, Knock. Who is there? Amos. Amos who? Amos-quito bit me.
61. Knock, Knock. Who is there? Andy. Andy who? Andy bit me again.
62. Knock, Knock. Who is there? Vera. Vera who? Vera few people think these jokes are funny.
63. Knock, Knock. Who is there? Shelby. Shelby who? Shelby coming round the mountain when…
64. Knock, Knock. Who is there? Les. Les who? Les hear another joke.
65. Knock, Knock. Who is there? Alaska. Alaska who? Alaska one more time, then jokes start over.
66. Knock, Knock. Who is there? Europe. Europe who? Europe early this morning, aren't you?


Appendix C: Jokes Used in the Test Set

The sentence in parentheses after each joke shows the expected punchline.

1. Knock, Knock. Who is there? Winnie. Winnie who? Winnie finally shows up for work, tell him he’s fired. (When he finally shows up for work, tell him he’s fired.)
2. Knock, Knock. Who is there? Freddy. Freddy who? Freddy or not, here I come! (Ready or not, here I come!)
3. Knock, Knock. Who is there? Hugh’s. Hugh’s who? Hugh’s cars aren’t brand new. (Used cars aren’t brand new.)
4. Knock, Knock. Who is there? Texas. Texas who? Texas are high in this country. (Taxes are high in this country.)
5. Knock, Knock. Who is there? Hank. Hank who? You are welcome. (The joke is not expected to be recognized.)
6. Knock, Knock. Who is there? A.C. A.C. who? A.C. come A.C. go. (Easy come easy go.)
7. Knock, Knock. Who is there? Luke. Luke who? Luke out the window and see. (Look out the window and see.)
8. Knock, Knock. Who is there? Recycle. Recycle who? Recycle around town on our bikes. (We cycle around town on our bikes.)
9. Knock, Knock. Who is there? Thelma. Thelma who? Thelma I went out for pizza. (Tell ma I went out for pizza.)
10. Knock, Knock. Who is there? Kenya. Kenya who? Kenya give me a hand? (Can you give me a hand?)
11. Knock, Knock. Who is there? Avis. Avis who? Avis-itor from Mars! (A visitor from Mars!)
12. Knock, Knock. Who is there? Sari. Sari who? Sari I was sarong! (Sorry I was sarong!)
13. Knock, Knock. Who is there? Sikkim. Sikkim who? Sikkim and you’ll find him. (Seek him and you’ll find him.)


14. Knock, Knock. Who is there? Olive. Olive who? Olive me, why not take olive me. (All of me, why not take olive me.)
15. Knock, Knock. Who is there? Ammonia. Ammonia who? Ammonia bird in a gilded cage. (I’m only a bird in a gilded cage.)
16. Knock, Knock. Who is there? Samoa. Samoa who? Samoa coffee, please! (Some more coffee, please!)
17. Knock, Knock. Who is there? Uganda. Uganda who? Uganda come in without knocking. (You can’t a come in without knocking.)
18. Knock, Knock. Who is there? Chuck. Chuck who? Chuck-ago, Chuckago, that wonderful town. (Chicago, Chicago, that wonderful town.)
19. Knock, Knock. Who is there? Jose. Jose who? Jose, can you see? (Oh, say, can you see?)
20. Knock, Knock. Who is there? Amazon. Amazon who? Amazon of a gun. (I’m a son of a gun.)
21. Knock, Knock. Who is there? Ptolemy. Ptolemy who? Ptolemy that you love me. (Tell me that you love me.)
22. Knock, Knock. Who is there? Wanda Way. Wanda Way who? Wanda Way, and you’ll be lost. (Wander away, and you’ll be lost.)
23. Knock, Knock. Who is there? Jewel. Jewel who? Jewel know who when you open the door. (You will know who when you open the door.)
24. Knock, Knock. Who is there? Fido. Fido who? Fido known you were coming, I’d’ve baked a cake. (If I’d known you were coming, I’d’ve baked a cake.)
25. Knock, Knock. Who is there? Irish. Irish who? Irish you a merry Christmas. (I wish you a merry Christmas.)
26. Knock, Knock. Who is there? Atlas. Atlas who? Atlas the sun’s come out. I’ll just stay out here. (At last the sun’s come out. I’ll just stay out here.)
27. Knock, Knock. Who is there? Eileen. Eileen who? Eileen too hard on this door and it’ll break – better open up! (I lean too hard on this door and it’ll break – better open up!)
28. Knock, Knock. Who is there? Marjorie. Marjorie who? Marjorie found me guilty and now I’m in jail. (My jury found me guilty and now I’m in jail.)


29. Knock, Knock. Who is there? Theodore. Theodore who? Theodore is locked and I can’t get in. (The door is locked and I can’t get in.)
30. Knock, Knock. Who is there? Your maid. Your maid who? Your maid your bed, now lie in it. (You made your bed, now lie in it.)
31. Knock, Knock. Who is there? Euell. Euell who? Euell miss out on a big opportunity if you don’t open the door soon. (You’ll miss out on a big opportunity if you don’t open the door soon.)
32. Knock, Knock. Who is there? Al B. Al B. who? Al B. back. (I’ll be back.)
33. Knock, Knock. Who is there? Tarzan. Tarzan who? Tarzan stripes forever. (Stars and stripes forever.)
34. Knock, Knock. Who is there? Irish. Irish who? Irish I could carry a tune. (I wish I could carry a tune.)
35. Knock, Knock. Who is there? Gnu. Gnu who? Gnu Zealand is a cool place to visit. (New Zealand is a cool place to visit.)
36. Knock, Knock. Who is there? Amish. Amish who? That’s funny. You don’t look like a shoe. (The joke is not expected to be recognized.)
37. Knock, Knock. Who is there? Sweden. Sweden who? Sweden the lemonade, it’s bitter. (Sweeten the lemonade, it’s bitter.)
38. Knock, Knock. Who is there? Decanter. Decanter who? Decanter at my temple is almost eighty years old. (The cantor at my temple is almost eighty years old.)
39. Knock, Knock. Who is there? Dishes. Dishes who? Dishes the end of the world. Good bye to all! (This is the end of the world. Good bye to all!)
40. Knock, Knock. Who is there? Amanda. Amanda who? Amanda fix your TV set! (A man to fix your TV set!)
41. Knock, Knock. Who is there? C.D. C.D. who? C.D. badge I’m holding? This is police. Open up! (See the badge I’m holding? This is police. Open up!)
42. Knock, Knock. Who is there? Aussie. Aussie who? Aussie you later, mate. (I’ll see you later, mate.)


43. Knock, Knock. Who is there? Candice. Candice who? Candice be true love at long last? (Can this be true love at long last?)
44. Knock, Knock. Who is there? Cairo. Cairo who? Cairo the boat for awhile? (Can I row the boat for awhile?)
45. Knock, Knock. Who is there? Marie. Marie who? Marie Christmas to all! (Merry Christmas to all!)
46. Knock, Knock. Who is there? Water. Water who? Water our chances of winning the lottery? (What are our chances of winning the lottery?)
47. Knock, Knock. Who is there? Hatch. Hatch who? Bless you. (The joke is not expected to be recognized.)
48. Knock, Knock. Who is there? Army. Army who? Army and my friends invited to your Halloween party? (Are me and my friends invited to your Halloween party?)
49. Knock, Knock. Who is there? Demure. Demure who? Demure I get, demure I want. (The more I get, the more I want.)
50. Knock, Knock. Who is there? Whale. Whale who? Whale meet you in the bar around the corner. (We’ll meet you in the bar around the corner.)
51. Knock, Knock. Who is there? Hour. Hour who? Hour you today? I’m pretty good myself. (How are you today? I’m pretty good myself.)
52. Knock, Knock. Who is there? Hugo. Hugo who? Hugo and see for yourself. (You go and see for yourself.)
53. Knock, Knock. Who is there? Wooden. Wooden who? Wooden it be nice to have Mondays off? (Wouldn’t it be nice to have Mondays off?)
54. Knock, Knock. Who is there? Abby. Abby who? Abby Birthday to you! (Happy Birthday to you!)
55. Knock, Knock. Who is there? Thesis. Thesis who? Thesis a stickup! (This is a stickup!)
56. Knock, Knock. Who is there? Ale. Ale who? Ale! Ale! The gang’s all here. (Hale! Hale! The gang’s all here.)


57. Knock, Knock. Who is there? Ox. Ox who? Ox me for a date and I may say yes. (Ask me for a date and I may say yes.)
58. Knock, Knock. Who is there? Mary. Mary who? Mary me, my darling. (Marry me, my darling.)
59. Knock, Knock. Who is there? Frosting. Frosting who? Frosting in the morning brush your teeth. (First thing in the morning brush your teeth.)
60. Knock, Knock. Who is there? Oil. Oil who? Oil change, just give me a chance. (I’ll change, just give me a chance.)
61. Knock, Knock. Who is there? Gnats. Gnats who? Gnats not funny! Open up! (That’s not funny! Open up!)
62. Knock, Knock. Who is there? Vericose. Vericose who? Vericose knit family. We stick together. (We’re a close knit family. We stick together.)
63. Knock, Knock. Who is there? Police. Police who? Police open the door. I’m tired of knocking. (Please open the door. I’m tired of knocking.)
64. Knock, Knock. Who is there? Dee. Dee who? Dee-livery. Open-up – your pizza’s getting cold. (Delivery. Open-up – your pizza’s getting cold.)
65. Knock, Knock. Who is there? Zeus. Zeus who? Zeus house is this anyway? (Whose house is this anyway?)
66. Knock, Knock. Who is there? Ivan. Ivan who? Ivan to come in. It’s cold out here. (I want to come in. It’s cold out here.)
67. Knock, Knock. Who is there? Asia. Asia who? Asia father home? He owes me money. (Is your father home? He owes me money.)
68. Knock, Knock. Who is there? Selma. Selma who? Selma shares in the company. The stock is going down. (Sell me shares in the company. The stock is going down.)
69. Knock, Knock. Who is there? Harriet. Harriet who? Harriet too much. There’s nothing left for me. (Harry ate too much. There’s nothing left for me.)
70. Knock, Knock. Who is there? Dewey. Dewey who? Dewey have to go to the dentist? (Do we have to go to the dentist?)


71. Knock, Knock. Who is there? Stan. Stan who? Stan up straight and stop slouching! (Stand up straight and stop slouching!)
72. Knock, Knock. Who is there? Irving. Irving who? Irving a good time on vacation. Wish you were here. (Having a good time on vacation. Wish you were here.)
73. Knock, Knock. Who is there? Maryanne. Maryanne who? Maryanne and live happily ever after. (Marry Anne and live happily ever after.)
74. Knock, Knock. Who is there? Mandy. Mandy who? Mandy lifeboats. The ship is sinking! (Man the lifeboats. The ship is sinking!)
75. Knock, Knock. Who is there? Wayne. Wayne who? Wayne are we gonna eat? I’m starving. (When are we gonna eat? I’m starving.)
76. Knock, Knock. Who is there? Wiley. Wiley who? Wiley was sleeping my wife packed my things and moved me out of the apartment. (While I was sleeping my wife packed my things and moved me out of the apartment.)
77. Knock, Knock. Who is there? Boo. Boo who? Don’t cry, sweetie pie. (The joke is not expected to be recognized.)
78. Knock, Knock. Who is there? Watson. Watson who? Nothing much. Watson with you? (Nothing much. What’s who with you?)
79. Knock, Knock. Who is there? Thermos. Thermos who? Thermos be someone home, I see a light on. (There must be someone home, I see a light on.)
80. Knock, Knock. Who is there? Sahara. Sahara who? Sahara you dune? (What in the hell are you doing?)
81. Knock, Knock. Who is there? Shannon. Shannon who? Shannon, Shannon harvest moon, up in the sky… (Shine on, Shine on harvest moon, up in the sky…)
82. Knock, Knock. Who is there? Iowa. Iowa who? Iowa lot to my brother. (I owe a lot to my brother.)
83. Knock, Knock. Who is there? Hugh. Hugh who? Hi, there. (The joke is not expected to be recognized.)
84. Knock, Knock. Who is there? Wallabee. Wallabee who? Wallabee sting if you sit on it? (Will a bee sting if you sit on it?)


85. Knock, Knock. Who is there? Catch. Catch who? Gesundheit! (The joke is not expected to be recognized.)
86. Knock, Knock. Who is there? Shelby. Shelby who? Shelby coming round the mountain when she comes… (She’ll be coming round the mountain when she comes...)
87. Knock, Knock. Who is there? Yah. Yah who? Gosh, I am glad to see you too. (The joke is not expected to be recognized.)
88. Knock, Knock. Who is there? Elsie. Elsie who? Elsie you later! (I’ll see you later!)
89. Knock, Knock. Who is there? Annie. Annie who? Annie-body home? (Anybody home?)
90. Knock, Knock. Who is there? Dots. Dots who? Dots for me you know, and for you to find out! (That’s for me you know, and for you to find out!)
91. Knock, Knock. Who is there? Demons. Demons who? Demons are a ghouls’ best friend. (Diamonds are a girl’s best friend.)
92. Knock, Knock. Who is there? Surreal. Surreal who? Surreal pleasure to be here. (It’s a real pleasure to be here.)
93. Knock, Knock. Who is there? Harmony. Harmony who? Harmony times do I have to knock before you let me in? (How many times do I have to knock before you let me in?)
94. Knock, Knock. Who is there? Detail. Detail who? Detail-aphone man! (The telephone man!)
95. Knock, Knock. Who is there? Pencil. Pencil who? Pencil fall down without suspenders. (Pants will fall down without suspenders.)
96. Knock, Knock. Who is there? Wayne. Wayne who? Wayne drops keep falling on my head... (Rain drops keep falling on my head...)
97. Knock, Knock. Who is there? Avoid. Avoid who? Avoid to the vise is sufficient. (A word to the wise is sufficient.)
98. Knock, Knock. Who is there? Avenue. Avenue who? Avenue met me somewhere before? (Haven’t you met me somewhere before?)


99. Knock, Knock. Who is there? Cargo. Cargo who? Cargo honk, honk. (Car go honk, honk.)
100. Knock, Knock. Who is there? Tarzan. Tarzan who? Tarzan tripes forever! (Stars and stripes forever!)
101. Knock, Knock. Who is there? Turnip. Turnip who? Turnip the stereo. I love this song. (Turn up the stereo. I love this song.)
102. Knock, Knock. Who is there? Canada. Canada who? Canada boys comma over to play poker? (Can the boys comma over to play poker?)
103. Knock, Knock. Who is there? Recent. Recent who? Recent you a bill the first of the month. (We sent you a bill the first of the month.)
104. Knock, Knock. Who is there? Heavenly. Heavenly who? Heavenly met somewhere before? (Haven’t we met somewhere before?)
105. Knock, Knock. Who is there? Omar. Omar who? Omar darling Clementine. (Oh my darling Clementine.)
106. Knock, Knock. Who is there? Barn. Barn who? Barn to be wild! (Born to be wild!)
107. Knock, Knock. Who is there? Tamara. Tamara who? Tamara I have an important meeting. (Tomorrow I have an important meeting.)
108. Knock, Knock. Who is there? Oscar. Oscar who? Oscar for a date and maybe she’ll go out with you! (Ask her for a date and maybe she’ll go out with you!)
109. Knock, Knock. Who is there? Otto. Otto who? Otto theft is a serious crime. (Auto theft is a serious crime.)
110. Knock, Knock. Who is there? I’m Helen. I’m Helen who? I’m Helen wheels. VAROOM! (I’m hell on wheels. VAROOM!)
111. Knock, Knock. Who is there? Vision. Vision who? Vision you a happy New Year! (Wishing you a happy New Year!)
112. Knock, Knock. Who is there? Rabbit. Rabbit who? Rabbit up nice. It’s a Christmas gift. (Wrap it up nice. It’s a Christmas gift.)
113. Knock, Knock. Who is there? I, Felix. I, Felix who? I, Felix-cited. (I feel excited.)


114. Knock, Knock. Who is there? Urn. Urn who? Urn your keep by finding a job. (Earn your keep by finding a job.)
115. Knock, Knock. Who is there? Venice. Venice who? Venice pay day. I’m broke. (When is pay day. I’m broke.)
116. Knock, Knock. Who is there? Market. Market who? Market paid in full. (Mark it paid in full.)
117. Knock, Knock. Who is there? Laurie. Laurie who? Laurie, Laurie hallelujah. (Glory, glory hallelujah.)
118. Knock, Knock. Who is there? Butcher. Butcher who? Butcher arms around me and give me a big hug. (Put your arms around me and give me a big hug.)
119. Knock, Knock. Who is there? Ferris. Ferris who? Ferris fair, so don’t cheat. (Fair is fair, so don’t cheat.)
120. Knock, Knock. Who is there? Ammonia. Ammonia who? Ammonia lost person looking for directions. (I am only a lost person looking for directions.)
121. Knock, Knock. Who is there? Midas. Midas who? Midas well sit down and relax. (Might as well sit down and relax.)
122. Knock, Knock. Who is there? Gopher. Gopher who? Gopher a swim. It will refresh you. (Go for a swim. It will refresh you.)
123. Knock, Knock. Who is there? Rice. Rice who? Rice and shine, Sleepyhead! (Rise and shine, Sleepyhead!)
124. Knock, Knock. Who is there? Jonas. Jonas who? Jonas for a cocktail after work. (Join us for a cocktail after work.)
125. Knock, Knock. Who is there? Turnip. Turnip who? Turnip the TV. I can’t hear the news. (Turn up the TV. I can’t hear the news.)
126. Knock, Knock. Who is there? Ken. Ken who? Ken you open the door already? (Can you open the door already?)
127. Knock, Knock. Who is there? Myron. Myron who? Myron around the park made me tired. (My run around the park made me tired.)
128. Knock, Knock. Who is there? Rita. Rita who? Rita good book lately? (Read a good book lately?)


129. Knock, Knock. Who is there? Ida. Ida who? Ida written sooner, but I lost your address. (I’d have written sooner, but I lost your address.)
130. Knock, Knock. Who is there? Menu. Menu who? Menu wish upon a star, good things happen. (When you wish upon a star, good things happen.)


Appendix D: KK Recognizer Algorithm Description

IsItJoke
1 if ValidateJokeStructure returns that the joke has correct structure
 1.1 if the keyword used in Line3 is known to the program, i.e. the program has a known wordplay for Line3
  1.1.1 for each known wordplay based on Line3
   1.1.1.1 if wordplay is at least two words long
    1.1.1.1.1 if pairs of words in wordplay exist in bigram table
     1.1.1.1.1.1 if ValidatePunchline(wordplay)
      1.1.1.1.1.1.1 return "JOKE FOUND"
   1.1.1.2 else
    1.1.1.2.1 if ValidatePunchline(wordplay)
     1.1.1.2.1.1 return "JOKE FOUND"
 1.2 call GenerateWordplay(Keyword)
 1.3 for each generated wordplay
  1.3.1 if wordplay is at least two words long
   1.3.1.1 if pairs of words in wordplay exist in bigram table
    1.3.1.1.1 if ValidatePunchline(wordplay)
     1.3.1.1.1.1 return "JOKE FOUND"
  1.3.2 else
   1.3.2.1 if ValidatePunchline(wordplay)
    1.3.2.1.1 return "JOKE FOUND"
2 return "JOKE NOT FOUND"

ValidateJokeStructure
1 if Line1 OR Line2 is not valid
 1.1 return false
2 read Line3
3 set Keyword to Line3 without spaces or punctuation marks
4 if Line4 is not valid OR Line5 does not contain Line3
 4.1 return false
5 return true

GenerateWordplay
input: TopPhrase, an element with a structure containing a string and a similarity value
1 for each Letter in the string of TopPhrase
 1.1 if Letter is not replaced from the original keyword
  1.1.1 for each line in the similarity table
   1.1.1.1 if Letter is the same as the entry in the first column of the similarity table
    1.1.1.1.1 copy TopPhrase into NewPhrase
    1.1.1.1.2 replace Letter from the string of NewPhrase with the entry from the second column of the similarity table
    1.1.1.1.3 similarity of NewPhrase = similarity of TopPhrase - 1 + entry from the third column of the similarity table
    1.1.1.1.4 insert NewPhrase into heap

ValidatePunchline
input: phrase, a sequence of words
1 set firstWord to the first word in phrase
2 set secondWord to the second word in phrase
3 set lastWord to the last word in phrase
4 set sndToLast to the second-to-last word in phrase
5 set firstAfter to the first word in the punchline after phrase
6 set secondAfter to the second word in the punchline after phrase
7 set firstBefore to the first word in the punchline before phrase (if it exists)
8 set secondBefore to the second word in the punchline before phrase (if it exists)
// phrase in the beginning or middle of punchline
9 if (sndToLast, lastWord, firstAfter) is not in the database and sndToLast exists
 9.1 return false
10 if (lastWord, firstAfter, secondAfter) is not in the database
 10.1 return false
// phrase in the end or middle of punchline
11 if (firstBefore, firstWord, secondWord) is not in the database and secondWord exists
 11.1 return false
12 if (secondBefore, firstBefore, firstWord) is not in the database
 12.1 return false
13 return true

116

Page 122: Knock Knock Jokes

Appendix E: A table of “Similarity of English consonant pairs using the natural classes model,” developed by Stefan Frisch


Appendix F: Cost Table developed by Christian Hempelmann
