The Trouble with House Elves
Experiments in Computational Folkloristics
Timothy R. Tangherlini

Post on 18-Jan-2018


Page 1

The Trouble with House Elves

Experiments in Computational Folkloristics

Timothy R. Tangherlini

Page 2

Page 3

A story…

It was the old counselor from Skårupgård who came riding with four headless horses to Todbjærg church. He always drove out of the northern gate, and there by the gate was a stall, they could never keep that stall door closed. They had a farmhand who closed it once after it had sprung open. But one night, after he'd gone to bed, something came after the farmhand and it lifted his bed straight up to the rafters and crushed him quite hard. Then the farmhand shouted and asked them to stop lifting him up there. "No, you've tormented us, but now you'll die..." I heard that's how two farmhands were crushed to death. He wanted to close the door and then they never tried to close it again.

Page 4

• Some standard questions:
  – Role of ghosts in late 19th century Denmark?
  – Origins of the story?
  – Structure of the story?
  – Who, what, where of this story?

• Is there a need for a computational folkloristics?

Page 5

Folklore

• Early history of the discipline:
  – Philological
  – National Romanticism

• Johann Gottfried Herder (1744-1803)
  – Wilhelm (1786-1859) and Jacob (1785-1863) Grimm
  – Search for original forms

Page 6

Romantic Nationalism in the Nordic lands

• Asbjørnsen and Moe (Norway)
  – Development of the Norwegian language
• Linnaeus and the rush to categorize (Sweden)
• The ballad, archaeology and Svend Grundtvig (Denmark)
• The Kalevala and folklore as a science (Finland)

Page 7

Mapping Folklore

• Historic-geographic method
  – Kaarle and Julius Krohn (1906-1924)
    • Focused work on the Finnish epic, Kalevala
    • Led to the type index of folk literature (Antti Aarne)
  – Ripples on a pond theory of folklore diffusion

Page 8

Maps in the study of culture

• Geography is not an inert container, is not a box where cultural history "happens," but an active force, that pervades the literary field and shapes its depth. Making the connection between geography and literature explicit... will allow us to see some significant relationships that have so far escaped us

– (Moretti 1998, 3).

Page 9

A New Historic-Geographic Method

• Folklore as a process:
  – in time and space
  – emerges from the dialectic between individuals and tradition
• Maps can help model relationships between:
  – People
  – Environment
  – Folk Repertoires

Page 10

Study Corpus

• 6500+ named informants
• 24,000 manuscript pages
• 250,000 published expressions

• Evald Tang Kristensen (1843-1929)
  – Actively collected from 1865 to 1923
  – 219 collecting trips

Page 11

A multi-level folklore browser

• People
• Places
• Stories

Page 12

Experiments in mapping

1. Mapping collecting routes
   • Challenge Question (CQ): Did Tang Kristensen’s published statements about his collecting accurately reflect his collecting work?
2. Mapping individual repertoire distribution
   • CQ: Does individual mobility influence the range of places mentioned in stories?
   • CQ: Do other informant features, such as gender, influence the range of places mentioned?
3. Mapping by story features against individual repertoire
   • CQ: Are there patterns, à la Moretti, that become apparent in the visualization of stories by repertoire, genre and/or story topic?

Page 13

Experiment 1: Mapping Collecting Routes

– ETK presents himself as a West Jutlander
– Political motivations
  • Aftermath of Napoleonic wars and Danish bankruptcy (1814)
  • Loss of Schleswig to Bismarck (1864)
  • Urbanization
– Search for “authentic” Danish culture
– What do the collecting routes reveal?

Page 14

Experiments 2 & 3: Mapping Repertoire

• Theory: Individual biography influences repertoire and its features
• Hypothesis: Classes of individuals have different degrees of physical mobility, and this is reflected in their storytelling
• Hope: Maps reveal interesting patterns of places mentioned
  – A caveat: My main interest, and the vast majority of the collection, are based on legends, stories that refract the lived environments and social organization of the tradition participants

Page 15

Experiment 2: Place Name Distribution and Mobility

– Target: repertoires of 5 storytellers
– Limit: only stories that mention places
– Method
  • Plot place names mentioned by storyteller
  • Calculate Standard Deviation Ellipse distribution patterns for places mentioned in storyteller repertoires
  • Look for patterns in the underlying place name distribution
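The ellipse step in the method above can be sketched in a few lines. This is a minimal illustration, not the GIS tool used in the talk: it assumes place coordinates have already been projected to a planar system (for example, kilometres east and north), and it uses the covariance-matrix formulation of the standard deviational ellipse. The function name `std_dev_ellipse` and the sample coordinates are hypothetical.

```python
import math

def std_dev_ellipse(points):
    """Return (center, semi_major, semi_minor, angle_rad) for planar (x, y) points.

    The semi-axes are one standard deviation long; the angle is the
    orientation of the major axis, counterclockwise from the x-axis.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Population variance and covariance about the mean center.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against tiny negative rounding
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), math.sqrt(lam_major), math.sqrt(lam_minor), angle

# Hypothetical projected coordinates (km) of places one storyteller mentions.
places = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
center, major, minor, angle = std_dev_ellipse(places)
print(center, round(major, 3), round(minor, 3), round(angle, 3))
```

A long, narrow ellipse would suggest that the places a storyteller mentions lie along a corridor, while a small, round one would suggest a tightly local repertoire; comparing ellipses across storytellers is one way to look for the mobility patterns the experiment asks about.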

Page 16

Experiment 3: Can unsupervised learning on text help in pattern discovery?

Page 17

Experiment 3: Unsupervised learning and Repertoire clusters

– Target: repertoires of 5 storytellers
– Limit: only stories that mention places
– Method
  • Convert stories to TF-IDF vector representations
  • Force dimensionality reduction using SVD
  • Cluster: ECM by storyteller
    – eliminate small clusters
  • Project results into GIS
  • Calculate distribution ellipses for each cluster in each person’s repertoire
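As a rough illustration of the first step in this method, the sketch below builds TF-IDF vectors for a few tokenized stories and compares them with cosine similarity. It covers only the vectorization step; the SVD reduction, the ECM clustering, and the GIS projection named on the slide are not shown. The function names, the weighting variant (relative term frequency times log inverse document frequency), and the toy token lists are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} TF-IDF dictionary."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical, heavily simplified token lists standing in for three stories.
stories = [
    ["farmhand", "bed", "rafters", "ghost"],
    ["farmhand", "bed", "rafters", "nis"],
    ["church", "bell", "priest"],
]
vecs = tfidf_vectors(stories)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Stories that share distinctive vocabulary end up close together in this space, which is what makes a later clustering step meaningful.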

Page 18

A Crisis…

• Maps were informative since new patterns in the geographic distribution of stories were discovered…
  – why hadn’t I known about these patterns before?
• What other types of patterns, some very small, some very large, are lurking in the data?
• How can I be sure that my selection of examples is representative or even accurate?

Page 19

A Classic Folklore Problem

• Classification in folklore
  – 1 text = 1 classifier

• What happens when the classifier was designed for a different research problem?

• Are we missing patterns that are not solely related to single topic classifiers?

• Are we missing stories in our searches because of these single topic classifiers?

• Does this limit our ability to work with a large archive?

Page 20

Current folklore classifiers are very expensive

Page 21

A lost story…

It was the old counselor from Skårupgård who came riding with four headless horses to Todbjærg church. He always drove out of the northern gate, and there by the gate was a stall, they could never keep that stall door closed. They had a farmhand who closed it once after it had sprung open. But one night, after he'd gone to bed, something came after the farmhand and it lifted his bed straight up to the rafters and crushed him quite hard. Then the farmhand shouted and asked them to stop lifting him up there. "No, you've tormented us, but now you'll die..." I heard that's how two farmhands were crushed to death. He wanted to close the door and then they never tried to close it again.

Page 22

Networks to the rescue?

• Folklore as traditional communication across social networks
  – Folklore networks
    • Social networks of tradition participants
    • Networks of scholars and collectors
    • Networks of stories
  – External networks
    • Communications networks
    • Transportation networks
    • Affiliation networks
  – Internal networks
    • Linguistic networks

Page 23

Connecting the dots…

[Figure: network diagram with nodes labeled S1–S4, P1–P8, and I1–I3]

Page 24

Storyteller networks

• Local networks
  • Connect all storytellers in a given parish
  • Connect all storytellers in a family
• Fieldtrip networks
  • Connect all storytellers on a given fieldtrip
• Collector-Storyteller networks
  • Connect all storytellers to all collectors with whom they worked
• Inferred / Affiliation networks
  • Connect storytellers by work groups (e.g., millers, fiddlers)
  • Connect storytellers by other affiliations (e.g., gender, age, education)

Page 25

Story networks

• Connect stories to:
  – People:
    • storytellers
    • people mentioned
  – Places
    • places collected
    • places mentioned
  – Each other
    • By shared indexing
    • By shared keyword (keyword extraction)
    • By shared topic (topic modeling using LDA)
    • By shallow ontology (tango index)
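One of the story-to-story links listed above, "by shared keyword", can be sketched as follows. The sketch assumes keywords have already been extracted for each story; the story identifiers and keyword sets are placeholders, and `shared_keyword_edges` is a hypothetical helper, not part of any tool named in the talk.

```python
from collections import defaultdict
from itertools import combinations

def shared_keyword_edges(story_keywords):
    """Return {(story_a, story_b): shared_keyword_count} for an undirected graph.

    Each pair is stored once, with the two story IDs in sorted order.
    """
    # Invert the mapping: keyword -> set of stories that contain it.
    by_keyword = defaultdict(set)
    for story, keywords in story_keywords.items():
        for kw in keywords:
            by_keyword[kw].add(story)
    # Every pair of stories under the same keyword gains one unit of edge weight.
    edges = defaultdict(int)
    for stories in by_keyword.values():
        for a, b in combinations(sorted(stories), 2):
            edges[(a, b)] += 1
    return dict(edges)

# Placeholder stories and keywords, for illustration only.
story_keywords = {
    "story_a": {"farmhand", "bed", "rafters", "manor"},
    "story_b": {"farmhand", "bed", "rafters", "nis"},
    "story_c": {"church"},
}
print(shared_keyword_edges(story_keywords))
```

Using the number of shared keywords as an edge weight is a design choice made here for simplicity; the same pattern would apply to shared index entries or shared topics.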

Page 26

An initial graph of the ETK study corpus

Page 27

Lost in a thicket of stories, keywords, etc

Page 28

Folklore Spaghetti

Page 29

Graph clustering

• Use a tuned version of MCL clustering for graphs
  – iteratively generates stochastic matrices, also known as Markov matrices (van Dongen 2000)
  – 2973 nodes / 52663 edges
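To make the MCL idea concrete, here is a small, untuned sketch of the basic algorithm: alternate expansion (matrix squaring) and inflation (element-wise powering followed by column normalization) on a column-stochastic matrix, then read clusters off the rows that retain flow. It is not the tuned version used on the study corpus. The pure-Python matrix code is only practical for toy graphs, and the parameter defaults, the self-loop convention, and the `1e-6` threshold are assumptions.

```python
def normalize_columns(m):
    """Scale each column to sum to 1, giving a column-stochastic (Markov) matrix."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mcl(adjacency, inflation=2.0, iterations=30):
    """Return clusters (sorted lists of node indices) found by basic MCL."""
    n = len(adjacency)
    # Add self-loops (a common MCL convention), then make columns stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                   # expansion
        m = [[v ** inflation for v in row] for row in m]   # inflation
        m = normalize_columns(m)
    # Each row that still carries flow lists the nodes attracted to it.
    clusters = {frozenset(j for j, v in enumerate(row) if v > 1e-6) for row in m}
    return sorted(sorted(c) for c in clusters if c)

# Two triangles joined by a single bridge edge between nodes 2 and 3.
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(bridged))
```

Raising `inflation` generally yields more, smaller clusters; on a graph of several thousand nodes a sparse-matrix implementation would be needed in place of the list-based one here.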

Page 30

Structure emerges and the graph becomes useful

Page 31

Remember our ghost story?

• DS IV 650
• Classified as a story about manor lords, not ghosts!
• Impossible to find in the archive
• Can I use networks to find this story?
• Will it help me find other stories of interest?

Page 32

Page 33

Page 34

Almost all the surrounding stories are cataloged as ghost stories!

Page 35

DS II B 147 is a story of interest—not a ghost story but strongly connected to DS IV 650…

Page 36

DS II B 147

• A story about a house elf at a farm in Egå...
• Ends as follows:
  – When they got home, the farmhand was happy because now he’d gotten something to use for feed, and afterward nis could go and feed just as much as he wanted to. Then they got another farmhand, and he didn’t want to let him go on like that. But he got lifted up in his bed and all the way up to the rafters, so he lay there dead when people got up the next morning.

Page 37

The trouble with house elves…

• You can’t always find them…
• They act in unpredictable ways…
• The things they do turn out to be pretty mean and nasty

Page 38

New research question

• What is the relationship between ghosts and house elves in 19th century Denmark and why might there be such a relationship?

Page 39

Some tentative conclusions

Page 40

Directions for future work

• Labeling
  – Can we automatically label nodes given a sparsely labeled graph? (LDA-G, Homophily algorithms)
• Anomaly detection / Community detection
  – Can we automatically find “stories of interest” on our graph?
• Multimodal networks
  – Integrate network information from several networks
• Dynamic networks
  – Understanding how the network changes over time
• Geographic visualization of network models
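The labeling and "stories of interest" questions above can be illustrated with the simplest possible homophily heuristic: a majority vote among a node's labeled neighbors. This is not LDA-G or any specific published algorithm; it is a toy sketch in which the function names, the graph, and the labels are all hypothetical. An unlabeled node receives the most common neighbor label, and a labeled node is flagged when its own label disagrees with that majority.

```python
from collections import Counter

def majority_neighbor_label(node, neighbors, labels):
    """Most common label among a node's labeled neighbors, or None if there are none.

    Ties are broken in favor of the label encountered first.
    """
    votes = Counter(labels[n] for n in neighbors.get(node, ()) if n in labels)
    return votes.most_common(1)[0][0] if votes else None

def guess_labels(neighbors, labels):
    """Suggest a label for each unlabeled node that has labeled neighbors."""
    guesses = {}
    for node in neighbors:
        if node not in labels:
            label = majority_neighbor_label(node, neighbors, labels)
            if label is not None:
                guesses[node] = label
    return guesses

def flag_mismatches(neighbors, labels):
    """List labeled nodes whose label differs from their neighbors' majority label."""
    flagged = []
    for node in neighbors:
        if node in labels:
            majority = majority_neighbor_label(node, neighbors, labels)
            if majority is not None and majority != labels[node]:
                flagged.append(node)
    return flagged

# Hypothetical graph: "a" is cataloged differently from everything around it.
neighbors = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "d", "e"],
    "c": ["a", "b", "d", "e"],
    "d": ["a", "b", "c"],
    "e": ["b", "c"],
}
labels = {"a": "manor lord", "b": "ghost", "c": "ghost", "d": "ghost"}
print(guess_labels(neighbors, labels), flag_mismatches(neighbors, labels))
```

A flagged node is only a candidate for closer reading: the mismatch may point to a cataloging quirk, or to a genuinely interesting connection between story types.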

Page 41

A Very Special Thanks to

• IPAM
  – Peter Jones
  – Mark Green
  – Russ Caflisch
• Colleagues and friends from Search Engines 2007

Page 42

Additional thanks to

• Peter Broadwell, UCLA
• James Abello, DIMACS
• Tina Eliassi-Rad, LLNL/Rutgers
• Nischal Devanur, Rutgers
• UCLA’s Center for Digital Humanities

Page 43

Funded by…

– Nordic Council of Ministers
– The American Council of Learned Societies
– NSF EAGER Grant IIS-0970179
  • With Lancaster (ECAI), Buckland (ECAI), Eliassi-Rad (Rutgers) and Faloutsos (CMU)
– Google Books Humanities grant
– Many ideas derived from
  • NEH Institute for Advanced Topics in Digital Humanities, “Networks and Network Analysis for the Humanities” (NEH HT5001609)