Post on 18-Jan-2018

The Trouble with House Elves

Experiments in Computational Folkloristics

Timothy R. Tangherlini

2

3

A story…

It was the old counselor from Skårupgård who came riding with four headless horses to Todbjærg church. He always drove out of the northern gate, and there by the gate was a stall, they could never keep that stall door closed. They had a farmhand who closed it once after it had sprung open. But one night, after he'd gone to bed, something came after the farmhand and it lifted his bed straight up to the rafters and crushed him quite hard. Then the farmhand shouted and asked them to stop lifting him up there. "No, you've tormented us, but now you'll die..." I heard that's how two farmhands were crushed to death. He wanted to close the door and then they never tried to close it again.

4

• Some standard questions:
  – Role of ghosts in late 19th century Denmark?
  – Origins of the story?
  – Structure of the story?
  – Who, what, where of this story?

• Is there a need for a computational folkloristics?

5

Folklore

• Early history of the discipline:
  – Philological
  – National Romanticism
• Johann Gottfried Herder (1744-1803)
  – Wilhelm (1786-1859) and Jacob (1785-1863) Grimm
  – Search for original forms

6

Romantic Nationalism in the Nordic lands

• Asbjørnsen and Moe (Norway)
  – Development of the Norwegian language
• Linnaeus and the rush to categorize (Sweden)
• The ballad, archaeology and Svend Grundtvig (Denmark)
• The Kalevala and folklore as a science (Finland)

7

Mapping Folklore

• Historic-geographic method
  – Kaarle and Julius Krohn (1906-1924)
    • Focused work on the Finnish epic, Kalevala
    • Led to the type index of folk literature (Antti Aarne)
  – Ripples on a pond theory of folklore diffusion

8

Maps in the study of culture

• "Geography is not an inert container, is not a box where cultural history 'happens,' but an active force, that pervades the literary field and shapes it in depth. Making the connection between geography and literature explicit... will allow us to see some significant relationships that have so far escaped us"
  – (Moretti 1998, 3).

9

A New Historic-Geographic Method

• Folklore as a process:
  – in time and space
  – emerges from the dialectic between individuals and tradition
• Maps can help model relationships between:
  – People
  – Environment
  – Folk Repertoires

10

Study Corpus

• 6500+ named informants
• 24,000 manuscript pages
• 250,000 published expressions
• Evald Tang Kristensen (1843-1929)
  – Actively collected from 1865-1923
  – 219 collecting trips

11

A multi-level folklore browser

• People
• Places
• Stories

12

Experiments in mapping

1. Mapping collecting routes
  • Challenge Question (CQ): Did Tang Kristensen’s published statements about his collecting accurately reflect his collecting work?
2. Mapping individual repertoire distribution
  • CQ: Does individual mobility influence the range of places mentioned in stories?
  • CQ: Do other informant features, such as gender, influence the range of places mentioned?
3. Mapping by story features against individual repertoire
  • CQ: Are there patterns, à la Moretti, that become apparent in the visualization of stories by repertoire, genre and/or story topic?

13

Experiment 1: Mapping Collecting Routes

– ETK presents himself as a West Jutlander
– Political motivations
  • Aftermath of Napoleonic wars and Danish bankruptcy (1814)
  • Loss of Schleswig to Bismarck (1864)
  • Urbanization
– Search for “authentic” Danish culture
– What do the collecting routes reveal?

14

Experiments 2 & 3: Mapping Repertoire

• Theory: Individual biography influences repertoire and its features
• Hypothesis: Classes of individuals have different degrees of physical mobility, and this is reflected in their storytelling
• Hope: Maps reveal interesting patterns of places mentioned
  – A caveat: my main interest, like the vast majority of the collection, is legends: stories that refract the lived environments and social organization of the tradition participants

15

Experiment 2: Place Name Distribution and Mobility

– Target: repertoires of 5 storytellers
– Limit: only stories that mention places
– Method
  • Plot place names mentioned by storyteller
  • Calculate Standard Deviation Ellipse distribution patterns for places mentioned in storyteller repertoires
  • Look for patterns in the underlying place name distribution
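The ellipse step in the method above can be sketched as follows. This is a minimal illustration, not the GIS routine used in the study: it assumes the place names have already been geocoded to planar (projected) coordinates, uses the divide-by-n covariance of the points, and takes the ellipse axes from the eigenvalues of that 2x2 covariance matrix. GIS packages may scale the axes differently. The function name and the coordinates are invented for the example.

```python
import math

def deviational_ellipse(points):
    """Return (center, major, minor, angle) for planar (x, y) points.

    major/minor are one-standard-deviation semi-axes; angle is the
    orientation of the major axis in radians, counterclockwise from x.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Population (divide-by-n) variances and covariance about the mean center.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix are the squared semi-axes.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(half_trace - spread, 0.0))
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), major, minor, angle

# Invented coordinates standing in for places mentioned by one storyteller.
places = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
center, major, minor, angle = deviational_ellipse(places)
```

A long, narrow ellipse would suggest that a storyteller's places cluster along one direction (a road or coastline, say), while a small, round one would suggest a tightly local repertoire.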

16

Experiment 3: Can unsupervised learning on text help in pattern discovery?

17

Experiment 3: Unsupervised learning and Repertoire clusters

– Target: repertoires of 5 storytellers
– Limit: only stories that mention places
– Method
  • Convert stories to TF-IDF vector representations
  • Force dimensionality reduction using SVD
  • Cluster: ECM by storyteller
    – eliminate small clusters
  • Project results into GIS
  • Calculate distribution ellipses for each cluster in each person’s repertoire
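The first step of this method, turning stories into TF-IDF vectors, can be sketched in plain Python. This is an illustration under assumptions: the weighting used here (relative term frequency times log(N/df)) is one common variant and may not match the one used in the study, the SVD and ECM clustering steps are not shown, and the tokenized "stories" are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} TF-IDF vector."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Invented toy documents; real input would be the tokenized story texts.
stories = [
    "farmhand bed lifted rafters ghost".split(),
    "farmhand bed lifted rafters nisse".split(),
    "church bell stolen troll".split(),
]
vecs = tfidf_vectors(stories)
```

In this toy data the first two stories share most of their vocabulary and so have a positive cosine similarity, while the third shares no terms with either. Clustering would then operate on such vectors, or on their reduced-dimension form.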

18

A Crisis…

• Maps were informative since new patterns in the geographic distribution of stories were discovered…
  – why hadn’t I known about these patterns before?
• What other types of patterns, some very small, some very large, are lurking in the data?

• How can I be sure that my selection of examples is representative or even accurate?

19

A Classic Folklore Problem

• Classification in folklore
  – 1 text = 1 classifier

• What happens when the classifier was designed for a different research problem?

• Are we missing patterns that are not solely related to single topic classifiers?

• Are we missing stories in our searches because of these single topic classifiers?

• Does this limit our ability to work with a large archive?

20

Current folklore classifiers are very expensive

21

A lost story…

It was the old counselor from Skårupgård who came riding with four headless horses to Todbjærg church. He always drove out of the northern gate, and there by the gate was a stall, they could never keep that stall door closed. They had a farmhand who closed it once after it had sprung open. But one night, after he'd gone to bed, something came after the farmhand and it lifted his bed straight up to the rafters and crushed him quite hard. Then the farmhand shouted and asked them to stop lifting him up there. "No, you've tormented us, but now you'll die..." I heard that's how two farmhands were crushed to death. He wanted to close the door and then they never tried to close it again.

22

Networks to the rescue?

• Folklore as traditional communication across social networks
  – Folklore networks
    • Social networks of tradition participants
    • Networks of scholars and collectors
    • Networks of stories
  – External networks
    • Communications networks
    • Transportation networks
    • Affiliation networks
  – Internal networks
    • Linguistic networks

23

Connecting the dots…

[Diagram: a network of connected nodes labeled S1-S4, P1-P8 and I1-I3]

24

Storyteller networks

• Local networks
  – Connect all storytellers in a given parish
  – Connect all storytellers in a family
• Fieldtrip networks
  – Connect all storytellers on a given fieldtrip
• Collector-Storyteller networks
  – Connect all storytellers to all collectors with whom they worked
• Inferred / Affiliation networks
  – Connect storytellers by work groups (e.g. millers, fiddlers, etc.)
  – Connect storytellers by other affiliations (e.g. gender, age, education)

25

Story networks

• Connect stories to:
  – People:
    • storytellers
    • people mentioned
  – Places
    • places collected
    • places mentioned
  – Each other
    • By shared indexing
    • By shared keyword (keyword extraction)
    • By shared topic (topic modeling using LDA)
    • By shallow ontology (tango index)
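Linking stories to each other by shared keyword can be sketched as below. This is a hypothetical illustration, not the pipeline behind the study corpus graph: the keyword sets are hand-picked for the example rather than produced by keyword extraction, "story C" is an invented placeholder, and the function name is an assumption.

```python
from itertools import combinations

def shared_keyword_edges(keywords_by_story):
    """Return {(story_a, story_b): shared_keywords} for every pair of
    stories that has at least one keyword in common."""
    edges = {}
    for a, b in combinations(sorted(keywords_by_story), 2):
        shared = keywords_by_story[a] & keywords_by_story[b]
        if shared:
            edges[(a, b)] = shared
    return edges

# Hand-picked illustrative keywords, not real extraction output.
keywords_by_story = {
    "DS IV 650": {"farmhand", "bed", "rafters", "manor"},
    "DS II B 147": {"farmhand", "bed", "rafters", "nisse"},
    "story C": {"church", "bell"},
}
edges = shared_keyword_edges(keywords_by_story)
```

Edges built this way ignore the archive's single-topic classification, so two stories filed under different headings can still end up adjacent if they share motifs.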

26

An initial graph of the ETK study corpus

27

Lost in a thicket of stories, keywords, etc

28

Folklore Spaghetti

29

Graph clustering

• Use a tuned version of MCL clustering for graphs
  – iteratively generates stochastic matrices, also known as Markov matrices (van Dongen 2000)
  – 2973 nodes / 52663 edges
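The basic MCL loop can be sketched on a toy graph as follows. This is a plain, untuned version for illustration only; the tuning used for the study corpus is not described here, and the inflation value, the tolerance and the dense list-of-lists representation are assumptions that would not scale to thousands of nodes.

```python
def normalize_columns(m):
    """Rescale each column to sum to 1 (a column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Basic Markov clustering; returns a sorted list of node-index tuples."""
    n = len(adjacency)
    # Add self-loops, then turn the adjacency matrix into a Markov matrix.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (i == j) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: raise entries to a power and renormalize columns.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )
        delta = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if delta < tol:
            break
    # Each row with non-negligible mass lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(clusters)

# Toy graph: two triangles joined by a single bridge edge (2-3).
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(graph)
```

A higher inflation value generally yields more, smaller clusters; a lower one yields fewer, larger clusters.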

30

Structure emerges and the graph becomes useful

31

Remember our ghost story?

• DS IV 650
• Classified as a story about manor lords, not ghosts!
• Impossible to find in the archive
• Can I use networks to find this story?
• Will it help me find other stories of interest?

32

33

34

Almost all the surrounding stories are cataloged as ghost stories!

35

DS II B 147 is a story of interest—not a ghost story but strongly connected to DS IV 650…

36

DS II B 147

• A story about a house elf at a farm in Egå...
• Ends as follows:

– When they got home, the farmhand was happy because now he’d gotten something to use for feed, and afterward nis could go and feed just as much as he wanted to. Then they got another farmhand, and he didn’t want to let him go on like that. But he got lifted up in his bed and all the way up to the rafters, so he lay there dead when people got up the next morning.

37

The trouble with house elves…

• You can’t always find them…
• They act in unpredictable ways…
• The things they do turn out to be pretty mean and nasty

38

New research question

• What is the relationship between ghosts and house elves in 19th century Denmark and why might there be such a relationship?

39

Some tentative conclusions

40

Directions for future work

• Labeling
  – Can we automatically label nodes given a sparsely labeled graph? (LDA-G, Homophily algorithms)
• Anomaly detection / Community detection
  – Can we automatically find “stories of interest” on our graph?
• Multimodal networks
  – Integrate network information from several networks
• Dynamic networks
  – Understanding how the network changes over time
• Geographic visualization of network models
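As a rough illustration of the labeling question, a sparsely labeled graph can be filled in with a simple homophily heuristic: each unlabeled node takes the majority label of its already-labeled neighbors, repeated until nothing changes. This is a generic sketch, not LDA-G and not a method from the study. The node names, edges and seed labels are invented, and ties are broken alphabetically only to keep the result deterministic.

```python
from collections import Counter

def propagate_labels(edges, seed_labels, max_rounds=10):
    """Spread labels from seed nodes to unlabeled neighbors by majority vote."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for node in sorted(neighbors):
            if node in labels:
                continue
            votes = Counter(labels[n] for n in neighbors[node] if n in labels)
            if votes:
                top = max(votes.values())
                # Alphabetical tie-break keeps the outcome deterministic.
                updates[node] = min(l for l, c in votes.items() if c == top)
        if not updates:
            break
        labels.update(updates)
    return labels

# Invented toy graph with two labeled seed stories.
edges = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
seeds = {"s1": "ghost", "s5": "house elf"}
labels = propagate_labels(edges, seeds)
```

A node whose propagated label disagrees with its archive classification would be a natural candidate for a "story of interest".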

41

A Very Special Thanks to

• IPAM
  – Peter Jones
  – Mark Green
  – Russ Caflisch

• Colleagues and friends from Search Engines 2007

42

Additional thanks to

• Peter Broadwell, UCLA
• James Abello, DIMACS
• Tina Eliassi-Rad, LLNL/Rutgers
• Nischal Devanur, Rutgers
• UCLA’s Center for Digital Humanities

43

Funded by…

– Nordic Council of Ministers
– The American Council of Learned Societies
– NSF Eager Grant IIS-0970179
  • With Lancaster (ECAI), Buckland (ECAI), Eliassi-Rad (Rutgers) and Faloutsos (CMU)
– Google Books Humanities grant
– Many ideas derived from
  • NEH Institute for Advanced Topics in Digital Humanities, “Networks and Network Analysis for the Humanities” (NEH HT5001609)
