bon cop, bad cop a tale of two...

40
Bon cop, bad cop: A tale of two cities Kelsey Ball, Barbara E. Bullock, Gualberto Guzmán, Rozen Neupane, Kristopher S. Novak, & Jacqueline Larsen Serigos The University of Texas Transcultural urban spaces: where geography meets language 16/17 October 2015 Berne

Upload: ngothuy

Post on 16-Apr-2018

235 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Bon cop, bad cop: A tale of two cities

Kelsey Ball, Barbara E. Bullock, Gualberto Guzmán,Rozen Neupane, Kristopher S. Novak,

& Jacqueline Larsen SerigosThe University of Texas

Transcultural urban spaces: where geography meets language 16/17 October 2015 Berne

Page 2: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Outline1. Language, geography, contact, and multilingualism

○ Mobility and mixing

2. The Bilingual Annotation Tasks (BATs) Force○ Investigating multilingual texts with automated tools

3. Significance○ Insight into how bilingual speakers use their linguistic resources as they move between different

urban spaces

2

Page 3: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Language, geography, contact, and multilingualism

3

Page 4: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Superdiversity

● In zones of high contact, such as urban centers, innovative language use is ubiquitous, reflective of superdiversity (Vertovec 2007).

● Contemporary urban speech characterized by mixing is referred to by a variety of terms:

○ urban youth languages

○ ethnolects

○ translanguaging/code-switching

○ hybrid language practices

4

Page 5: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Language mixing and multilingual corpora

● As speakers move between and within spaces, interacting with different interlocutors, they move between and within languages.

● What is the nature of contemporary mixed language forms?○ Linguistic studies : How do they operate structurally?

○ Cultural and urban studies: What kinds of meanings and identities do they convey?

5

Page 6: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Polylingual performances

6

● Not limited to borrowing words from Language X to insert into Language Y; can include:

○ Non-normative usages

○ Local forms

○ Polylingual language play: memes

○ Convergence in structures and meaning between languages

● In order to make sense of this, we need data—lots of it

● But these features make it difficult to annotate these types of text

Page 7: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Attempts to study multilingual data

● LIDES/LIPPS Group

● BilingBank

○ accessible via http://www.talkbank.org/

● Contacts de Langues: Analyses Plurifactorielles (CLAPOTY) ○ Isabelle Léglise (CNRS SEDYL/CELIA) - http://clapoty.vjf.cnrs.fr/index.html

7

Page 8: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Studying multilingual data: Traditionally● typically manually annotated● set base/matrix language● discrete categorization of contact features

○ Established borrowings (vs. single word code-switch)

■ “mon chum”

○ Syntactic calques (vs. a dialectal extensions of existing patterns in a language)

■ “take a coffee” < have coffee

○ The language of a proper name (Named Entity)

■ Robert

8

Page 9: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Studying multilingual data: More recently● Probabilistic approaches to contact features

○ Allows for gradient/blended classification

● Flexible word sequences (ngrams)

● Automatic annotation

9

Page 10: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Why automate?● Eliminate obstacles of cost and time● Eliminate errors/biases introduced by manual coding● Increase accountability through replicability● Process other people’s data to create open resources● Accelerate production of findings● Encourage collaboration

10

Page 11: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Bilingual Annotation Tasks (BATs) Force/|\ (°v°)/|\

11

Page 12: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Goals of the BATs● Present a method and tools for analyzing larger data sets than linguists usually

work with

● Demonstrate that it is possible to automatically identify the languages in a data set without the need for human intervention

● Apply a quick metric to calculate the degree to which a text is mixed so that it can be compared with other texts

12

Page 13: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

13

A first step: written corpus

● Killer Crónicas: Bilingual Memories

○ ~49,000 words

○ colloquial language

○ dialectal forms

○ highly bilingual

○ extensive mixing

Page 14: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

A second step: oral corpus

● Bon Cop Bad Cop

○ 13,502 tokens

○ colloquial language

○ dialectal forms

○ extensive mixing

○ highly bilingual

14

Page 15: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Movie Clip

15https://www.youtube.com/watch?v=lLISTwHX88k&feature=youtu.be

Page 16: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Bon cop, Bad Cop, 2006● Plot

○ Officers from Ontario and Quebec join forces to solve a murder on the border

○ Film revolves mixing of languages and cultures

■ Bouchard: Francophone

■ Ward: Anglophone

● Bilingualism○ Touted as Canada’s first bilingual film, very successful at the box office

○ Filmed with a French and an English script; later edited for final mixed version

16

Page 17: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Methods

17

Page 18: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Corpus Creation1. Monolingual subtitles: open subtitle.com (English), subsmax. com (French)

18

Page 19: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Corpus Creation2. Manually combined the two subtitles to create a transcript

19

Page 20: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Annotation Tags for Manual Gold Standard

20

Language Tag Example

English years, country

Non-standard English Hopportunity

French frère, ville

Non-standard French mononcle, tabarnak

Named Entities Montréal, Toronto, David

Punctuation . ! , ? –

Number 2001, VII

Page 21: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Example of the Gold Standard

21

Page 22: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Building a language identification model● Numbers and Punctuation

○ Rule-based

● Named Entity○ Stanford Named Entity Recognizer

● English and French ○ Character Ngram

○ HMM

22

Page 23: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Character N-gram Model● What’s the probability that a given word is English? French?● Model a language using training data (La Presse corpus, CALLHOME corpus)● Identify language of an unknown word based on how often it occurs in training

data● Example:

23

Page 24: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Hidden Markov Model● What’s the probability that this word is English, considering that the previous

word was English?● Code-switching is less frequent (therefore less probable) than not switching.

-Me occurs in both English and French training data → ambiguous -Because previous word was English, me is more likely to be English

24

Page 25: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Annotation of Bon Cop, Bad Cop by model

25

Page 26: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Results

26

Page 27: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Evaluation of our model

27

Tag Accuracy Precision Recall

English 98.3% 98.3% 96.6%

French 97.5% 97.4% 98.6%

Named Entity 69.4% 78.7% 85.5%

Over-all 97.0%

Page 28: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Over-all Language Distribution across Bon Cop Bad Cop

28

Language Tag

Named Ent.

Punct.

Eng

French

Token count

Page 29: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Characters’ Usage of French Across Cities

29Montréal Toronto

Page 30: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Logistic Regression

30

● Relative to Martin Ward...

○ Others: not a significant predictor

○ David Bouchard: 15 times more likely to use French

○ Tattoo Killer: 1.6 times more likely to use English

● Relative to Toronto...

○ In Montréal: 8 times more likely to use French

Can we predict language based on character and location?

Page 31: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Bilingual metric applied to Bon Cop Bad Cop

31

● The M-metric is developed by the LIPPS Group (2000):

where,

k=total number of languages/categories

pj=total number of words in the given language over total words in the corpus

● The metric allows us to determine how bilingual a text is: the closer the value is

to 1, the more ‘bilingual’ the text

Page 32: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

M-metric for Bon Cop Bad Cop

32

Page 33: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Switch points

33

Types of Switches Example

Eng to French He has had an unfortunate contretemps

French to English On peut aller demander à ses partners

Punct. only Hey, where you going?Chez nous.

Named Ent. and Punct. Bye, Patrick. Merci d'avoir appelé.

Page 34: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Switch Pattern

34

Over ⅔ of Direct switches involved single word borrowings (eg. ses affaires cheap américaines)

Direct Switches

Indirect Switches

Indirect Switches

Page 35: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

Discussion and Conclusions

35

Page 36: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

What we’ve achieved● Presented a profile of language switching in a large bilingual text

○ locale

○ speaker

● Demonstrated that it is possible to automatically identify language switches without manually coding all the data

○ with a high degree of accuracy: 97%

● Calculated the degree to which segments of the corpus are mixed and how they are mixed

○ M score of .71 for overall mixing where 1 = perfectly mixed

36

Page 37: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

What we’ve learned about who switches, where

● Context matters— a lot

■ In Toronto, only the Québécois cop (Bouchard) uses French; all others perform in English

■ In Montreal, the Torontonian cop (Ward) and Tattoo Killer perform bilingually

● Speaker matters— a lot

■ The most bilingual character appears to be Tattoo Killer

■ The least is Bouchard

37

Page 38: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

What else we’ve learned about who switches● The anglophone speakers accommodate more to the francophone than vice

versa.○ Given the politics of language in Quebec with regard to French language maintenance, and the

fact that the director and producer of the film are Québécois, this might not be surprising.

38

Page 39: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

What we’ve learned about how they switch

● Most switching coincides with punctuation, indicating

○ a change in speaker turn

○ or a switch at a phrasal boundary

● Within a phrase○ nearly balanced French↔English

39

Page 40: Bon cop, bad cop A tale of two citiessites.utexas.edu/.../2016/02/Bon-cop-bad-cop-A-tale-of-two-cities.pdf · Bon cop, bad cop: A tale of two cities ... open subtitle.com (English),

ConclusionsThis work shows how NLP tools can be fruitfully recruited to shed light on contemporary mixed language use.

40