LingSync Tutorial LSA 2015 Annual Meeting January 8 Joel Dunham

Page 1: LingSync Tutorial · Joel Dunham • PhD, UBC, 2014 • Online Linguistic Database (OLD) • Morphological Parser Creator (MPC) • Postdoc, Concordia • Dative = LingSync + OLD

LingSync Tutorial

LSA 2015 Annual Meeting

January 8

Joel Dunham

Page 2

What & Why LingSync?

How to Use LingSync

Hands-on Practice

Features Being Developed

Page 3

What is LingSync and why was it created?

Page 4

lingsync.org

A Free Tool for Creating and Maintaining a Shared Database For Communities, Linguists and Language Learners

Page 5

Joel Dunham

• PhD, UBC, 2014

• Online Linguistic Database (OLD)

• Morphological Parser Creator (MPC)

• Postdoc, Concordia

• Dative = LingSync + OLD + MPC

Page 6

Fieldwork is hard.

Collaboration & computation

can make it less hard.

Page 7

Endangered languages fieldwork

working with speakers to generate artifacts that encode knowledge of an endangered language

Page 8

Why endangered languages fieldwork?

• document & preserve linguistic diversity

• revitalize tired & sleeping languages

• increase empirical base of linguistic science

Page 9

• linguistic aptitude & expertise

• interpersonal skills

• balancing academic & community demands

• resource scarcity

Fieldwork is hard. Endangered languages fieldwork is harder.

Page 10

Fieldwork artifacts

• primary data

• higher-level artifacts

Page 11

Primary Data

Page 12

cniɬc xəxənɬʔupənkst

uɬ naqs spintk

transcriptions

Primary Data

Page 13

cniɬc xəxənɬʔupənkst

uɬ naqs spintk

transcriptions

Primary Data

‘She is ninety-one years old.’

translations

Page 14

cniɬc xəxənɬʔupənkst

uɬ naqs spintk

transcriptions

Primary Data

cniɬ=c xəxnut-ɬ-ʔupənkst

uɬ naqs s-pintk

3PRO=mouth nine-CC-ten

and one NOM-always

morphological analyses / IGT

‘She is ninety-one years old.’

translations

Page 15

cniɬc xəxənɬʔupənkst

uɬ naqs spintk

transcriptions

morphemes/ lexical entries

Primary Data

cniɬ=c xəxnut-ɬ-ʔupənkst

uɬ naqs s-pintk

3PRO=mouth nine-CC-ten

and one NOM-always

morphological analyses / IGT

‘She is ninety-one years old.’

translations

Page 16

audio

cniɬc xəxənɬʔupənkst

uɬ naqs spintk

transcriptions

morphemes/ lexical entries

Primary Data

cniɬ=c xəxnut-ɬ-ʔupənkst

uɬ naqs s-pintk

3PRO=mouth nine-CC-ten

and one NOM-always

morphological analyses / IGT

‘She is ninety-one years old.’

translations

Page 17

audio

cniɬc xəxənɬʔupənkst

uɬ naqs spintk

transcriptions

morphemes/ lexical entries

Primary Data

cniɬ=c xəxnut-ɬ-ʔupənkst

uɬ naqs s-pintk

3PRO=mouth nine-CC-ten

and one NOM-always

morphological analyses / IGT

images

‘She is ninety-one years old.’

translations

Page 18

audio

cniɬc xəxənɬʔupənkst

uɬ naqs spintk

transcriptions

morphemes/ lexical entries

Primary Data

cniɬ=c xəxnut-ɬ-ʔupənkst

uɬ naqs s-pintk

3PRO=mouth nine-CC-ten

and one NOM-always

morphological analyses / IGT

images

consultant: SP

elicitor: XY

date elicited: 09/11/2002

... ...

metadata

‘She is ninety-one years old.’

translations

Page 19

audio

cniɬc xəxənɬʔupənkst

uɬ naqs spintk

transcriptions

morphemes/ lexical entries

Primary Data

cniɬ=c xəxnut-ɬ-ʔupənkst

uɬ naqs s-pintk

3PRO=mouth nine-CC-ten

and one NOM-always

morphological analyses / IGT

images

consultant: SP

elicitor: XY

date elicited: 09/11/2002

... ...

metadata

analyses

‘She is ninety-one years old.’

translations
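
The layers of primary data built up on the preceding slides (transcription, morpheme break, gloss, translation, metadata) can be pictured as one record. The following is a minimal Python sketch under stated assumptions: the class name `IGTRecord`, its field names, and the `aligned` helper are hypothetical and are not taken from LingSync's actual data model. It only illustrates how the morpheme break and the gloss line up one-to-one.

```python
import re
from dataclasses import dataclass, field


# Hypothetical record layout for illustration; not LingSync's actual schema.
@dataclass
class IGTRecord:
    transcription: str
    morpheme_break: str
    gloss: str
    translation: str
    metadata: dict = field(default_factory=dict)

    def aligned(self):
        """Pair each morpheme with its gloss, word by word."""
        words = self.morpheme_break.split()
        glosses = self.gloss.split()
        if len(words) != len(glosses):
            raise ValueError("morpheme break and gloss differ in word count")
        pairs = []
        for word, word_gloss in zip(words, glosses):
            # '-' marks affix boundaries and '=' marks clitic boundaries.
            morphemes = re.split(r"[-=]", word)
            labels = re.split(r"[-=]", word_gloss)
            if len(morphemes) != len(labels):
                raise ValueError(f"misaligned word: {word!r} vs {word_gloss!r}")
            pairs.extend(zip(morphemes, labels))
        return pairs


# Values copied from the slide; the dictionary keys are assumed names.
record = IGTRecord(
    transcription="cniɬc xəxənɬʔupənkst uɬ naqs spintk",
    morpheme_break="cniɬ=c xəxnut-ɬ-ʔupənkst uɬ naqs s-pintk",
    gloss="3PRO=mouth nine-CC-ten and one NOM-always",
    translation="She is ninety-one years old.",
    metadata={"consultant": "SP", "elicitor": "XY", "date_elicited": "09/11/2002"},
)

for morpheme, label in record.aligned():
    print(morpheme, label)
```

Keeping the break and gloss as parallel strings mirrors how the lines appear on the slide; a stricter design could store the morpheme and gloss pairs directly.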

Page 20

Higher-level artifacts

grammars

data

dictionaries

data

research papers

data

pedagogical materials

data

Page 21

grammars

data

dictionaries

data

research papers

data

pedagogical materials

data

Primary data flow

is

Page 22

grammars

data

dictionaries

data

research papers

data

pedagogical materials

data

Primary data flow

ought

is

Page 23

grammars

data

dictionaries

data

research papers

data

pedagogical materials

data

Primary data flow

is

ought

Page 24

grammars

data

dictionaries

data

research papers

data

pedagogical materials

data

Primary data flow

is

ought

Page 25

grammars

data

dictionaries

research papers

data

pedagogical materials

data

data

Primary data flow

is

ought

Page 26

dictionaries

pedagogical materials

grammars

research papers

LingSync

is

ought

Primary data flow

Page 27

Is having access to the data produced by one’s fellow fieldworkers really going to have a significant impact?

Page 28

All 32 of British Columbia’s “First Nations languages are critically endangered, if not sleeping already.”

- First Peoples’ Heritage, Language and Culture Council (2010, p. 22)

Page 29

Data are scarce

• newspapers, novels, radio programs, etc. are extremely rare to nonexistent

• academic and community-based fieldwork generates the majority of data

Page 30

Vision

• Imagine you had easy & unified access to:

• your own fieldwork data

• the data of your fieldworker peers (theoreticians, descriptivists, educators)

• the primary data underlying published sources

Page 31

LingSync

grammars

dictionaries

descriptive linguist

research papers

theoretical linguist

pedagogical materials

language teacher

language-learning software developer

Collaboration & data-sharing

Page 32

Collaboration, data-sharing, & data reuse

• multi-user concurrently modifiable corpora √

Page 33

Collaboration, data-sharing, & data reuse

• multi-user concurrently modifiable corpora √

• open source & free √

Page 34

Collaboration, data-sharing, & data reuse

• multi-user concurrently modifiable corpora √

• open source & free √

• cross-platform, browser-based GUIs √

Page 35

Collaboration, data-sharing, & data reuse

• multi-user concurrently modifiable corpora √

• open source & free √

• cross-platform, browser-based GUIs √

• web service with consistent API

Page 36

Collaboration, data-sharing, & data reuse

• multi-user concurrently modifiable corpora √

• open source & free √

• cross-platform, browser-based GUIs √

• web service with consistent API √

• permissions √

Page 37

field-worker

LingSync corpus

Page 38

field-worker

LingSync corpus

field-worker

LingSync corpus

field-worker

LingSync corpus

Page 39

field-worker

LingSync corpus

field-worker

LingSync corpus

field-worker

LingSync corpus

read-only

Page 40

field-worker

LingSync corpus

field-worker

LingSync corpus

field-worker

LingSync corpus

read-only

read/write

Page 41

field-worker

LingSync corpus

field-worker

LingSync corpus

field-worker

LingSync corpus

read-only

read/write

Public

Page 42

field-worker

LingSync corpus

field-worker

LingSync corpus

field-worker

LingSync corpus

read-only

read/write

Public

encrypted bits

Page 43

field-worker

LingSync corpus

field-worker

LingSync corpus v1

field-worker

LingSync corpus

read-only

read/write

Public

encrypted bits

versioning

Page 44

field-worker

LingSync corpus

field-worker

LingSync corpus v1

field-worker

LingSync corpus

read-only

read/write

Public

encrypted bits

versioning

LingSync corpus v2

Page 45

field-worker

LingSync corpus

field-worker

LingSync corpus v1

field-worker

LingSync corpus

read-only

read/write

Public

encrypted bits

versioning

LingSync corpus v2

LingSync corpus v3

Page 46

field-worker

LingSync corpus

field-worker

LingSync corpus v1

field-worker

LingSync corpus

read-only

read/write

Public

encrypted bits

versioning

LingSync corpus v2

Page 47

field-worker

LingSync corpus

field-worker

LingSync corpus v1

field-worker

LingSync corpus

read-only

read/write

Public

encrypted bits

versioning

LingSync corpus v2

my other corpus

Page 48

creator/admin

corpus A

corpus B

corpus C

session viii

session ix

session vi

session vii

session iv

session v

session i

session iii

session ii
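
The diagrams on the preceding slides (field-workers with read-only or read/write access, public corpora, and a creator/admin who owns several corpora split into sessions) can be summarized in a small model. This is a hedged Python sketch, not LingSync's implementation: `Role`, `Session`, `Corpus`, and the method names are hypothetical, and the access rules are one assumed reading of the slides.

```python
from dataclasses import dataclass, field
from enum import Enum


# Hypothetical roles named after the labels on the slides.
class Role(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read/write"
    ADMIN = "creator/admin"


@dataclass
class Session:
    name: str
    records: list = field(default_factory=list)


@dataclass
class Corpus:
    name: str
    public: bool = False
    members: dict = field(default_factory=dict)   # username -> Role
    sessions: dict = field(default_factory=dict)  # session name -> Session

    def can_read(self, user):
        # Assumption: a public corpus is readable by anyone.
        return self.public or user in self.members

    def can_write(self, user):
        return self.members.get(user) in (Role.READ_WRITE, Role.ADMIN)

    def add_record(self, user, session_name, record):
        """Append a record to a session, creating the session if needed."""
        if not self.can_write(user):
            raise PermissionError(f"{user} may not modify {self.name}")
        session = self.sessions.setdefault(session_name, Session(session_name))
        session.records.append(record)


# Usernames are made up for the example.
corpus_a = Corpus("corpus A", members={"alice": Role.ADMIN, "bob": Role.READ_ONLY})
corpus_a.add_record("alice", "session i", "cniɬc xəxənɬʔupənkst uɬ naqs spintk")
```

In this sketch a read-only member can view the corpus but `add_record` raises `PermissionError` for them; versioning and encryption, which the slides also mention, are left out.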

Page 49
Page 50

Acknowledgements

• Alan Bale (Concordia)

• Gina Cook (iLanguage Lab Ltd)

• Jessica Coon (McGill)

• M.E. Cathcart (U Delaware)

• Theresa Deering (Visit Scotland, iLanguage Lab Ltd)

• Josh Horner (Amilia, iLanguage Lab Ltd)

• Yuliya Manyakina (McGill)

• Elise McClay (McGill)

• Gretchen McCulloch (McGill)

• Hisako Noguchi (Concordia)

• Michael Wagner (McGill)

• Jesse Pollak (Pomona College)

• Tobin Skinner (Acquisio, iLanguage Lab Ltd)

• Xianli Sun (Miami University)

• More...


Page 52

Images

• Bridget Holmes: a Nonagenarian Housemaid. John Riley. Public domain, via Wikimedia Commons

• WAV audio icon. By zeus (www.zeusbox.org) [GPL (http://www.gnu.org/licenses/gpl.html)], via Wikimedia Commons