LingSync Tutorial

LSA 2015 Annual Meeting

January 8

Joel Dunham

What & Why LingSync?

How to Use LingSync

Hands-on Practice

Features Being Developed

What is LingSync and why was it created?

lingsync.org

A Free Tool for Creating and Maintaining a Shared Database For Communities, Linguists and Language Learners

Joel Dunham

• PhD, UBC, 2014

• Online Linguistic Database (OLD)

• Morphological Parser Creator (MPC)

• Postdoc, Concordia

• Dative = LingSync + OLD + MPC

Fieldwork is hard.

Collaboration & computation can make it less hard.

Endangered languages fieldwork

working with speakers to generate artifacts that encode knowledge of an endangered language

Why endangered languages fieldwork?

• document & preserve linguistic diversity

• revitalize tired & sleeping languages

• increase empirical base of linguistic science

Fieldwork is hard. Endangered languages fieldwork is harder.

• linguistic aptitude & expertise

• interpersonal skills

• balancing academic & community demands

• resource scarcity

Fieldwork artifacts

• primary data

• higher-level artifacts

Primary Data

• transcriptions

cniɬc xəxənɬʔupənkst uɬ naqs spintk

• morphological analyses / IGT

cniɬ=c xəxnut-ɬ-ʔupənkst uɬ naqs s-pintk

3PRO=mouth nine-CC-ten and one NOM-always

• translations

‘She is ninety-one years old.’

• morphemes / lexical entries

• audio

• images

• metadata

consultant: SP

elicitor: XY

date elicited: 09/11/2002

...

• analyses
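The parts of a primary data record shown above (transcription, morphological analysis with glosses, translation, metadata) can be sketched as a small data structure. This is only an illustration: the class name `PrimaryDatum`, its field names, and the `aligned_morphemes` helper are assumptions made for this example and are not LingSync's actual schema or API. The sketch also assumes that morphemes within a word are separated by `-` or `=`, as in the example on the slide.

```python
import re
from dataclasses import dataclass, field

# Assumption: morphemes inside a word are separated by "-" or "=".
MORPHEME_BOUNDARY = re.compile(r"[-=]")


@dataclass
class PrimaryDatum:
    """One elicited form. Field names are hypothetical, not LingSync's schema."""

    transcription: str
    morpheme_break: str
    gloss: str
    translation: str
    metadata: dict = field(default_factory=dict)

    def aligned_morphemes(self):
        """Return, for each word, a list of (morpheme, gloss) pairs.

        Raises ValueError if the morpheme-break and gloss lines differ in
        word count, or if a word's morphemes and glosses do not pair up
        one-to-one.
        """
        words = self.morpheme_break.split()
        word_glosses = self.gloss.split()
        if len(words) != len(word_glosses):
            raise ValueError("morpheme break and gloss differ in word count")
        aligned = []
        for word, word_gloss in zip(words, word_glosses):
            morphemes = MORPHEME_BOUNDARY.split(word)
            glosses = MORPHEME_BOUNDARY.split(word_gloss)
            if len(morphemes) != len(glosses):
                raise ValueError(f"cannot align {word!r} with {word_gloss!r}")
            aligned.append(list(zip(morphemes, glosses)))
        return aligned


# The example from the slide, entered as a single record.
datum = PrimaryDatum(
    transcription="cniɬc xəxənɬʔupənkst uɬ naqs spintk",
    morpheme_break="cniɬ=c xəxnut-ɬ-ʔupənkst uɬ naqs s-pintk",
    gloss="3PRO=mouth nine-CC-ten and one NOM-always",
    translation="She is ninety-one years old.",
    metadata={"consultant": "SP", "elicitor": "XY", "date_elicited": "09/11/2002"},
)

for word in datum.aligned_morphemes():
    print(word)
```

Keeping the morpheme line and the gloss line as separate strings mirrors how interlinear glossed text is usually typed, and the alignment check catches the common error of a gloss line that has drifted out of step with the morpheme line.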

Higher-level artifacts

• grammars

• dictionaries

• research papers

• pedagogical materials

Primary data flow: is vs. ought

• is: grammars, dictionaries, research papers, and pedagogical materials each draw on their own separate data

• ought: grammars, dictionaries, research papers, and pedagogical materials all draw on shared data in LingSync

Is having access to the data produced by one’s fellow fieldworkers really going to have a significant impact?

All 32 of British Columbia’s “First Nations languages are critically endangered, if not sleeping already.”

- First Peoples’ Heritage, Language and Culture Council (2010, p. 22)

Data are scarce

• newspapers, novels, radio programs, etc. are extremely rare to nonexistent

• academic and community-based fieldwork generates the majority of data

Vision

• Imagine you had easy & unified access to:

• your own fieldwork data

• the data of your fieldworker peers (theoreticians, descriptivists, educators)

• the primary data underlying published sources

LingSync

• grammars, dictionaries: descriptive linguist

• research papers: theoretical linguist

• pedagogical materials: language teacher

• language-learning software: developer

Collaboration & data-sharing

Collaboration, data-sharing, & data reuse

• multi-user concurrently modifiable corpora √

• open source & free √

• cross-platform, browser-based GUIs √

• web service with consistent API

• permissions

[Diagram: field-workers and the LingSync corpora they share]

• permissions: read-only, read/write, creator/admin

• Public

• encrypted bits

• versioning: LingSync corpus v1, v2, v3

• my other corpus

• corpus A, corpus B, corpus C

• sessions: session i to session ix
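The permission levels named in the diagram (read-only, read/write, creator/admin, Public) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Role` and `Corpus` names, the rule that only an admin may grant roles, and the reading of "Public" as "anyone may read" are not taken from LingSync's implementation.

```python
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical permission levels, ordered from least to most access."""

    NONE = 0
    READ_ONLY = 1
    READ_WRITE = 2
    ADMIN = 3


class Corpus:
    """A corpus with per-user roles. Not LingSync's actual data model."""

    def __init__(self, name, creator, public=False):
        self.name = name
        self.public = public
        # The creator starts out as the corpus admin.
        self.roles = {creator: Role.ADMIN}

    def role_of(self, user):
        # Assumption: a public corpus gives every user at least read access.
        baseline = Role.READ_ONLY if self.public else Role.NONE
        return max(self.roles.get(user, Role.NONE), baseline)

    def grant(self, granting_user, user, role):
        # Assumption: only an admin may change other users' roles.
        if self.role_of(granting_user) < Role.ADMIN:
            raise PermissionError(f"{granting_user} is not an admin of {self.name}")
        self.roles[user] = role

    def can_read(self, user):
        return self.role_of(user) >= Role.READ_ONLY

    def can_write(self, user):
        return self.role_of(user) >= Role.READ_WRITE


# Usage: one creator shares a private corpus with two colleagues.
corpus = Corpus("corpus A", creator="fieldworker_1")
corpus.grant("fieldworker_1", "fieldworker_2", Role.READ_WRITE)
corpus.grant("fieldworker_1", "fieldworker_3", Role.READ_ONLY)

print(corpus.can_write("fieldworker_2"))
print(corpus.can_write("fieldworker_3"))
```

Ordering the roles as integers keeps each check to a single comparison; a real system would also need authentication and server-side enforcement, which this sketch leaves out.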

Acknowledgements

• Alan Bale (Concordia)

• Gina Cook (iLanguage Lab Ltd)

• Jessica Coon (McGill)

• M.E. Cathcart (U Delaware)

• Theresa Deering (Visit Scotland, iLanguage Lab Ltd)

• Josh Horner (Amilia, iLanguage Lab Ltd)

• Yuliya Manyakina (McGill)

• Elise McClay (McGill)

• Gretchen McCulloch (McGill)

• Hisako Noguchi (Concordia)

• Michael Wagner (McGill)

• Jesse Pollak (Pomona College)

• Tobin Skinner (Acquisio, iLanguage Lab Ltd)

• Xianli Sun (Miami University)

• More...

Images

• Bridget Holmes: a Nonagenarian Housemaid. John Riley. Public domain, via Wikimedia Commons

• WAV audio icon. By zeus (www.zeusbox.org) [GPL (http://www.gnu.org/licenses/gpl.html)], via Wikimedia Commons
