LingSync Tutorial
LSA 2015 Annual Meeting
January 8
Joel Dunham
What & Why LingSync?
How to Use LingSync
Hands-on Practice
Features Being Developed
What is LingSync and why was it created?
lingsync.org
A Free Tool for Creating and Maintaining a Shared Database For Communities, Linguists and Language Learners
Joel Dunham
• PhD, UBC, 2014
• Online Linguistic Database (OLD)
• Morphological Parser Creator (MPC)
• Postdoc, Concordia
• Dative = LingSync + OLD + MPC
Fieldwork is hard.
Collaboration & computation
can make it less hard.
Endangered languages fieldwork
working with speakers to generate artifacts that encode knowledge of an endangered language
Why endangered languages fieldwork?
• document & preserve linguistic diversity
• revitalize tired & sleeping languages
• increase empirical base of linguistic science
Fieldwork is hard. Endangered languages fieldwork is harder.
• linguistic aptitude & expertise
• interpersonal skills
• balancing academic & community demands
• resource scarcity
Fieldwork artifacts
• primary data
• higher-level artifacts
Primary Data
• transcriptions
    cniɬc xəxənɬʔupənkst uɬ naqs spintk
• morphological analyses / IGT
    cniɬ=c xəxnut-ɬ-ʔupənkst uɬ naqs s-pintk
    3PRO=mouth nine-CC-ten and one NOM-always
• morphemes / lexical entries
• translations
    ‘She is ninety-one years old.’
• audio
• images
• metadata
    consultant: SP
    elicitor: XY
    date elicited: 09/11/2002
    ...
• analyses
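The tiers of the primary-data example above can be sketched as a small record type. This is only an illustration: the class name, field names, and the alignment check are assumptions made for this sketch and are not LingSync's actual schema. It pairs each morpheme with its gloss, which is roughly what the "morphemes / lexical entries" tier is built from.

```python
import re
from dataclasses import dataclass, field


@dataclass
class IGTRecord:
    """One item of primary data (hypothetical field names, not LingSync's schema)."""
    transcription: str
    morpheme_break: str
    gloss: str
    translation: str
    metadata: dict = field(default_factory=dict)

    def aligned_pairs(self):
        """Pair each morpheme with its gloss; fail if the two tiers do not line up."""
        # Words are separated by whitespace, morphemes by "-" or "=".
        morphemes = re.split(r"[\s=\-]+", self.morpheme_break.strip())
        glosses = re.split(r"[\s=\-]+", self.gloss.strip())
        if len(morphemes) != len(glosses):
            raise ValueError(
                f"{len(morphemes)} morphemes but {len(glosses)} glosses"
            )
        return list(zip(morphemes, glosses))


# The example from the slide, entered as a single record.
record = IGTRecord(
    transcription="cniɬc xəxənɬʔupənkst uɬ naqs spintk",
    morpheme_break="cniɬ=c xəxnut-ɬ-ʔupənkst uɬ naqs s-pintk",
    gloss="3PRO=mouth nine-CC-ten and one NOM-always",
    translation="She is ninety-one years old.",
    metadata={"consultant": "SP", "elicitor": "XY", "date_elicited": "09/11/2002"},
)
pairs = record.aligned_pairs()
```

Keeping the morpheme break and the gloss as parallel strings mirrors how IGT is usually typed in; the alignment check catches the common error of a missing or extra gloss.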
Higher-level artifacts
• grammars
• dictionaries
• research papers
• pedagogical materials
Primary data flow
• is: grammars, dictionaries, research papers, and pedagogical materials each sit on their own separate data
• ought: grammars, dictionaries, research papers, and pedagogical materials all draw on shared primary data in LingSync
Is having access to the data produced by one’s fellow fieldworkers really going to have a significant impact?
All 32 of British Columbia’s “First Nations languages are
critically endangered, if not sleeping already.”
- First Peoples’ Heritage, Language and Culture Council (2010, p. 22)
Data are scarce
• newspapers, novels, radio programs, etc. are extremely rare to nonexistent
• academic and community-based fieldwork generates the majority of data
Vision
• Imagine you had easy & unified access to:
    • your own fieldwork data
    • the data of your fieldworker peers (theoreticians, descriptivists, educators)
    • the primary data underlying published sources
LingSync
• grammars, dictionaries: descriptive linguist
• research papers: theoretical linguist
• pedagogical materials: language teacher
• language-learning software: developer
Collaboration & data-sharing
Collaboration, data-sharing, & data reuse
• multi-user concurrently modifiable corpora √
• open source & free √
• cross-platform, browser-based GUIs √
• web service with consistent API √
• permissions
Field-workers & LingSync corpora
• several field-workers work with the same LingSync corpus
• permissions: read-only, read/write
• Public
• encrypted bits
• versioning: LingSync corpus v1, v2, v3
• my other corpus
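The access levels and versioning shown in the diagram can be sketched as a toy model. Everything here is hypothetical (the `Corpus` class, the `Access` enum, the user names); it is not LingSync's API, and encryption is left out. It only illustrates the idea that read/write members add new versions while read-only members and the public can at most read.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read/write"


@dataclass
class Corpus:
    """A toy shared corpus (hypothetical names, not LingSync's API)."""
    name: str
    public: bool = False
    members: dict = field(default_factory=dict)   # user name -> Access
    versions: list = field(default_factory=list)  # one snapshot per save

    def can_read(self, user):
        # Public corpora are readable by anyone; otherwise membership is required.
        return self.public or user in self.members

    def can_write(self, user):
        return self.members.get(user) is Access.READ_WRITE

    def save(self, user, records):
        """Store a new version instead of overwriting the previous one."""
        if not self.can_write(user):
            raise PermissionError(f"{user} has no write access to {self.name}")
        self.versions.append(list(records))
        return len(self.versions)  # version number: 1 for v1, 2 for v2, ...


corpus = Corpus("LingSync corpus")
corpus.members["fieldworker_a"] = Access.READ_WRITE
corpus.members["fieldworker_b"] = Access.READ_ONLY

v1 = corpus.save("fieldworker_a", ["cniɬc xəxənɬʔupənkst"])
v2 = corpus.save("fieldworker_a", ["cniɬc xəxənɬʔupənkst", "uɬ naqs spintk"])
```

Because every save appends a snapshot, earlier versions stay available; a real system would store changes more compactly, but the principle is the same.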
creator/admin
• corpora: corpus A, corpus B, corpus C
• sessions: i, ii, iii, iv, v, vi, vii, viii, ix
Acknowledgements
• Alan Bale (Concordia)
• Gina Cook (iLanguage Lab Ltd)
• Jessica Coon (McGill)
• M.E. Cathcart (U Delaware)
• Theresa Deering (Visit Scotland, iLanguage Lab Ltd)
• Josh Horner (Amilia, iLanguage Lab Ltd)
• Yuliya Manyakina (McGill)
• Elise McClay (McGill)
• Gretchen McCulloch (McGill)
• Hisako Noguchi (Concordia)
• Michael Wagner (McGill)
• Jesse Pollak (Pomona College)
• Tobin Skinner (Acquisio, iLanguage Lab Ltd)
• Xianli Sun (Miami University)
• More...
References
• Beesley, K. R. and Karttunen, L. 2003. Finite State Morphology. Palo Alto, CA: CSLI Publications.
• Frantz, D. G. and Russell, N. J. 1995. Blackfoot Dictionary of Stems, Roots, and Affixes. Toronto: University of Toronto Press.
• Frantz, D. G. 1991. Blackfoot Grammar. Toronto: University of Toronto Press.
• Johnson, C. D. 1972. Formal Aspects of Phonological Description. The Hague: Mouton.
• Karttunen, L., Kaplan, R. M., and Zaenen, A. 1992. Two-level morphology with composition. In Proceedings of the 14th Conference on Computational Linguistics, volume 1, pages 141–148. Association for Computational Linguistics.
• Lyon, J. 2013. Predication and Equation in Okanagan Salish: The Syntax and Semantics of Determiner Phrases. PhD dissertation, UBC.
• Mattina, A. 1973. Colville Grammatical Structure. PhD dissertation, University of Hawaii.
• Mattina, A. 1987. Colville-Okanagan Dictionary.
• Peterson, S. 2005. Captíkʷɬ 1: Okanagan Stories for Beginners. The Center for Interior Salish, The Paul Creek Language Association, and the Lower Similkameen Indian Band.
Images
• Bridget Holmes: a Nonagenarian Housemaid. John Riley. Public domain, via Wikimedia Commons
• WAV audio icon. By zeus (www.zeusbox.org) [GPL (http://www.gnu.org/licenses/gpl.html)], via Wikimedia Commons