1 03.01.2020
Old Lithuanian Digital:
Corpus of Kristijonas Donelaitis (1714–1780) [CorDon]
Linguistic Annotation
Prof. Dr. Jolanta Gelumbeckaitė
Institut für Empirische Sprachwissenschaft
Goethe-Universität Frankfurt am Main
Email: [email protected]
MAKING OLD LITHUANIAN TEXTS USABLE FOR RESEARCH (November 28–29, 2019)
A comprehensive, deeply annotated diachronic reference corpus
of Old Lithuanian
Referenzcorpus Altlitauisch
Senosios lietuvių kalbos korpusas
(Lith. sliekas “earthworm”)
Cooperation:
2012–2014 (Nr. VAT-42/2012)
Goethe-University of Frankfurt am Main
Institute of Lithuanian Language
Institute of Lithuanian Literature and Folklore
University of Pisa
http://titus.fkidg1.uni-frankfurt.de/sliekkas/index.html
http://www-01.sil.org/iso639-3/codes.asp
Old Lithuanian (ca. 1520–1800)
“Dzūkian prayers” (Pater noster, Ave Maria, Credo) in: Nicolaus de Blony, Tractatus sacerdotalis (Straßburg: Martin Flach, 1503)
Christian Gottlieb Mielcke (1732–1807), Anfangs=Gründe einer Littauischen Sprach=Lehre (Königsberg: Hartungsche Hofbuchdruckerei, 1800)
Vilniaus universitetas, Sign.: VUB RS II–3006
ca. 10 million word tokens
http://titus.uni-frankfurt.de/texte/texte.htm
First Lithuanian text in TITUS
Institute of Lithuanian Literature and Folklore www.llti.lt
Kristijonas Donelaitis (1714–1780): PL, WD, F
Sign.: F1-5259 (Msc. A 120a-f. fol.)
DM 1765–1775: DM PL, DM WD, DMN RG, DMN ZR
DMRh 1818 first edition by Ludwig J. Rhesa
DMSch 1865 edition by August Schleicher
DMN 1869 edition by Georg H. F. Nesselmann
DM F: Fortsetzung (“Continuation”, 29 verses)
DPP: Pričkaus pasaka apie lietuvišką svodbą (“Fritz's Tale of the Lithuanian Wedding”)
DP: Pasakos (fables):
• DP LG: Lapės ir gandro čėsnis (“The Feast of the Fox and the Stork”)
• DP RJ: Rudikis jomarkininks (“The Cur at the Fair”)
• DP ŠD: Šuo didgalvis (“The Big-Mouthed Dog”)
• DP PŠ: Pasaka apie šūdvabalį (“Fable of the Dung Beetle”)
• DP VP: Vilks provininks (“The Wolf as Judge”)
• DP ĄG: Ąžuols gyrpelnys (“The Boastful Oak”)
Ludwig J. Rhesa: DMRh 1818 first edition and translation into German of Metai; DPRh 1824 first edition of Pasakos (without translation)
August Schleicher: DMSch, DPSch 1865 edition without translation; DPPSch 1865 first edition of Pričkaus pasaka
Georg H. F. Nesselmann: DMN, DPN, DPPN 1869 edition and German translation. Nesselmann’s edition differs the least from the original.
Corpus Structure
o digitisation of the texts (and structural annotation)
o palaeographic (or typographic) and textological annotation
o lexical annotation:
• transliteration
• standardisation
• lemmatising
• glossing
o grammatical annotation
o annotation of quotations
o alignment of the annotated texts with facsimile reproductions of the original, with each other, and with the source texts of their translation (or translations into other languages)
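The layers listed above can be pictured as slots on a single token record. The Python sketch below is only an illustration: the class `Token`, its field names, and the helper `missing_layers` are assumptions made for this example and are not taken from the actual corpus data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Token:
    """One word token with one slot per annotation layer (hypothetical layout)."""
    diplomatic: str                         # digitised form as found in the source
    locator: str                            # reference used for facsimile alignment
    transliteration: Optional[str] = None   # lexical layer
    standardised: Optional[str] = None      # lexical layer
    lemma: Optional[str] = None             # lexical layer
    gloss: Dict[str, str] = field(default_factory=dict)    # language -> gloss
    grammar: Dict[str, str] = field(default_factory=dict)  # category -> value
    quotation_source: Optional[str] = None  # filled only for quoted passages


def missing_layers(token: Token) -> List[str]:
    """Return the lexical and grammatical layers that are still empty."""
    layers = {
        "transliteration": token.transliteration,
        "standardisation": token.standardised,
        "lemma": token.lemma,
        "gloss": token.gloss,
        "grammar": token.grammar,
    }
    return [name for name, value in layers.items() if not value]


token = Token(diplomatic="ſʒŏkĭnėjant", locator="DM WD 16r 5(87)")
print(missing_layers(token))
```

A freshly digitised token carries only its source form and locator, so every later layer is reported as missing until an annotator fills it in.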
Donelaitis 1977 = TITUS: ſʒŏkĭnėjánt
DM WD 16r 5(87): ſʒŏkĭnėjant
Donelaitis 1977 = TITUS: ſŭgăbe ſim
DM PL 7v 24(406): ſŭgăbe ſim
Digitisation
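References such as `DM WD 16r 5(87)` recur throughout these slides. The sketch below splits such a reference into parts with a regular expression. Reading the parts as siglum, text part, leaf or page, line(s), and verse number(s) in parentheses is an assumption inferred from the examples shown here, and the names `Locator` and `parse_locator` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape: <siglum> <part> <leaf or page> <line(s)>(<verse(s)>)
LOCATOR = re.compile(
    r"^(?P<edition>\S+)\s+(?P<part>\S+)\s+(?P<leaf>\d+[rv]?)\s+"
    r"(?P<lines>\d+(?:–\d+)?)\((?P<verses>\d+(?:–\d+)?)\)$"
)


@dataclass(frozen=True)
class Locator:
    edition: str  # e.g. "DM" or "DMN"
    part: str     # e.g. "PL", "WD", "ZR"
    leaf: str     # folio with r/v, or a page number
    lines: str    # single line or en-dash range
    verses: str   # single verse or en-dash range


def parse_locator(text: str) -> Optional[Locator]:
    """Parse a reference string; return None if it does not fit the assumed shape."""
    match = LOCATOR.match(text.strip())
    return Locator(**match.groupdict()) if match else None


print(parse_locator("DM WD 16r 5(87)"))
print(parse_locator("DMN ZR 98 23–24(356–357)"))
```

References that omit the line and parenthesised verse part, such as `DMN WD 708`, do not match this pattern and yield `None`.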
https://tla.mpi.nl/tools/tla-tools/elan/
DM PL 10r 37(622)
Textological annotation
DM PL 10r 37(622)
DMRh 1818
• transliteration into Standard Lithuanian (in a historical lexicon; phonotactic and orthographic peculiarities)
• standardisation: normalised actual word form (in Standard Lithuanian; common lexical/morphosyntactic base)
• lemmatising: main word form and its accentuation in a historical lexicon
• glossing of the lemma in Lithuanian and in English (and/or German), taking into account its meaning in the given context
• language encoding (olt, lat, ger, gre)
Lexical annotation
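As a rough illustration of the transliteration step, the sketch below rewrites a source form such as ſʒŏkĭnėjant in modern letters. The two letter rules and the choice to drop the breve and acute marks are assumptions made for this example only. The project's actual transliteration rules are not given on these slides and are certainly richer.

```python
import unicodedata

# Hypothetical rules, for illustration only.
MARKS_TO_DROP = {"\u0306", "\u0301"}      # combining breve, combining acute
LETTER_RULES = [("ſʒ", "š"), ("ſ", "s")]  # digraph first, then plain long s


def transliterate(form: str) -> str:
    """Drop the assumed metrical marks and map old letter shapes to modern ones."""
    decomposed = unicodedata.normalize("NFD", form)
    stripped = "".join(ch for ch in decomposed if ch not in MARKS_TO_DROP)
    for old, new in LETTER_RULES:
        stripped = stripped.replace(old, new)
    # Recompose so that letters such as ė are single code points again.
    return unicodedata.normalize("NFC", stripped)


print(transliterate("ſʒŏkĭnėjant"))
```

Decomposing first lets the function remove only the selected combining marks while the dot of ė survives recomposition.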
Jonas Kabelka, Kristijono Donelaičio raštų leksika, Vilnius: Mintis, 1964.
Lexical annotation
Georg H. F. Nesselmann (1851) Friedrich Kurschat (1883)
o hierarchical grammatical description, predominantly restricted to morphology:
• part-of-speech tagging:
POS tagging of the lemma
POS tagging of the actual word form
• morphological information:
unalterable morphological categories of the lemma
unalterable morphological categories of the actual word form:
inflectional morphological characteristics of the actual word form
Grammatical annotation
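The two-level description above, with the lemma on one side and the actual word form on the other, can be sketched as a pair of small records. The class names, the tag `NN`, and the category labels are assumptions made for this example. The form rūpesčiu (ins.sg.) comes from the slides, while the lemma rūpestis and its gender are supplied here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LemmaLevel:
    lemma: str
    pos: str                                              # POS tag of the lemma
    categories: Dict[str, str] = field(default_factory=dict)  # e.g. gender


@dataclass
class FormLevel:
    form: str
    pos: str                                              # POS tag of the word form
    inflection: Dict[str, str] = field(default_factory=dict)  # e.g. case, number


@dataclass
class GrammaticalAnnotation:
    lemma_level: LemmaLevel
    form_level: FormLevel

    def summary(self) -> str:
        """Render both levels as dot-separated tag strings."""
        lemma_tag = ".".join([self.lemma_level.pos, *self.lemma_level.categories.values()])
        form_tag = ".".join([self.form_level.pos, *self.form_level.inflection.values()])
        return (f"{self.lemma_level.lemma} [{lemma_tag}] / "
                f"{self.form_level.form} [{form_tag}]")


annotation = GrammaticalAnnotation(
    LemmaLevel("rūpestis", "NN", {"gender": "masc"}),
    FormLevel("rūpesčiu", "NN", {"case": "ins", "number": "sg"}),
)
print(annotation.summary())  # rūpestis [NN.masc] / rūpesčiu [NN.ins.sg]
```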
https://software.sil.org/toolbox/
• semi-automated (semi-manual) and human-controlled annotation
• seven dictionaries are utilised in the Toolbox environment
• the dictionaries are supplemented in the course of the annotation
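The workflow above can be pictured as a lookup loop in which unambiguous forms are filled in automatically, a human decides the rest, and the lexicon grows as a side effect. The sketch below is plain Python and does not use or imitate Toolbox itself. The function `annotate`, the single in-memory lexicon, and the example entries are assumptions made for this example, and the human check of automatic proposals is left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

Lexicon = Dict[str, List[str]]  # word form -> candidate lemmas


def annotate(
    forms: List[str],
    lexicon: Lexicon,
    ask_annotator: Callable[[str, List[str]], str],
) -> List[Tuple[str, str]]:
    """Assign a lemma to each form, asking the annotator when the lexicon cannot decide."""
    annotated = []
    for form in forms:
        candidates = lexicon.get(form, [])
        if len(candidates) == 1:
            lemma = candidates[0]                    # unambiguous: proposed automatically
        else:
            lemma = ask_annotator(form, candidates)  # unknown or ambiguous form
            if lemma not in candidates:
                lexicon.setdefault(form, []).append(lemma)  # supplement the lexicon
        annotated.append((form, lemma))
    return annotated


lexicon: Lexicon = {"bambos": ["bamba"]}
result = annotate(["bambos", "rūpesčiu"], lexicon, lambda form, candidates: "rūpestis")
print(result)
print(lexicon)
```

Once the annotator has supplied a lemma for an unknown form, the next occurrence of that form is resolved from the lexicon without asking again.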
Digitisation https://tla.mpi.nl/tools/tla-tools/elan/
Distinguishing the grammatical class of the lemma from that of the actual word
form in a given text makes it possible to indicate:
changes of grammatical class (nominalisation, adverbialisation, and the
conversion of some nouns into adpositions), e.g. aukščiau bambos: aukščiau is
tagged as a preposition (APPR), while its lemma aukštai is an adverb (ADV)
Grammatical annotation
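With both tags stored per token, changes of grammatical class can be found by comparing them. The sketch below uses the aukščiau bambos example. The tuple layout, the function name `class_changes`, and the `NN` tag and lemma given for bambos are assumptions made for this example.

```python
from typing import List, NamedTuple


class Tagged(NamedTuple):
    form: str       # actual word form in the text
    form_pos: str   # POS tag of the word form in context
    lemma: str
    lemma_pos: str  # POS tag of the lemma


def class_changes(tokens: List[Tagged]) -> List[str]:
    """List the tokens whose word-form class differs from the lemma class."""
    return [
        f"{t.form}: {t.lemma_pos} -> {t.form_pos}"
        for t in tokens
        if t.form_pos != t.lemma_pos
    ]


phrase = [
    Tagged("aukščiau", "APPR", "aukštai", "ADV"),
    Tagged("bambos", "NN", "bamba", "NN"),  # tag and lemma assumed for illustration
]
print(class_changes(phrase))  # ['aukščiau: ADV -> APPR']
```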
Donelaitis: <i > / <y> / <in>ti
Kabelka 1964, LKŽ: i ti > yti
1977 = TITUS: yti, inti > inti
-(i)áus vs. -(i)aus
DM PL 5v 16–20(224–228)
• Kabelka 1964, LKŽ: paskiaus
• Nesselmann 1869: paskiáus
• DMN ZR 98 23–24(356–357)
-(i)áus vs. -(i)aus
• DMN PL 121 20(82)
-(i)áus vs. -(i)aus
• DM WD 15v 7–9(49–50)
-(i)áus vs. -(i)aus
DM WD 17v 20–24(231–235)
DM WD 17v 23(234)
• 1977, Kabelka 1964: rūpesčiu (ins.sg.)
• LKŽ:
„DIGITALE HUMANITIS“
Thank you very much for your attention!
DM WD 23r 32(708)
DMN WD 708