Introduction Construction
Database Applications
The Geneva Corpus of Middle English Poetry: its construction and possible applications
Richard Zimmermann
Université de Genève
November 16, 2013
Outline
Introduction
- What is the GeCMEP?
- Why is the GeCMEP useful?
Construction
- Step 1: Preprocessing
- Step 2: POS-tagging
- Step 3: Chunking
Database
Applications
- Example 1: Verb - Object order
- Example 2: Th and Wh elements
GeCMEP - overview
- The Geneva Corpus of Middle English Poetry (GeCMEP) is a fully annotated and syntactically parsed corpus.
- Time: 1150-1420 (Helsinki periods M1, M2, M3)
- Size: the goal is 100,000 words before the end of the PhD, but the corpus is in principle open-ended
- Parsed according to the rules of the Penn Parsed Corpus of Middle English (Kroch and Taylor 2000)
Go to the PPCME2 Manual
Example parse
(1) hwa swa ne  forȝefeð heore hating, ne god ne  forȝeueð him  na þing
    who so  not forgives their hating, no God not forgives them no thing
    ’Whoever doesn’t forgive their hate, God will not forgive them anything’ (PatNost,111.67.220)
Major Middle English Prose Texts 1100-1400
→ ME verse can help to close the prose gap c. 1250-1350
Creation of a basic text file I
- Find an electronic version of the text you want to parse
- Example: Morris (1972) An Old English Miscellany Containing a Bestiary
Go to Archive.org
- Copy and paste into a .txt file with ANSI formatting so that special characters like thorn, yogh etc. can be read
Open Bestiary.txt
Some online resources...
Creation of a basic text file II
- Remove mark-ups, comments, footnotes, page & folio numbers, translations, critical apparatus etc. (UTF-8)
- Replace special characters, e.g. þ → +t, ȝ → +g etc.
Open Bestiary2.txt
- Reformat the file such that there is only one word per line
Open Bestiary3.txt
Replacement of ð in Notepad++
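The two mechanical steps above (character replacement, one word per line) could also be scripted rather than done by hand in an editor. The following is a minimal sketch, not the procedure actually used for the GeCMEP. Only the thorn and yogh mappings come from the slide; the eth and ash mappings, the capital forms, and the decision to split punctuation into separate tokens are assumptions made for illustration.

```python
import re

# Only the thorn (+t) and yogh (+g) mappings are stated on the slide.
# The eth and ash mappings and all capital forms are assumptions.
REPLACEMENTS = {
    "þ": "+t", "Þ": "+T",
    "ȝ": "+g", "Ȝ": "+G",
    "ð": "+d", "Ð": "+D",  # assumption
    "æ": "+a", "Æ": "+A",  # assumption
}


def replace_special_characters(text):
    """Replace each special character with its ASCII code."""
    for char, code in REPLACEMENTS.items():
        text = text.replace(char, code)
    return text


def one_word_per_line(text):
    """Put every word on its own line.

    Punctuation marks are split off as separate tokens (an assumption
    about what the tagger input should look like).
    """
    tokens = re.findall(r"[^\s.,;:!?]+|[.,;:!?]", text)
    return "\n".join(tokens)


if __name__ == "__main__":
    line = "þe wox bicharde him, mid iwisse"
    print(one_word_per_line(replace_special_characters(line)))
```

Replacing the characters before splitting keeps the `+` codes attached to their words, since `+` is not treated as punctuation here.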
Part-of-Speech Annotation
- POS-annotation with TreeTagger (Schmid 1995)
- takes a word and assigns to it a POS-tag and a lemma
- GeCMEP does not include lemmas; lemmas not used
- supervised tagger; training lexicon and training input taken from PPCME2 prose texts
- accuracy: c. 85% of all tags are assigned correctly
Example output of TreeTagger
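An accuracy figure like the one above can be estimated by comparing tagger output against hand-corrected tags. The sketch below assumes tab-separated output with the word in the first column, the tag in the second and an optional lemma in the third. The function names and the sample tags are illustrative and not taken from the GeCMEP files.

```python
def read_tagger_output(text):
    """Parse tab-separated tagger output into (word, tag) pairs.

    A third column (the lemma), if present, is ignored, in line with
    lemmas not being used in the corpus.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        pairs.append((fields[0], fields[1]))
    return pairs


def tag_accuracy(predicted, gold):
    """Return the share of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("no tokens to compare")
    correct = sum(1 for (_, p), (_, g) in zip(predicted, gold) if p == g)
    return correct / len(gold)


if __name__ == "__main__":
    # Made-up sample for illustration only.
    output = "he\tPRO\the\nne\tNEG\tne\nfond\tN\tfond\n"
    gold = [("he", "PRO"), ("ne", "NEG"), ("fond", "VBD")]
    print(tag_accuracy(read_tagger_output(output), gold))
```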
Part-of-Speech Annotation
Demonstration
Tokenization
- sentence boundaries and IDs inserted manually in a spreadsheet
Shallow parsing procedure
- shallow parsing builds simple syntactic structures with regular expressions (Abney 1991)
- e.g. prepositional phrases can be built with an instruction like "whenever there is a P immediately before an NP, then bracket them together into a PP"
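The PP instruction quoted above can be written as a single regular-expression rule over a labelled bracketing. This is a minimal sketch of the general idea, not the rule set used for the GeCMEP. It assumes flat NPs (an NP containing only word-level nodes), and the sample string is made up.

```python
import re

# A P node immediately followed by a flat NP, i.e. an NP whose
# children are all word-level nodes such as "(D +te)".
PP_RULE = re.compile(r"\(P [^()]+\) \(NP(?: \([^()]+\))+\)")


def bracket_pps(chunked):
    """Wrap every 'P + NP' sequence in a PP bracket."""
    return PP_RULE.sub(lambda m: "(PP " + m.group(0) + ")", chunked)


if __name__ == "__main__":
    print(bracket_pps("(P in) (NP (D +te) (N font)) (VBD was)"))
```

Because each rule only looks at node labels, rules of this kind can be applied one after another, with later rules building on the brackets added by earlier ones.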
CorpusSearch revision queries
- tokens are chunked with revision queries of CorpusSearch 2 (Randall 2004)
Example revision query
    define: cs.def
    node: IP*|CP*
    copy_corpus: t
    query: (IP*|CP* idoms {1}P)
    AND (IP*|CP* idoms {2}NP)
    AND (P iprecedes NP)
    add_internal_node{1,2}: PP
- a Windows batch file runs a large number of such revision queries; the output of one query feeds into the next
- a simple Python script converts tokens into the right input format
- then manual correction until all tokens correspond to the PPCME2 guidelines
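The conversion script itself is not shown on the slides. As a rough sketch of what such a step might look like, the function below wraps a tagged sentence in a Penn-style token with an ID node. The flat `IP-MAT` root, the function name and the sample tags are assumptions made for illustration.

```python
def to_corpussearch_token(tagged_words, token_id, root="IP-MAT"):
    """Build one flat, Penn-style bracketed token from (word, tag) pairs.

    The flat root node is an assumption; the chunking queries would then
    add internal structure to it.
    """
    leaves = " ".join(f"({tag} {word})" for word, tag in tagged_words)
    return f"( ({root} {leaves}) (ID {token_id}))"


if __name__ == "__main__":
    # Made-up tags for illustration only.
    sentence = [("he", "PRO"), ("ne", "NEG"), ("fond", "VBD")]
    print(to_corpussearch_token(sentence, "FoxWolf,224.278.295"))
```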
CorpusSearch revision queries
Demonstration
Online Documentation for the GeCMEP
- Currently the database includes information on 15 parsed text files and a general bibliography
- Text information specifies factors that might determine syntactic variation: date of composition, date of manuscript, dialect, versification, literary subjects
- In addition, cross-references to the three standard catalogues of ME (verse) texts: IMEV, Wells and MEC.
Go to the GeCMEP Database
Searching the corpus
- the corpus can be searched with query files of CorpusSearch 2 (Randall 2004)
- all functional labels, POS-tags and specific spellings can be searched for
- complex relations between elements can be specified (relative order of elements, number of words, identical indices, ...)
Realization of verb-pronoun structures
(2) a. þe  þurst  him dede more wo  þen  hevede raþer   his hounger do.
       the thirst him did  more woe than had    earlier his hunger  done
       ’The thirst caused him more misery than his hunger had done before’ (FoxWolf,56.273.68)
    b. þe  wox bicharde him, mid  iwisse,    For he ne  fond  nones kunnes blisse
       the fox deceived him, with certainty, for he not found none  kind’s bliss
       ’The fox had deceived him indeed, because he didn’t find any kind of bliss’ (FoxWolf,224.278.295)
Search queries for verb-pronoun structures
- simple CorpusSearch query files to find such constructions:
opro-V.q
    define: cs.def
    node: IP*
    query: (IP* idoms NP-OB*)
    AND (NP-OB* idomsonly PRO)
    AND (IP* idoms VBP|VBD|DOP|DOD|HVP|HVD)
    AND (NP-OB* precedes VBP|VBD|DOP|DOD|HVP|HVD)
V-opro.q
    define: cs.def
    node: IP*
    query: (IP* idoms NP-OB*)
    AND (NP-OB* idomsonly PRO)
    AND (IP* idoms VBP|VBD|DOP|DOD|HVP|HVD)
    AND (VBP|VBD|DOP|DOD|HVP|HVD precedes NP-OB*)
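To make the logic of the two queries concrete, here is a small Python sketch that applies the same conditions to a single bracketed clause. It is not a replacement for CorpusSearch. The tiny parser, the function names and the sample bracketings are illustrative assumptions, and the sketch only looks at the first pronoun object and the first finite verb under the IP.

```python
import re

FINITE_VERBS = {"VBP", "VBD", "DOP", "DOD", "HVP", "HVD"}


def parse(text):
    """Parse one labelled bracketing into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def is_pronoun_object(child):
    """True for an NP-OB* node that dominates only a PRO node."""
    label, kids = child
    return (label.startswith("NP-OB") and len(kids) == 1
            and isinstance(kids[0], tuple) and kids[0][0] == "PRO")


def classify(ip):
    """Return 'opro-V', 'V-opro', or None if the clause does not qualify."""
    label, children = ip
    if not label.startswith("IP"):
        return None
    nodes = [c for c in children if isinstance(c, tuple)]
    obj = next((i for i, c in enumerate(nodes) if is_pronoun_object(c)), None)
    verb = next((i for i, c in enumerate(nodes) if c[0] in FINITE_VERBS), None)
    if obj is None or verb is None:
        return None
    return "opro-V" if obj < verb else "V-opro"


if __name__ == "__main__":
    # Made-up bracketings loosely based on example (2).
    a = "(IP-MAT (NP-SBJ (D +te) (N +turst)) (NP-OB2 (PRO him)) (DOD dede) (NP-OB1 (QR more) (N wo)))"
    b = "(IP-MAT (NP-SBJ (D +te) (N wox)) (VBD bicharde) (NP-OB1 (PRO him)))"
    print(classify(parse(a)), classify(parse(b)))
```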
Search queries for verb-pronoun structures
Demonstration
Development of verb-pronoun structures
→ Significant decline of object pronoun - verb orders (c. 80% to c. 40%) measurable in Middle English verse 1150-1400
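Percentages like these follow from the hit counts of the two query files: the share of object pronoun - verb orders is the number of opro-V hits divided by the hits of both queries taken together. A minimal sketch of that arithmetic follows. The function name is hypothetical and the counts in the usage line are invented placeholders, not GeCMEP results.

```python
def opro_v_share(opro_v_hits, v_opro_hits):
    """Share of object pronoun - verb orders among all verb-pronoun hits."""
    total = opro_v_hits + v_opro_hits
    if total == 0:
        raise ValueError("no hits in either query output")
    return opro_v_hits / total


if __name__ == "__main__":
    # Placeholder counts for illustration only.
    print(opro_v_share(8, 2))
```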
Benefits
- Collecting data from parsed corpora with automated search queries ...
- ... saves a lot of time. Going through tens of thousands of words manually takes weeks or even months - with parsed corpora it’s a matter of seconds.
- ... causes fewer mistakes. Humans can easily overlook examples or tally them up wrong - computers count correctly.
- ... assures replicability. Researchers can quickly double-check quantitative claims in the literature.
- ... increases objectivity. Search criteria must be made explicit in the query files, hence the exclusion or inclusion of a particular sentence is independent of the individual researcher.
→ increased scientificity
Locative relatives: there → where ...
(3) a. his halie nome we nomen and beren. In þe  font þer   we iclensed weren.
       his holy  name we took  and bore   in the font where we cleansed were
       ’We took and bore his name in the font where we were cleansed.’ (PatNost,36.59.70)
    b. To Oxenford is  messageris he sende, that hi   soghte This maide ware  heo were ifounde and to him broghte.
       to Oxford   his messengers he sent,  that they sought this maid  where she were found   and to him brought
       ’He sent his messengers to Oxford so that they might seek and find this maiden, where she was, and bring her to him.’ (Fridesw,47.64)
and temporal subordinators: then → when
(4) a. Ac  ure drihten eft   of deaþe hem  aræreþ, So he alle men deþ, þonne domes dai cumeþ.
       but our Lord    again of death them arises  as he all  men does when  Doomsday  comes
       ’But our Lord will raise them up from death, as he will all men, when Doomsday comes.’ (BodySoul,185.7.13.FragE)
    b. and al  bi-fuliþ he his frend  hwen he him vnfoldiþ.
       and all befouls  he his friend when he him embraces
       ’He wholly befouls his friend when he embraces him.’ (ProvAlf,224.50.659.B32)
Development of Th and Wh elements
→ Significant increase of Wh- relative and adverbial clauses (c. 10% to c. 70%) in Middle English verse 1150-1400
Comparison verse - prose ...
... shows that verse can close the gap in prose texts.
Special thanks to Beatrice Santorini for running her queries on the GeCMEP files to find annotation mistakes. Thanks are also due to Benjamin Borschinger and Paola Merlo for useful advice on chunking.
Abney, S. (1991), Parsing by chunks, in R. Berwick and C. Tenny, eds, ‘Principle-Based Parsing’, Kluwer Academic Publishers, Dordrecht, pp. 257–278.
Kroch, A. and Taylor, A. (2000), Penn-Helsinki Parsed Corpus of Middle English, 2 edn, Department of Linguistics, University of Pennsylvania. http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-3 (Accessed 10 April 2013).
Randall, B. (2004), CorpusSearch 2, Sourceforge.net. http://corpussearch.sourceforge.net/ (Accessed 7 November 2013).
Schmid, H. (1995), ‘Improvements in part-of-speech tagging with an application to German’, Proceedings of the ACL SIGDAT-Workshop, Dublin, Ireland, pp. 47–50.