NT2Lex
A CEFR-Graded Lexical Resource for Dutch as a Foreign Language
Linked to Open Dutch WordNet
Anaïs Tack 1,2, Thomas François 1, Piet Desmet 2, Cédrick Fairon 1
1 CENTAL, Université catholique de Louvain, Louvain-la-Neuve, Belgium
2 ITEC, imec, KU Leuven Kulak, Kortrijk, Belgium
CEFR-GRADED LEXICONS
a graded lexicon is a lexical database that includes lexical
frequencies observed in texts graded along a difficulty scale
Foreign language (L2) materials
• textbooks and readers / learner texts
• CEFR scale [A1 > A2 > B1 > B2 > C1 > C2] (Council of Europe, 2001)
CEFRLex: cental.uclouvain.be/cefrlex/
ANALYSIS Semantics
ANALYSIS Frequency
KEY TAKEAWAYS
NT2Lex
• a new resource for Dutch as a foreign language (NT2)
• 17,743 entries with graded frequency distributions
• measure of receptive word difficulty
• measure of word sense complexity through linkage to Open Dutch WordNet
cental.uclouvain.be/nt2lex/
French - FLELex (François et al., 2014)
Swedish - SVALex (François et al., 2016)
English - EFLLex (Dürlich & François, 2018)
Swedish - SweLLex (Volodina et al., 2016)
ANALYSIS Psycholinguistics
NT2LEX
Tools
Online tools for lexical complexity analysis
• database search
• CEFR-based complex word identification (Tack et al., 2016)
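As an illustration of how graded frequencies could feed complex word identification, here is a minimal sketch. It is not necessarily the method of Tack et al. (2016): it assumes a simple heuristic in which a word's level is the first CEFR level at which it is attested, and a word counts as complex when that level lies above the learner's level. The names `first_attested_level` and `is_complex` are hypothetical; the frequencies are the example values for pakken from this poster.

```python
# CEFR levels covered by the NT2Lex corpus, in order of increasing difficulty.
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

def first_attested_level(freqs):
    """Return the lowest level with a non-zero frequency, or None."""
    for level in LEVELS:
        if freqs.get(level, 0) > 0:
            return level
    return None

def is_complex(freqs, learner_level):
    """Heuristic (an assumption, not the published method): a word is
    complex if it is first attested above the learner's level."""
    level = first_attested_level(freqs)
    if level is None:
        return True  # unattested words are treated as complex (assumption)
    return LEVELS.index(level) > LEVELS.index(learner_level)

# Per-level frequencies copied from the poster's example entries.
pakken_grab = {"A1": 35, "A2": 117, "B1": 101, "B2": 5}
pakken_defeat = {"A2": 51, "B1": 12}

print(is_complex(pakken_grab, "A1"))
print(is_complex(pakken_defeat, "A1"))
```

Under this heuristic the two senses of pakken come out differently for an A1 learner, which is the kind of sense-level distinction the WordNet linkage makes possible.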
Corpus of reading materials
• corpus of 461,088 tokens
• 5 CEFR levels (A1, A2, B1, B2, C1)
Preprocessing
• part-of-speech tagging with Frog (van den Bosch et al., 2007)
• SVM WSD tool trained on DutchSemCor (Vossen et al., 2012)
• linkage to Open Dutch WordNet (Postma et al., 2016)
Lexical frequencies
• lexical entries with per-level observed frequency
• normalised for lexical dispersion (Carroll et al., 1971)
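The poster does not spell out the normalisation, so the following is only a sketch of the dispersion index D commonly attributed to Carroll et al. (1971): the entropy of a word's relative frequencies across sub-corpora, divided by the maximum possible entropy. The function name `carroll_dispersion` and the sub-corpus figures are assumptions made for illustration.

```python
import math

def carroll_dispersion(freqs, sizes):
    """Dispersion D in [0, 1] for one word.

    freqs: observed frequency of the word in each sub-corpus
    sizes: token count of each sub-corpus
    Needs at least two sub-corpora (log2(1) would be zero).
    """
    # Relative frequency of the word in each sub-corpus.
    rates = [f / s for f, s in zip(freqs, sizes)]
    total = sum(rates)
    if total == 0:
        return 0.0  # word never occurs
    # Normalise the rates so they sum to 1, then take their entropy.
    probs = [r / total for r in rates]
    entropy = sum(-p * math.log2(p) for p in probs if p > 0)
    # Divide by the maximum entropy (uniform spread over all sub-corpora).
    return entropy / math.log2(len(freqs))

# Hypothetical sub-corpora of 1,000 tokens each.
even = carroll_dispersion([10, 10, 10, 10], [1000] * 4)
clumped = carroll_dispersion([40, 0, 0, 0], [1000] * 4)
print(even, clumped)
```

A word spread evenly over all sub-corpora gets a D close to 1, while a word confined to a single sub-corpus gets 0, so a frequency weighted by D rewards words that occur across many texts.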
Resource
NT2LEX
lemma               pos   sense        synset               A1     A2     B1     B2     C1
pakken (to grab)    WW()  pakken-v-1   odwn-10-101230891-v  35     117    101    5      -
pakken (to defeat)  WW()  pakken-v-10  eng-30-01100145-v    -      51     12     -      -
zijn (to exist)     WW()  zijn-v-1     eng-30-02603699-v    2,094  1,647  1,423  1,253  1,335
[Figure: dispersion (y-axis, 0.0 to 1.0) plotted against frequency (x-axis, 0 to 80); r² = 0.83]
frequency
• correlation Subtlex-NL (Keuleers et al., 2010)
• Zipfian effects
shorter = more frequent
dispersion
• theoretical familiarity
• more dispersed = basic vocabulary
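For readers who want to reproduce a correlation of this kind on their own data, a squared Pearson correlation can be computed as sketched below. The function name `r_squared` and the toy values are hypothetical and are not taken from NT2Lex or Subtlex-NL.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Toy data only: hypothetical frequency and dispersion values.
frequency = [1, 2, 3, 4]
dispersion = [0.2, 0.4, 0.6, 0.8]
print(r_squared(frequency, dispersion))
```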
[Figure: polysemes (y-axis, 0.5 to 4.0) per level (x-axis: A1, A2, B1, B2, C1, TOTAL)]
semasiology
• form > meaning mappings
• easy = more polysemous
onomasiology
• meaning > form mappings
• lower degree of synonymy
• L2-specific lexicalisations
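The two perspectives above can be sketched as two counts over (lemma, sense, synset) triples: senses per lemma for the semasiological view and lemmas per synset for the onomasiological view. This is an assumed simplification, not the poster's actual computation. The first three triples come from the poster's example entries; the `verslaan` row and both function names are hypothetical, added only so that one synset has two lemmas.

```python
from collections import defaultdict

entries = [
    ("pakken", "pakken-v-1", "odwn-10-101230891-v"),
    ("pakken", "pakken-v-10", "eng-30-01100145-v"),
    ("zijn", "zijn-v-1", "eng-30-02603699-v"),
    # Hypothetical row, invented for illustration only.
    ("verslaan", "verslaan-v-1", "eng-30-01100145-v"),
]

def senses_per_lemma(rows):
    """Semasiology: how many distinct senses each form maps to."""
    senses = defaultdict(set)
    for lemma, sense, _synset in rows:
        senses[lemma].add(sense)
    return {lemma: len(found) for lemma, found in senses.items()}

def lemmas_per_synset(rows):
    """Onomasiology: how many distinct forms express each meaning."""
    lemmas = defaultdict(set)
    for lemma, _sense, synset in rows:
        lemmas[synset].add(lemma)
    return {synset: len(found) for synset, found in lemmas.items()}

print(senses_per_lemma(entries))
print(lemmas_per_synset(entries))
```

Averaging the first count over the lemmas attested at each level would give a polysemy figure per level; the second count gives a degree of synonymy per synset.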
[Figure: density (y-axis, 0.00 to 0.30) of age of acquisition (x-axis, 0 to 20), by level: A1, A2, B1, B2, C1, TOTAL]
[Figure: density (y-axis, 0.00 to 0.40) of concreteness (x-axis, 0 to 6), by level: A1, A2, B1, B2, C1, TOTAL]
interplay of psycholinguistic norms (Brysbaert et al., 2014)