

Page 1: Shyrokov Volodymyr, Bugakov Oleg Krygin Maxim, Sydorchuk Nadiia Ukrainian Lingua-Information Fund NASU Ukrainian National Linguistic Corpus and its application

Shyrokov Volodymyr, Bugakov Oleg, Krygin Maxim, Sydorchuk Nadiia

Ukrainian Lingua-Information Fund NASU

Ukrainian National Linguistic Corpus and its application

Page 2

The main results of the theoretical studies and an overview of the practical implementations developed at ULIF-NASU are presented in the collective monograph “Corpus Linguistics”:

Korpusna linhvistyka (Corpus Linguistics) / V. A. Shyrokov, O. V. Bugakov, T. O. Hriaznukhina, O. M. Kostyshyn, M. Yu. Krygin, T. P. Liubchenko, O. H. Rabulets, O. O. Sydorenko, N. M. Sydorchuk, I. V. Shevchenko, O. O. Shypnivska, K. M. Yakymenko. Kyiv: Dovira, 2005. 471 pp.

Page 3

UNLC statistics

General Corpus: 4868 storage objects; 1013 MB of texts for indexing; more than 62 mln tokens;

Legislation Corpus: 5757 storage objects; 151 MB of texts for indexing; more than 18 mln tokens.

Page 4

Technological principles for creating UNLC

The design and organization of the UNLC information architecture and functionality are based on the systems engineering of virtual lexicographic laboratories.

In accordance with the concept of virtual lexicographic laboratories, UNLC is designed using a Service-Oriented Architecture (SOA) and Web service technology.

The Internet is used as the communication infrastructure. The following technology standards are used: XML for data description; SOAP for exchanging structured messages in distributed systems; WSDL for service description; UDDI for storing the WSDL descriptions and providing them on request.
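To make the role of these standards concrete, here is a minimal Python sketch of a SOAP 1.1 request such as a client might send to a corpus search service. Only the envelope namespace is standard; the service namespace `urn:example:unlc-search`, the operation `FindContexts` and its `Phrase` and `MaxDistance` parameters are hypothetical and are not taken from UNLC, whose actual WSDL contract is not given here.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace; not taken from UNLC.
SVC_NS = "urn:example:unlc-search"

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("unlc", SVC_NS)


def build_request(phrase: str, max_distance: int) -> bytes:
    """Wrap a hypothetical FindContexts call in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}FindContexts")
    ET.SubElement(call, f"{{{SVC_NS}}}Phrase").text = phrase
    ET.SubElement(call, f"{{{SVC_NS}}}MaxDistance").text = str(max_distance)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def parse_request(message: bytes) -> tuple[str, int]:
    """Extract the parameters, as the receiving service would."""
    root = ET.fromstring(message)
    call = root.find(f"{{{SOAP_ENV}}}Body/{{{SVC_NS}}}FindContexts")
    if call is None:
        raise ValueError("not a FindContexts request")
    phrase = call.findtext(f"{{{SVC_NS}}}Phrase", default="")
    distance = int(call.findtext(f"{{{SVC_NS}}}MaxDistance", default="0"))
    return phrase, distance


message = build_request("мовний корпус", 3)
```

In a WCF deployment such envelopes are generated by the framework from the service contract rather than assembled by hand; the sketch only shows the kind of structured message that travels between the tiers.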

Windows Communication Foundation (WCF) is used for interaction between the different levels of UNLC. WCF is a service-oriented framework for data and message exchange that lets software components interact locally or remotely through a simplified, unified programming model for cross-platform interaction.

A necessary condition for the software suite to function is the availability of robust means of security and data integrity.

Page 5

The general scheme of the linguistic corpus

[Figure: general scheme of the linguistic corpus L_C with the components E_LIB, E_LING, MDI, Index, MC_B, G_OB_D]

E_LIB – bibliographic subsystem (electronic library);

E_LING – linguistic subsystem;

MDI – subsystem for constructing the multidimensional index;

Index – multidimensional index base. This item is the database of the results produced by MDI;

MC_B – microcontext base. This item is virtual and is generated dynamically on the user's query. It returns the set of microcontexts that match the search prescription made by the user.
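The split into a stored index base and a virtual microcontext base can be sketched as follows. This is a one-dimensional simplification, assuming the index maps lower-cased word forms to (storage object, token position) pairs; the real UNLC index is multidimensional, and the names `build_index`, `microcontexts` and the `width` parameter are illustrative only. Microcontexts are cut from the text by a generator on each query, mirroring the statement that MC_B is virtual.

```python
from collections import defaultdict
from typing import Iterator

Docs = dict[str, list[str]]  # storage object id -> tokenized text
Index = dict[str, list[tuple[str, int]]]  # word form -> (object id, position)


def build_index(docs: Docs) -> Index:
    """Index base: record every occurrence of every lower-cased word form."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos, token in enumerate(tokens):
            index[token.lower()].append((doc_id, pos))
    return dict(index)


def microcontexts(index: Index, docs: Docs, word: str,
                  width: int = 2) -> Iterator[tuple[str, str]]:
    """Virtual microcontext base: contexts are built on demand, never stored."""
    for doc_id, pos in index.get(word.lower(), []):
        tokens = docs[doc_id]
        start = max(0, pos - width)
        yield doc_id, " ".join(tokens[start:pos + width + 1])


docs = {"d1": ["Мова", "є", "код", "культури"], "d2": ["рідна", "мова"]}
index = build_index(docs)
print(list(microcontexts(index, docs, "мова", width=1)))
# [('d1', 'Мова є'), ('d2', 'рідна мова')]
```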

Page 6

The bibliographic subsystem serves as a multipurpose information system that accumulates information of different kinds: it is a tool to collect, store, model and use natural language information in digital form. The generalized objects stored in the bibliographic system may be electronic objects in any data format. This makes it possible to add manuscripts, audio, video and other multimedia information to the library in addition to ordinary printed texts.

Functions of the bibliographic subsystem:

forming a brief bibliographic description according to the rules of bibliographic description, based on the metadata elements of the storage object recorded in the database;

forming a detailed bibliographic description of the storage object;

editing the metadata set for a bibliographic description in accordance with the changes made by a bibliographer;

analyzing changes in the bibliographic record;

working with file system objects;

editing, inserting and deleting profiles, specifications, vocabularies and their elements.
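Two of these functions, forming a brief description and analyzing changes in a record, can be illustrated with a short Python sketch. The field names (`title`, `authors`, `place`, `publisher`, `year`) and the output layout are assumptions, loosely modeled on the monograph reference on Page 2; they are not the bibliographic rules actually applied in UNLC.

```python
def brief_description(meta: dict[str, str]) -> str:
    """Assemble a brief description of the form Title / Authors. Place: Publisher, Year."""
    text = f"{meta['title']} / {meta['authors'].rstrip('.')}."
    if "place" in meta and "publisher" in meta:
        text += f" {meta['place']}: {meta['publisher']}"
        if "year" in meta:
            text += f", {meta['year']}"
        text += "."
    return text


def record_changes(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two versions of a metadata record element by element."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


meta = {"title": "Korpusna linhvistyka", "authors": "V. A. Shyrokov et al.",
        "place": "Kyiv", "publisher": "Dovira", "year": "2005"}
print(brief_description(meta))
# Korpusna linhvistyka / V. A. Shyrokov et al. Kyiv: Dovira, 2005.
```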

Page 7

Search by the bibliographic parameters

The user independently selects a search field from the fields included in the search profile. For a text field, the user enters the information; if the field has a limited set of values, the user selects the search value from a dictionary.

For advanced search, combinations of the logical operators “and” and “or” are used.

Search results are presented as a list of bibliographic descriptions.

For each object, the user can view the complete list of bibliographic parameters, view the resource (the full text) and save the search results to a file.
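A minimal sketch of how field conditions from a search profile might be combined with “and” and “or”, assuming each bibliographic record is a flat dictionary of field values. The helper names are hypothetical; dictionary-valued fields are matched exactly and free-text fields by case-insensitive substring, which is one plausible choice rather than documented UNLC behavior.

```python
from typing import Callable

Record = dict[str, str]
Condition = Callable[[Record], bool]


def equals(field: str, value: str) -> Condition:
    """Field with a limited set of values: the value is chosen from a dictionary."""
    return lambda rec: rec.get(field) == value


def contains(field: str, text: str) -> Condition:
    """Free-text field: case-insensitive substring match (an assumed rule)."""
    return lambda rec: text.lower() in rec.get(field, "").lower()


def and_(*conditions: Condition) -> Condition:
    return lambda rec: all(c(rec) for c in conditions)


def or_(*conditions: Condition) -> Condition:
    return lambda rec: any(c(rec) for c in conditions)


def search(records: list[Record], query: Condition) -> list[Record]:
    """Return the matching records, i.e. the list of bibliographic descriptions."""
    return [rec for rec in records if query(rec)]


records = [
    {"title": "Haidamaks", "author": "T. G. Shevchenko", "genre": "poem"},
    {"title": "Kobzar", "author": "T. G. Shevchenko", "genre": "collection"},
    {"title": "The Constitution of Ukraine", "genre": "law"},
]
query = and_(contains("author", "shevchenko"),
             or_(equals("genre", "poem"), equals("genre", "law")))
print([rec["title"] for rec in search(records, query)])  # ['Haidamaks']
```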

Page 8
Page 9
Page 10

The linguistic corpus provides full-text information processing and serves as a tool for retrieving contexts in response to users' search queries, taking certain linguistic parameters into account.

Functions of the linguistic subsystem:

creating the full-text index;

clearing the full-text index;

adding an object for indexing;

indexing objects;

removing an indexed object from the full-text index;

full-text search for words and phrases in all sources, or in sources selected by their bibliographic description, with the ability to set the distance between the search words;

providing statistics;

viewing microcontexts;

saving the microcontexts of words and phrases to a file;

service functions.
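The index-maintenance functions in this list (creating, clearing, adding and removing objects, simple statistics) can be sketched with a small in-memory inverted index. The class `FullTextIndex` and its methods are hypothetical, and a production index would be persistent and far more compact; the point is that removing an object must delete every posting of that object and drop entries that become empty.

```python
from collections import defaultdict


class FullTextIndex:
    """Minimal inverted index: word form -> {object id: [positions]}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, list[int]]] = defaultdict(dict)
        self.objects: dict[str, list[str]] = {}

    def add_object(self, obj_id: str, tokens: list[str]) -> None:
        if obj_id in self.objects:
            self.remove_object(obj_id)  # re-indexing replaces the old entry
        self.objects[obj_id] = tokens
        for pos, token in enumerate(tokens):
            self.postings[token.lower()].setdefault(obj_id, []).append(pos)

    def remove_object(self, obj_id: str) -> None:
        for token in {t.lower() for t in self.objects.pop(obj_id, [])}:
            self.postings[token].pop(obj_id, None)
            if not self.postings[token]:
                del self.postings[token]  # drop entries that became empty

    def clear(self) -> None:
        self.postings.clear()
        self.objects.clear()

    def frequency(self, word: str) -> int:
        """Simple statistic: total number of occurrences of a word form."""
        return sum(len(p) for p in self.postings.get(word.lower(), {}).values())
```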

Page 11

Marking the structural elements

Structuring by text divisions: “section”, “part”, “paragraph”, “title”, “conclusions”, “summary”, “abstract”.

Marking the paragraphs.

Marking words written in letters of non-Ukrainian alphabets.

Structuring the text into sentences, marking the beginning and end of each one.

Marking text words whose grammatical codes are defined by special rules. This concerns:
a) words with a hyphen whose first part is an abbreviation in Ukrainian or Latin uppercase letters;
b) abbreviations;
c) proper names unambiguously identified by the context.

Marking non-author text (quotations, direct speech).

Identifying text units that have no morphological status and are not interpreted by the rules of the morphological analyzer.

Marking words or text fragments written with letter spacing.

Marking places in the text that need to be edited later.
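Three of these marking operations (sentence boundaries, words written in non-Ukrainian letters, and the rule for hyphenated words whose first part is an uppercase abbreviation) can be approximated with regular expressions. The character classes, the tag names `WORD`, `FOREIGN`, `ABBR-HYPHEN` and the sentence-splitting heuristic are assumptions for illustration; they do not reproduce the UNLC rules, which also cover abbreviations, proper names, quotations and letter-spaced text.

```python
import re

UKR_UPPER = "А-ЩЬЮЯЄІЇҐ"  # Ukrainian uppercase letters (ranges exclude Ъ, Ы, Э)
UKR_LOWER = "а-щьюяєіїґ"
LETTERS = UKR_UPPER + UKR_LOWER + "A-Za-z'’"
WORD = re.compile(f"[{LETTERS}]+(?:-[{LETTERS}]+)*")
LATIN = re.compile("[A-Za-z]")
ABBR_HYPHEN = re.compile(f"^[{UKR_UPPER}A-Z]{{2,}}-")  # e.g. ПДВ-платник
SENTENCE_END = re.compile(rf"(?<=[.!?…])\s+(?=[{UKR_UPPER}A-Z])")


def mark(text: str) -> list[list[tuple[str, str]]]:
    """Split text into sentences and tag each word with a simple marking code."""
    sentences = []
    for sentence in SENTENCE_END.split(text.strip()):
        tagged = []
        for word in WORD.findall(sentence):
            if ABBR_HYPHEN.match(word):
                tag = "ABBR-HYPHEN"  # hyphenated word with an abbreviation first
            elif LATIN.search(word):
                tag = "FOREIGN"  # letters outside the Ukrainian alphabet
            else:
                tag = "WORD"
            tagged.append((word, tag))
        sentences.append(tagged)
    return sentences


sentences = mark("Закон набрав чинності. ПДВ-платник подає звіт через Internet!")
```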

Page 12

Search by the linguistic parameters

This search is implemented using the full-text index. The user enters a search phrase, sets the desired maximum number of words between the search words, and selects additional full-text search options, namely:

search in a certain subset of objects;

use of word order;

use of the distance between words;

use of lemmatization;

use of synonymy.

The result of the full-text search is a list of bibliographic descriptions. Unlike the bibliographic search, however, the user gets direct access to each occurrence of the search item in the text, i.e., to all contexts that contain it. After choosing a source, the user can view the contexts, with the search item highlighted in red. The size (length) of the context can be changed.
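The distance, word-order, lemmatization and synonymy options can be sketched for a two-word query as follows. The toy `lemmas` and `synonyms` dictionaries stand in for the grammatical and synonym dictionaries that UNLC uses; they, the function name and its parameters are hypothetical. Distance is taken as the number of words strictly between the two search words, following the “maximum number of words between the search words” described above.

```python
from typing import Optional


def find_pairs(tokens: list[str], first: str, second: str, max_gap: int,
               ordered: bool = True,
               lemmas: Optional[dict[str, str]] = None,
               synonyms: Optional[dict[str, set[str]]] = None) -> list[tuple[int, int]]:
    """Positions (i, j) of the two search words with at most max_gap words between them."""
    lemmas = lemmas or {}
    synonyms = synonyms or {}

    def norm(word: str) -> str:
        word = word.lower()
        return lemmas.get(word, word)  # lemmatization via the toy dictionary

    def variants(word: str) -> set[str]:
        base = norm(word)
        return {base} | synonyms.get(base, set())  # synonymy expansion

    want1, want2 = variants(first), variants(second)
    normalized = [norm(t) for t in tokens]
    hits = []
    for i, a in enumerate(normalized):
        if a not in want1:
            continue
        for j, b in enumerate(normalized):
            if b not in want2 or i == j or (ordered and j < i):
                continue
            if abs(j - i) - 1 <= max_gap:  # words strictly between the two
                hits.append((i, j))
    return hits


tokens = "Конституція гарантує права і свободи людини".split()
lemmas = {"права": "право", "свободи": "свобода", "людини": "людина"}
print(find_pairs(tokens, "право", "свобода", max_gap=1, lemmas=lemmas))  # [(2, 4)]
```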

Page 13
Page 14
Page 15

For further processing, all contexts, or the contexts of a particular source, can be saved to an HTML file that records the source of the contexts, the time of creation and the search phrases.
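A minimal sketch of such an export, assuming the contexts arrive as (source, context) pairs. The page layout, the function name and the use of an inline red `<span>` for the search item are assumptions; only the recorded items (source, creation time, search phrase) follow the text above. All text is HTML-escaped before the highlight markup is added.

```python
import html
import re
from datetime import datetime


def contexts_to_html(phrase: str, contexts: list[tuple[str, str]], created: datetime) -> str:
    """Render (source, context) pairs as an HTML page with the phrase shown in red."""
    # Escape first, then search for the escaped phrase, so markup is never doubled.
    pattern = re.compile(re.escape(html.escape(phrase)), re.IGNORECASE)
    items = []
    for source, context in contexts:
        marked = pattern.sub(lambda m: f'<span style="color:red">{m.group(0)}</span>',
                             html.escape(context))
        items.append(f"<li><b>{html.escape(source)}</b>: {marked}</li>")
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(phrase)}</title></head><body>\n"
        f"<p>Search phrase: {html.escape(phrase)}<br>"
        f"Created: {created.isoformat(timespec='seconds')}</p>\n<ul>\n"
        + "\n".join(items)
        + "\n</ul>\n</body></html>\n"
    )
```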

Page 16

Applying UNLC

The source base of linguistic information for creating the fundamental academic multivolume lexicographic system “Ukrainian Language Dictionary”;

The database for linguistic research to identify new linguistic phenomena and formalize existing ones;

The system for grammatical marking;

Statistical analysis of text data;

The environment for accumulating and processing information objects of different nature;

The environment for interaction with grammar, synonym and explanatory dictionary systems;

Creation of various linguistic and information systems (LIS) using corpus technologies: LIS “The Constitution of Ukraine”; LIS “T. G. Shevchenko Electronic Encyclopedia”;

Linguistic expertise.

Page 17

The explanatory “Ukrainian Language Dictionary”

Page 18

Editing system for the dictionary entry

Page 19

LIS “The Constitution of Ukraine”


Page 20

T. G. Shevchenko Electronic Encyclopedia

Page 21

LIS “Haidamaks”

Page 22

Linguistic expertise

The principle of applying statistical methods in linguistic expertise:

Text → preliminary processing → statistical portrait → parameters of comparison or analysis → analysis → result.

The program for checking students' works for plagiarism:

forms a linguistic corpus of abstracts;

compares any text with the abstracts from the corpus by various criteria;

creates and visualizes the result of the comparison.
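One possible comparison criterion can be sketched in Python, assuming the statistical portrait of a text is the multiset of its word trigrams (shingles) and the match is the share of the examined text's shingles that also occur in a corpus text. This criterion, the trigram length and the function names are assumptions; the program described here supports various unspecified criteria, so the sketch does not reproduce the percentages reported below.

```python
import re
from collections import Counter


def portrait(text: str, n: int = 3) -> Counter:
    """Assumed statistical portrait: frequencies of word n-grams (shingles)."""
    words = re.findall(r"\w+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def match_percent(examined: str, other: str, n: int = 3) -> float:
    """Share (in %) of the examined text's shingles that also occur in the other text."""
    a, b = portrait(examined, n), portrait(other, n)
    total = sum(a.values())
    if total == 0:
        return 0.0
    shared = sum(min(count, b[gram]) for gram, count in a.items())
    return 100.0 * shared / total


def rank(examined: str, corpus: dict[str, str], n: int = 3) -> list[tuple[str, float]]:
    """Corpus texts ordered by decreasing match with the examined text."""
    scores = [(name, match_percent(examined, text, n)) for name, text in corpus.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```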


Page 23

The window of the linguistic expertise program

Page 24

Selecting topics for comparison

Page 25

The result of text analysis

When the examined abstract was compared with the texts in the corpus of abstracts by one of the criteria, two texts were found that match it by 63% and 53%, respectively.

Page 26

Visualization of the program's results

Page 27

Comparison of the texts of the 20-volume and 11-volume explanatory dictionaries

Page 28

The analysis of the political parties' platforms

Page 29

Concordance statistics

Page 30

The most frequent lexemes in the programs of parties (blocs)

Page 31
Page 32

Relative intensities of the key concepts in the election programs of the political parties in 2002

Page 33

Disambiguation in text using statistical methods

Lexical homonymy: КОСА

1. Braided hair (a plait).
2. An agricultural tool for mowing grass, grain, etc., consisting of a narrow curved blade attached to a handle (a scythe).
3. A narrow alluvial strip of land in a sea, river, etc., connected at one end to the shore (a spit).

Grammatical homonymy: ПРАВ

1. право ("right") – neuter noun, genitive case, plural;
2. правити ("to govern; to correct") – imperfective verb, imperative mood, second person, singular;
3. прати ("to wash") – imperfective verb, past tense, masculine, singular.

Page 34

The scheme of the disambiguation algorithm

1. Manual marking of the initial training text T0, yielding the marking M(T0).

2. Obtaining the statistics of grammatical chains S0.

3. Statistical disambiguation of the training text Ti (obtaining its marking).

4. Control of the obtained marking by a specialist: corrections and additional marking, yielding M(Ti).

5. Obtaining the statistics Si.

6. Combining the statistics Si and Si-1.

7. Disambiguation in a text of a particular genre.
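The training loop can be sketched as follows, with the statistical disambiguator and the specialist's review passed in as functions. Representing a marking as a list of (part of speech, grammatical meaning) pairs and combining Si with Si-1 by adding chain frequencies are assumptions; the function names are hypothetical.

```python
from collections import Counter
from typing import Callable

Pair = tuple[str, str]  # (part of speech, grammatical meaning)
Marking = list[Pair]


def chain_stats(marking: Marking) -> Counter:
    """Frequencies of chains of three consecutive (v, g) pairs."""
    return Counter(tuple(marking[i:i + 3]) for i in range(len(marking) - 2))


def train(manual: Marking, texts: list[list[str]],
          disambiguate: Callable[[list[str], Counter], Marking],
          review: Callable[[list[str], Marking], Marking]) -> Counter:
    stats = chain_stats(manual)  # S0 from the manual marking M(T0)
    for text in texts:  # training texts T1, T2, ...
        automatic = disambiguate(text, stats)  # statistical disambiguation of Ti
        corrected = review(text, automatic)  # specialist's control gives M(Ti)
        stats = stats + chain_stats(corrected)  # combine Si with S(i-1)
    return stats  # statistics then used for texts of the given genre
```

In practice `disambiguate` would be the statistical method formalized on the following pages and `review` an interactive correction step.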

Page 35

$T^{\exp} = \bigcup_i T_i$;

$T_i = \{(w_1)\, r_1\, (w_2)\, r_2\, (w_3) \dots (w_N)\}$, where

$w_i$ – word forms, $r_i$ – word form delimiters,

$N$ – number of word forms in the text;

$M: T \to M(T) = \{(v_1, g_1)\, (v_2, g_2)\, (v_3, g_3) \dots (v_N, g_N)\}$, where

$v_i$ defines the part of speech of the word form,

$g_i$ defines the grammatical meaning.

Page 36

$S(T) = \{([v_i, g_i]\, [v_{i+1}, g_{i+1}]\, [v_{i+2}, g_{i+2}]),\ p([v_i, g_i]\, [v_{i+1}, g_{i+1}]\, [v_{i+2}, g_{i+2}]);\ i = 1, 2, \dots, N-2\}$, where $i$ is the ordinal number of the word form in the text,

$([v_i, g_i]\, [v_{i+1}, g_{i+1}]\, [v_{i+2}, g_{i+2}])$ – a chain of grammatical meanings, and $p(\cdot)$ – the absolute frequency of the chain in the text;

$S(T^{\exp}) = \bigcup_i S(T_i)$.
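Computing S(T) from a marking amounts to counting overlapping three-element windows. In this sketch a marking is a list of (v, g) string pairs with toy tag values, and S(T^exp) is formed by adding the frequencies of the individual S(T_i), which is an assumed reading of the union; the function names are illustrative.

```python
from collections import Counter

Pair = tuple[str, str]  # (v_i, g_i): part of speech and grammatical meaning


def chain_statistics(marking: list[Pair]) -> Counter:
    """S(T): each chain of three consecutive (v, g) pairs with its absolute frequency p."""
    return Counter(
        (marking[i], marking[i + 1], marking[i + 2]) for i in range(len(marking) - 2)
    )


def corpus_statistics(markings: list[list[Pair]]) -> Counter:
    """S(T^exp): the statistics of all texts T_i merged by adding frequencies."""
    total: Counter = Counter()
    for marking in markings:
        total.update(chain_statistics(marking))
    return total
```

A text of N word forms yields N - 2 chains, which is why the index i above stops at N - 2.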

Page 37

Disambiguation $M': T \to M'(T)$,

$M'(T) = \{(v_1, g_1)'\, (v_2, g_2)' \dots (v_N, g_N)'\}$, where $M'(T) \subseteq M(T)$.

For a homonymous word form $w_i$, the marking $M$ yields a set of candidate pairs

$$(v_i, g_i) \in \{(v_i^{1}, g_i^{11}), \dots, (v_i^{1}, g_i^{1m}), (v_i^{2}, g_i^{21}), \dots, (v_i^{n}, g_i^{n1}), \dots\},$$

and likewise for $w_{i+1}$ and $w_{i+2}$. The candidate chains $([v_i, g_i]\, [v_{i+1}, g_{i+1}]\, [v_{i+2}, g_{i+2}])$ are looked up in $S(T^{\exp})$:

$$(v_i, g_i)' = \begin{cases} \text{the } (v_i, g_i) \text{ of the candidate chain with the highest } p \text{ in } S(T^{\exp}), & \text{if such a chain exists}, \\ \text{not\_defined}, & \text{if no candidate chain occurs in } S(T^{\exp}). \end{cases}$$
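The selection rule can be sketched in Python, assuming that a homonymous word form takes its reading from the most frequent candidate chain in S(T^exp) and that the windows starting at i-2, i-1 and i are all considered, so the last word forms of a text are covered too. The tag values are toy labels for the ПРАВ example, the function name is hypothetical, and `None` plays the role of not_defined.

```python
from collections import Counter
from itertools import product
from typing import Optional

Pair = tuple[str, str]  # (part of speech v, grammatical meaning g)


def disambiguate(candidates: list[list[Pair]], stats: Counter) -> list[Optional[Pair]]:
    """candidates[i] lists the readings M assigns to w_i; stats is S(T^exp)."""
    n = len(candidates)
    result: list[Optional[Pair]] = []
    for i in range(n):
        if len(candidates[i]) == 1:  # unambiguous word form
            result.append(candidates[i][0])
            continue
        best, best_freq = None, 0
        for start in range(max(0, i - 2), min(i, n - 3) + 1):  # windows covering i
            for chain in product(*candidates[start:start + 3]):
                freq = stats.get(chain, 0)
                if freq > best_freq:  # keep the most frequent known chain
                    best, best_freq = chain[i - start], freq
        result.append(best)  # None when no candidate chain is in stats
    return result


# "захист прав людини": the reading of "прав" is chosen by chain frequency.
stats = Counter({
    (("NOUN", "nom.sg"), ("NOUN", "gen.pl"), ("NOUN", "gen.sg")): 3,
    (("NOUN", "nom.sg"), ("VERB", "past.m.sg"), ("NOUN", "gen.sg")): 1,
})
prav = [("NOUN", "gen.pl"), ("VERB", "imper.2sg"), ("VERB", "past.m.sg")]
print(disambiguate([[("NOUN", "nom.sg")], prav, [("NOUN", "gen.sg")]], stats)[1])
# ('NOUN', 'gen.pl')
```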

Page 38

Software

The grammatical marking program interface

Page 39

Obtaining $S(T_i)$

The first column contains the triples $([v_i, g_i]\, [v_{i+1}, g_{i+1}]\, [v_{i+2}, g_{i+2}])$;

the second column contains information about punctuation within the chain;

the third column gives the position of the chain relative to the beginning of the sentence;

the fourth column gives the absolute frequency of the triple in the text.
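A sketch of how such rows might be produced from a marked text, assuming each sentence is a list of ((v, g), delimiter-after) items and that rows are grouped by triple, punctuation and position together. The data layout, the 0-based position and the function name are assumptions.

```python
from collections import Counter

Pair = tuple[str, str]  # (part of speech v, grammatical meaning g)
Item = tuple[Pair, str]  # a marked word form and the delimiter that follows it


def chain_rows(sentences: list[list[Item]]) -> list[tuple]:
    """Rows: (triple, punctuation inside the chain, position from sentence start, frequency)."""
    counts: Counter = Counter()
    for sentence in sentences:
        for pos in range(len(sentence) - 2):  # chains stay inside one sentence
            items = sentence[pos:pos + 3]
            triple = tuple(pair for pair, _ in items)
            # delimiters between word forms 1-2 and 2-3, whitespace removed
            punct = tuple(delim.strip() for _, delim in items[:2])
            counts[(triple, punct, pos)] += 1
    return [(triple, punct, pos, freq)
            for (triple, punct, pos), freq in counts.most_common()]
```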

Page 40

Marking unambiguous word forms: the example of the Commercial Code

Page 41

Disambiguation in the Commercial Code text

Page 42

Results of automatic disambiguation

| Text T | Statistics S(T) | Word forms: total | Homonymous | Unambiguous | Homonymous recognized: total | Wrong | Right | Homonymous unrecognized |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Constitution of Ukraine | | 14131 | 10263 (72.63%) | 3868 (27.37%) | | | | |
| Commercial Code | Constitution | 51982 | 35161 (67.64%) | 16821 (32.36%) | 526 (1.78%) | 321 (61%) | 205 (39%) | 34635 (98.22%) |
| Family Code | Constitution + Commercial Code | 23524 | 15819 (67.25%) | 7705 (32.75%) | 11273 (71%) | 1853 (16.44%) | 9420 (83.56%) | 4546 (28.74%) |
| Penal Code | Constitution + Commercial Code + Family Code | 59363 | 40806 (68.74%) | 18557 (31.26%) | 32744 (80.24%) | 3015 (9.2%) | 29729 (90.8%) | 8062 (19.76%) |
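The percentages in this table can be reproduced from the raw counts: the recognized and unrecognized shares are taken relative to the number of homonymous word forms, and the wrong and right shares relative to the number of recognized ones. A small check using the Family Code and Penal Code rows (the function name is illustrative):

```python
def table_shares(homonymous: int, recognized: int, wrong: int) -> dict[str, float]:
    """Percentages as in the results table, computed from counts of word forms."""
    right = recognized - wrong
    unrecognized = homonymous - recognized
    return {
        "recognized": round(100 * recognized / homonymous, 2),  # base: homonymous forms
        "unrecognized": round(100 * unrecognized / homonymous, 2),
        "wrong": round(100 * wrong / recognized, 2),  # base: recognized forms
        "right": round(100 * right / recognized, 2),
    }


family = table_shares(homonymous=15819, recognized=11273, wrong=1853)
penal = table_shares(homonymous=40806, recognized=32744, wrong=3015)
print(family)
# {'recognized': 71.26, 'unrecognized': 28.74, 'wrong': 16.44, 'right': 83.56}
```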

Page 43

Thank you for your attention