Ontology-based information retrieval of scientific information
Natalia V. Loukachevitch
Laboratory of Information Resources Analysis
Research Computing Center of Moscow State University (MGU NIVC)
Thematic Search of Scientific Information
• Knowledge-based (ontology-based) search
• Use of synonyms
• Automatic query expansion
• Automatic analysis of query results
• Help in interactive search
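A minimal sketch of synonym-based query expansion, as named in the list above. The thesaurus entries and the function name `expand_query` are hypothetical and only illustrate the idea; they are not taken from the actual thesaurus.

```python
# Toy thesaurus: each concept groups synonymous terms.
# The entries are hypothetical and serve only as an illustration.
THESAURUS = {
    "TAX": {"tax", "taxation", "levy"},
    "LAW": {"law", "legislation", "statute"},
}


def expand_query(query: str) -> list[str]:
    """Return the query words plus the synonyms of every concept they belong to."""
    expanded: list[str] = []
    for word in query.lower().split():
        terms = {word}
        for synonyms in THESAURUS.values():
            if word in synonyms:
                terms |= synonyms
        # Keep the order stable and avoid duplicates.
        for term in sorted(terms):
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_query("tax reform"))
```

Words that are not in the thesaurus (here "reform") pass through unchanged, so the expanded query never loses the user's original terms.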
[Figure: Sociopolitical domain. Axis label: Levels of hierarchy. Subdomains shown: Law, Accounting, Taxes, Banking]
Bilingual Sociopolitical Thesaurus
The thesaurus development is based on three methodologies:
• methods of construction of information-retrieval thesauri (information-retrieval context, analysis of terminology, terminology-based concepts, a small set of relation types)
• development of wordnets for various languages (word-based concepts, detailed sets of synonyms, description of ambiguous text expressions)
• ontology and formal ontology research (strictness of relations description, necessity of many-step inference)
(33,000 concepts, 80,000 Russian terms, 85,000 English terms)
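One way such a bilingual thesaurus could be represented is sketched below: each concept carries Russian and English terms plus links to broader concepts, and those links can be followed over several steps. The class `Concept`, its field names, and the three sample entries are assumptions made for illustration, not the actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent concept with terms in two languages (hypothetical model)."""
    name: str
    en_terms: set[str] = field(default_factory=set)
    ru_terms: set[str] = field(default_factory=set)
    broader: list[str] = field(default_factory=list)  # names of parent concepts


# Hypothetical sample entries.
CONCEPTS = {
    "FINANCES": Concept("FINANCES", {"finances"}, {"финансы"}),
    "TAX": Concept("TAX", {"tax"}, {"налог"}, broader=["FINANCES"]),
    "VAT": Concept("VAT", {"vat", "value added tax"}, {"ндс"}, broader=["TAX"]),
}


def ancestors(name: str) -> list[str]:
    """Follow 'broader' links over several steps (many-step inference)."""
    result: list[str] = []
    stack = list(CONCEPTS[name].broader)
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.append(parent)
            stack.extend(CONCEPTS[parent].broader)
    return result


def russian_terms_for(english_term: str) -> set[str]:
    """Map an English term to Russian terms through the shared concept."""
    term = english_term.lower()
    found: set[str] = set()
    for concept in CONCEPTS.values():
        if term in concept.en_terms:
            found |= concept.ru_terms
    return found


print(ancestors("VAT"))
print(russian_terms_for("VAT"))
```

Because terms in both languages attach to the same concept, a query in one language can be matched against documents in the other without translating the documents themselves.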
[Figure: Socio-Political Domain vs. General Lexicon and Specific Lexicons. Labels: General Lexicon; Specific Lexicon; Intermediate Zone; Information Security; Aviation Ontology; Cultural Heritage; Ontology on Natural Sciences and Technology; 30,000 concepts, 70,000 terms]
Thematic Structure
tax; taxation system; tax payer;
finances; economy; tax legislation; VAT
legislation; law; draft law;
Taxation Code;
deputy minister; Ministry of Finance;
finances; reform; tax reform
population
budget, estimate;
finances; economy; document
government; state power; Minister of
Finance
State Duma; state power;
state
Thematic representation of a text: Thematic Node i||+ == Thematic Node j
Thematic node in the text
University Information System RUSSIA (http://www.cir.ru, http://uisrussia.msu.ru)
- Database of Full-Text Documents (1.5 million): Legal Acts, Newspaper Articles, Scientific Reports
- Database “Statistics of Russian Federation” (Socio-economic Statistics, Demographic Statistics, Agrarian Statistics, Urban Statistics)
- Database “Budget System of Russian Federation” (www.budgetrf.ru)
Visualisation of Data in Dynamic Tables and Maps
Converters, Processing, Interfaces, Services
Unified Technology Platform (Constructor)
Russian University Social Sciences Information and Analytical Consortium
www.cir.ru
www.echr-base.ru
Database “Statistics of Russia”
www.budgetrf.ru
Cross-Language Information Retrieval
Applications of technology
• Concept-based information retrieval (monolingual, bilingual)
• Information-retrieval systems combining word-based and concept-based search
• Concept-based automatic text categorization
• Automatic Question-Answering
• Automatic Text Summarization
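The combination of word-based and concept-based search mentioned above can be sketched as a weighted sum of two overlap scores. The weighting scheme, the parameter `alpha`, and the word-to-concept table are assumptions chosen for illustration; the slides do not say how the two kinds of evidence are actually combined.

```python
# Hypothetical mapping from words to thesaurus concepts.
CONCEPT_OF = {"tax": "TAX", "levy": "TAX", "law": "LAW", "legislation": "LAW"}


def concepts(words: set[str]) -> set[str]:
    """Return the concepts that the given words refer to."""
    return {CONCEPT_OF[w] for w in words if w in CONCEPT_OF}


def combined_score(query: str, document: str, alpha: float = 0.5) -> float:
    """Blend word overlap and concept overlap; alpha weights the word-based part."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    word_score = len(q & d) / len(q) if q else 0.0
    qc, dc = concepts(q), concepts(d)
    concept_score = len(qc & dc) / len(qc) if qc else 0.0
    return alpha * word_score + (1 - alpha) * concept_score


# No word is shared, but both query concepts occur in the document.
print(combined_score("tax law", "new levy legislation"))
```

The example shows why the combination helps: a document that uses only synonyms of the query words gets no word-based credit but can still be retrieved through its concepts.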
Main Projects
• State Duma of RF (1999 - …)
• Central Election Commission of RF (1997 - …)
• Legal Company “Garant” (2002 - …)
• Ministry of Education (2005-2006)
• Accounting Chamber of RF (2003 - …)
• Central Bank of RF (2006 - …)
Grants:
– MacArthur Foundation (1994, 1995, 2004 - …)
– Ford Foundation (2002, 2003)
– Russian Foundation for Basic Research (9)
– Russian Foundation for Humanities (5)
– Eurasia Foundation (2002, 2003)
Participation in International Forums
• Participation in the Text REtrieval Conference TREC, organized by NIST and DARPA (TREC-6, TREC-8)
• Participation in the Summarization Conference SUMMAC, organized by NIST and DARPA (1st place)
• Cross-Language Evaluation Forum CLEF (DELOS program)
– participation in the Steering Committee
– provision of Russian collections for evaluation purposes
– domain-specific information retrieval
• Organizers of the Russian Information Retrieval Evaluation Seminar ROMIP (www.romip.ru/en/)