An Intelligent Retrieval System for Chinese Agricultural Scientific Literature

Post on 14-Apr-2017


An Intelligent Retrieval System for Chinese Agricultural Scientific Literature

Ping Qian, Xiaolu Su

Scientech Documentation and Information Center,

Chinese Academy of Agricultural Sciences, China

{pingq, suxiaolu}@mail.caas.net.cn

Introduction

• Finding the desired information quickly and accurately among huge information resources has become a serious obstacle to developing and using network information resources.

• This project attempts to use new theory and technology to explore a solution to this problem.

• Knowledge engineering based on ontology, currently an active research area, is an important theoretical foundation and applied technology for knowledge discovery and acquisition.

Information Retrieval Based on Ontology

• Build up the domain ontology

• Create the database, referring to the ontology

• Conduct the retrieval with the help of ontology

• Process the results, then display the results
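
The four steps above can be illustrated with a minimal Python sketch. The toy class hierarchy, the record list, and the function names `descendants` and `retrieve` are hypothetical stand-ins for the domain ontology and the literature database; this is one possible reading of the workflow, not the system's actual implementation.

```python
# Step 1: a toy domain ontology as parent -> child class numbers (hypothetical data).
ONTOLOGY = {
    "S": ["S5", "S8"],
    "S5": ["S51"],
    "S8": [],
    "S51": [],
}

# Step 2: records indexed with class numbers taken from the ontology (hypothetical data).
RECORDS = [
    {"id": 1, "title": "Wheat yield trial", "class": "S51"},
    {"id": 2, "title": "Crop rotation survey", "class": "S5"},
    {"id": 3, "title": "Cattle feeding study", "class": "S8"},
]


def descendants(cls):
    """Return cls followed by every class below it in the ontology."""
    found = [cls]
    for child in ONTOLOGY.get(cls, []):
        found.extend(descendants(child))
    return found


def retrieve(cls):
    """Step 3: expand the query class through the ontology, then match records."""
    wanted = set(descendants(cls))
    hits = [r for r in RECORDS if r["class"] in wanted]
    # Step 4: process the results (here: sort by class number, then id) before display.
    return sorted(hits, key=lambda r: (r["class"], r["id"]))


results = retrieve("S5")
for r in results:
    print(r["class"], r["title"])
```

Querying the class `S5` also returns the record indexed under its subclass `S51`, which a plain match on the class number would miss.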

Process of Establishing the System

• Import the classification method based on ontology theory

• Create the agricultural navigation information database

• Create the index database (Agricultural Scientific Literature Database)

• Create the Web information retrieval system

• Display the results

Foundation of Building the Agricultural Scientech Navigation Information Database

• Theory: Ontology

• Data Source: Agricultural Scientech Literature Database (more than 560,000 records)

• Tool: Statistical Analysis

• Standard: Chinese Library Classification Method

Stages of Building the Agricultural Navigation Information Database

1. Agricultural Theoretical Classification Tree

2. Agricultural Actual Classification Tree

3. Class - Keyword Cross Table

4. Keyword - Class Cross Table

5. Agricultural Navigation Information Database

Agricultural Theoretical Classification Tree

– Component
  • All of the relevant classes in the Chinese Library Classification Method

– Purpose
  • Solve the problems in creating the actual classification tree:
    – The relation between a class number and its name
    – The hierarchical relation among class numbers

– Data Amount
  • Classes and subclasses: 42,948
  • First-layer classes: 17

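No.  Class  Class name                                                        Records
1    S      Agriculture, agricultural sciences                                470,213
2    F      Economics                                                          47,503
3    T      Industrial technology                                              23,555
4    Q      Biological sciences                                                10,440
5    X      Environmental science, labor protection science (safety science)    6,252
6    P      Astronomy, earth sciences                                           1,109
7    G      Culture, science, education, sports                                 1,106
8    O      Mathematical and physical sciences, chemistry                         433
9    U      Transportation                                                        398
10   R      Medicine, health                                                      391
11   C      Social sciences (general)                                             209
12   D      Politics, law                                                         102
13   Z      General works                                                          22
14   N      Natural sciences (general)                                             21
15   K      History, geography                                                     19
16   H      Language, writing                                                       5
17   V      Aviation, aerospace                                                     2

First-Order Class Names in the Theoretical Tree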
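
A distribution like the one in this table can be obtained by grouping each record's class number by its leading letter. The sketch below assumes, hypothetically, that every record carries one class number string; the sample values are invented for illustration and are not taken from the database.

```python
from collections import Counter


def first_order_counts(class_numbers):
    """Count records per first-order class (the leading letter of the class number)."""
    counts = Counter(number[0].upper() for number in class_numbers if number)
    # most_common() lists the classes with the most records first, as in the table.
    return counts.most_common()


sample = ["S512.1", "S435", "F326", "S821", "Q945", "F407.2"]  # hypothetical records
table = first_order_counts(sample)
for rank, (cls, n) in enumerate(table, start=1):
    print(rank, cls, n)
```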

Agricultural Actual Classification Tree

– Component:
  • All of the classes actually used in indexing

– Purpose:
  • Found the navigation information database
  • Learn the actual distribution of agricultural information in order to find new growth points in the development of agricultural sciences

– Data amount:
  • Classes: 21,391, of which:
    – Coordinated classes: 10,748
    – Non-coordinated classes: 10,643

Agricultural Actual Classification Tree

Key Point: More than 100,000 class numbers and their corresponding class names

Solution:

• Create professional modeled class tables (9)

• Create modeled class tables (6), among them:
  – General modeled class tables (2)
  – Professional modeled class tables (4)

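Modeled Class Table

Table name   Modeled range  Range name                                           Model class number
f401_406     F407.1/.9      Economy of individual industrial sectors             F401/406
s220         S221/229       Agricultural machinery and implements of all kinds   S220
s50          S51/59         Crops of all kinds                                   S50
s60          S63/68         Horticulture of all kinds                            S60
S763_30      S763.31/.49    Insect pests of all kinds and their control          S763.30
s821         S822/829.9     Livestock of all kinds                               S821
s831         S823/839       Poultry of all kinds                                 S831
s881_884_9   S885.1/.9      Other silkworm species                               S881/884.9
s965         S943           Diseases and enemies of fish and their control       S965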
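
One way to use such a table is to look up, for a given class number, which model class it borrows its subdivisions from. The sketch below assumes that each row means "classes in the modeled range are subdivided like the model class number"; the helper names `parse_range` and `model_for`, the range-expansion rule, and the string-prefix comparison are illustrative assumptions, and only a subset of the rows is used.

```python
# A subset of rows from the Modeled Class Table: (modeled range, model class number).
MODELED = [
    ("F407.1/.9", "F401/406"),
    ("S221/229", "S220"),
    ("S51/59", "S50"),
    ("S63/68", "S60"),
    ("S763.31/.49", "S763.30"),
    ("S822/829.9", "S821"),
    ("S885.1/.9", "S881/884.9"),
]


def parse_range(text):
    """Expand 'S51/59' or 'F407.1/.9' into full (low, high) class numbers."""
    if "/" not in text:
        return text, text
    low, tail = text.split("/")
    if tail.startswith("."):
        # '.9' replaces the decimal part of the low bound: F407.1 -> F407.9
        high = low[: low.index(".")] + tail
    else:
        # '59' replaces everything after the single class letter: S51 -> S59
        high = low[0] + tail
    return low, high


def model_for(class_number):
    """Return the model class number for class_number, or None if no row applies."""
    for modeled_range, model in MODELED:
        low, high = parse_range(modeled_range)
        key = class_number[: len(low)]
        # Approximation: compare only the leading len(low) characters as strings.
        if low <= key <= high[: len(low)]:
            return model
    return None


print(model_for("S512.1"))
print(model_for("F407.21"))
print(model_for("S435"))
```

With this lookup, the names of the many subdivided classes need not be stored one by one: only the model class and its subdivisions are kept, and the names are derived on demand.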

General Compound Class Table

Table name  Range name                     Records   Fields
fb2         World Regions Compound Table   F401/406  5
fb3         China Regions Compound Table   S220      4

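Professional Compound Class Table

Table name  Compound range  Range class name                                      Records
F33_37      F33/37          Agricultural economy of individual countries          21
F43_47      F43/47          Industrial economy of individual countries            19
S727_728    S727/728        Afforestation by forest type and in special areas     5
S79         S791/796        Forest tree species of all kinds                      8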

Examples of Modeled Class Table

Examples of General Modeled Class Table

Examples of Professional Modeled Class Table

Class - Keyword Cross Table (17,582)

Keyword - Class Cross Table

• Before removing duplicates: about 1,210,000 words

• After removing duplicates: about 320,000 words
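
The two cross tables can be thought of as a pair of inverted indexes built from the indexed records. The following sketch assumes, hypothetically, that each record is a class number plus a list of keywords; the sample records and the name `build_cross_tables` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical indexed records: (class number, keywords assigned by the indexer).
RECORDS = [
    ("S512.1", ["wheat", "yield"]),
    ("S512.1", ["wheat", "drought"]),
    ("S435", ["wheat", "pest control"]),
]


def build_cross_tables(records):
    """Return (class -> keywords, keyword -> classes) with duplicates removed."""
    class_to_keywords = defaultdict(set)
    keyword_to_classes = defaultdict(set)
    for class_number, keywords in records:
        for keyword in keywords:
            class_to_keywords[class_number].add(keyword)
            keyword_to_classes[keyword].add(class_number)
    return class_to_keywords, keyword_to_classes


class_kw, kw_class = build_cross_tables(RECORDS)
total_before = sum(len(kws) for _, kws in RECORDS)  # keyword occurrences, repeats included
total_after = len(kw_class)                         # distinct keywords
print(total_before, total_after)
print(sorted(kw_class["wheat"]))
```

Using sets removes the repeated keyword occurrences, which mirrors the drop in word count after duplicates are removed.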

Agricultural Navigation Information Database

• Determine the regulations for organizing the information

• Make XML files for navigation information

• Choose the database management system

• Define the database structure

The Regulations for Organizing the Information

• Never lose any class or subclass that has records

• Display order: classes with more records are listed first, then from higher class layers to lower ones

• If a node has no records and only one sub-node, the node is deleted and its sub-node is moved up one layer

• Subclasses below the third layer are merged up into their third-layer class

• Subclasses with fewer than 30 records are ignored for now
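
These rules amount to a recursive clean-up of the classification tree. The sketch below is one possible reading of them, with assumptions stated plainly: first-order classes count as layer 1, "more records" is measured on a node together with its descendants, and the `Node` class, `organize`, `total`, and the sample tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

MIN_RECORDS = 30  # subclasses with fewer records are hidden for now
MAX_LAYER = 3     # subclasses deeper than this are merged upward


@dataclass
class Node:
    number: str
    records: int = 0
    children: List["Node"] = field(default_factory=list)


def total(node):
    """Records on this node plus all of its descendants."""
    return node.records + sum(total(child) for child in node.children)


def organize(node, layer=1):
    """Apply the organizing rules to the subtree rooted at node."""
    if layer >= MAX_LAYER:
        node.records = total(node)  # merge deeper subclasses into this class
        node.children = []
        return node
    kept = []
    for child in node.children:
        child = organize(child, layer + 1)
        # A node with no records of its own and a single sub-node is replaced by it.
        while child.records == 0 and len(child.children) == 1:
            child = child.children[0]
        if total(child) >= MIN_RECORDS:  # small subclasses are skipped for now
            kept.append(child)
    # Siblings with more records are listed first.
    node.children = sorted(kept, key=total, reverse=True)
    return node


root = Node("S", 0, [
    Node("S5", 0, [Node("S51", 40, [Node("S512", 100)])]),
    Node("S8", 500, [Node("S82", 10)]),
    Node("S1", 35),
])
organize(root)
print([(c.number, c.records) for c in root.children])
# prints [('S8', 500), ('S51', 140), ('S1', 35)]
```

In the sample, `S512` is merged into `S51`, the empty `S5` is replaced by its only sub-node, and `S82` is skipped because it has fewer than 30 records.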

XML Files for Navigation Information (33 MB)
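
The slides do not show the XML layout, so the following is only a sketch of how a classification tree could be serialized with Python's standard `xml.etree.ElementTree`. The element name `class`, the attribute names, and the `S5` entry with its record count are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_xml(number, name, records, children=()):
    """Build one <class> element; children are (number, name, records, children) tuples."""
    element = ET.Element("class", number=number, name=name, records=str(records))
    for child in children:
        element.append(to_xml(*child))
    return element


tree = ("S", "Agriculture, agricultural sciences", 470213, [
    ("S5", "Crops", 120, []),  # hypothetical subclass entry and record count
])
root_element = to_xml(*tree)
xml_text = ET.tostring(root_element, encoding="unicode")
print(xml_text)
```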

Data Check and Display Menu

Database Management System

• Relational Database
  – XML-Enabled Database
    • Requires conversion, low efficiency

• Native XML Database
  – Software AG Tamino
    • Reads XML data directly
    • Saves data in XML format
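
The conversion cost of an XML-enabled relational database can be seen in a small sketch: the XML tree has to be flattened into rows before it can be stored and queried. The example uses Python's standard `sqlite3` and `xml.etree.ElementTree` purely for illustration; the table `nav`, its columns, and the sample XML are hypothetical and are not the system's schema or Tamino's API.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical navigation fragment.
XML = '<class number="S" records="3"><class number="S5" records="2"/></class>'


def shred(element, parent, rows):
    """Flatten the XML tree into (number, parent, records) rows."""
    rows.append((element.get("number"), parent, int(element.get("records"))))
    for child in element:
        shred(child, element.get("number"), rows)
    return rows


rows = shred(ET.fromstring(XML), None, [])
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nav (number TEXT PRIMARY KEY, parent TEXT, records INTEGER)")
db.executemany("INSERT INTO nav VALUES (?, ?, ?)", rows)
children = db.execute("SELECT number FROM nav WHERE parent = ?", ("S",)).fetchall()
print(children)
```

A native XML database skips the flattening step and the reverse step of rebuilding XML from rows, which is the advantage the slide attributes to it.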

Define Database Structure

System Framework

XMLDBMS/RDBMS + XML + Java/JSP

Browser/Server three-layer system structure

Environment for running JSP and XML

Java SDK 1.3.1

Xalan 2.2.0

Tomcat 3.2

Demo of The Retrieval System

Registration

Login

Browse Retrieval

Enter Keyword

Display the Results

Second-Order Retrieval

Retrieval from the Tree Directly


Intelligent Retrieval

Refined Retrieval

Conclusion

• Establishing the agricultural scientific navigation information database and developing its web search system changes the traditional retrieval method from one based on keywords to one based on a knowledge organization structure.

• It is also foundational work. The actual classification table and the cross tables between classes and keywords established in the project are valuable Chinese agricultural semantic resources.

• It is useful for further studies on the automatic identification and classification of agricultural information, as well as for constructing a rigorous agricultural domain ontology.

• This work is just the beginning of the study of ontology and its application in agriculture.

The End

Thanks to All
