
Invention Information Retrieval and Visualization

Contents:
1. Introduction
2. Background
3. IR Framework
4. Visualization Framework
5. Conclusion

Honggu Lin(u6135394)


1. Introduction

Web Search
• User: Person
• Query: Keywords, short
• Goal of search: Precision-oriented; a few top relevant documents are sufficient

Prior Art Search
• User: Patent analysts
• Query: Patent document, long
• Goal of search: Recall-oriented; the top 100-200 documents are examined

Figure 1: Comparison between Web Search and Prior Art Search
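The precision-oriented versus recall-oriented distinction in Figure 1 can be made concrete with two standard measures. The following is a minimal sketch, not part of the original slides; the function names and the sample document IDs are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents found within the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical ranking and relevance judgements, for illustration only.
ranked = ["p7", "p2", "p9", "p4", "p1"]
relevant = {"p2", "p4", "p8"}

print(precision_at_k(ranked, relevant, 2))  # what a web searcher cares about
print(recall_at_k(ranked, relevant, 5))     # what a patent analyst cares about
```

A web searcher is served well when precision at a small k is high, whereas a prior art searcher needs recall to be high at k around 100-200, because a single missed relevant patent can matter.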


2. Background
2.1 Structure of a Patent

Figure 2.1: A sample XML file for a patent document from the EPO [1]

• Title
• Abstract
• Description
• Claims
• International Patent Classification Code (IPCR)
• Citations


2. Background
2.2 Elasticsearch

• A search engine based on Lucene.
• Open source.
• Near-real-time search.
• HTTP web interface and schema-free JSON documents.
• Elasticsearch is developed alongside a data-collection and log-parsing engine called Logstash, and an analytics and visualisation platform called Kibana.
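To illustrate the HTTP interface and JSON documents mentioned above, the sketch below builds (but does not send) two requests against Elasticsearch's REST API: one that indexes a document and one that runs a `match` query. The host `http://localhost:9200`, the index name `patents`, and the field names are assumptions for illustration, not details from the slides.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: a local default Elasticsearch node
INDEX = "patents"                 # hypothetical index name


def index_request(doc_id, doc):
    """Build a PUT request that stores one JSON document under the given ID."""
    return Request(
        url=f"{ES_URL}/{INDEX}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(field, text, size=100):
    """Build a POST request that runs a full-text match query on one field."""
    query = {"size": size, "query": {"match": {field: text}}}
    return Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Ask for the top 200 hits, matching the recall-oriented setting.
req = search_request("abstract", "wireless charging coil", size=200)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it to a running node; the requests are only constructed here so the example runs without a server.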


3. IR Framework
3.1 Patent Retrieval Overall Process

Components shown in the diagram:
• Query side: Query Patents, Patent Preprocess, Query (Re)formulation, Query
• Collection side: Patents in Collection, Patent Preprocess, Indexing, Indexed Documents, Index statistics (TF-IDF)
• Retrieval: Retrieval Model (Elasticsearch), Retrieved Documents, Feedback

Figure 3.1: Illustration of the process in my patent retrieval system


3. IR Framework
3.2 Data Collection

• Cross Language Evaluation Forum for Intellectual Property evaluation track (CLEF-IP).

• CLEF-IP 2010 contains 2.6 million patent documents and 2,000 topics.

Language:
• EN: 68%
• DE: 24%
• FR: 8%

Figure 3.1: Percentage of English, German, and French patents in CLEF-IP 2010 collection

Completeness:
• Title: 22%
• Title+Abstract: 10%
• Title+Claims+[Abstract]: 16%
• Title+Description+Claims+[Abstract]: 52%

Figure 3.2: Completeness of the presence of English text in the CLEF-IP 2010 patent collection


3. IR Framework
3.3 Data Preprocess

• Convert .XML to .JSON: Format Unify, Section Selection, Language Filter
• Index the .JSON file in Elasticsearch

Figure 3.3: Illustration of the process of Data Preprocess
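The XML-to-JSON conversion with section selection and a language filter might look like the sketch below. The element names, the `lang` and `ucid` attributes, and the sample document are assumptions made for illustration; the real CLEF-IP schema should be checked before reusing this.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified patent document.
SAMPLE_XML = """
<patent-document ucid="EP-0000001-A1">
  <invention-title lang="EN">Wireless charging coil</invention-title>
  <invention-title lang="DE">Drahtlose Ladespule</invention-title>
  <abstract lang="EN"><p>A coil for inductive charging.</p></abstract>
  <claims lang="EN"><claim><claim-text>A coil comprising a ferrite core.</claim-text></claim></claims>
  <claims lang="DE"><claim><claim-text>Eine Spule mit Ferritkern.</claim-text></claim></claims>
</patent-document>
"""

# Section selection: XML tag -> unified JSON field name (assumed mapping).
SECTIONS = {
    "invention-title": "title",
    "abstract": "abstract",
    "description": "description",
    "claims": "claims",
}


def patent_xml_to_json(xml_text, lang="EN"):
    """Keep only the selected sections written in `lang`; return a JSON string."""
    root = ET.fromstring(xml_text.strip())
    doc = {"id": root.get("ucid")}
    for element in root:
        field = SECTIONS.get(element.tag)
        if field is None or element.get("lang") != lang:
            continue  # unselected section, or filtered out by language
        # Flatten nested markup and collapse whitespace into plain text.
        doc[field] = " ".join("".join(element.itertext()).split())
    return json.dumps(doc, ensure_ascii=False)


print(patent_xml_to_json(SAMPLE_XML))
```

Sections that are missing in the chosen language are simply absent from the output, which mirrors the completeness differences reported for the collection.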


3. IR Framework
3.4 Query Reduction

• Section Selection
• Term Extraction (TF-IDF)
• Technical Phrase Formation
• Metadata Usage (IPCR)
• Section Combination

Figure 3.4: Process of Query Formation
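The term extraction step can be sketched as follows: score each term of the query patent by TF-IDF against the collection and keep the top k as the reduced query. The smoothed IDF formula, the tokenizer, and the toy texts are assumptions for illustration and are not necessarily the weighting used in the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def top_tfidf_terms(query_text, collection, k=3):
    """Return the k query terms with the highest TF-IDF score."""
    n_docs = len(collection)
    doc_freq = Counter()
    for doc in collection:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    term_freq = Counter(tokenize(query_text))
    scores = {
        # Smoothed IDF: terms present in every document score zero.
        term: tf * math.log((n_docs + 1) / (doc_freq[term] + 1))
        for term, tf in term_freq.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy collection and query patent text, for illustration only.
collection = [
    "a coil for charging a battery",
    "a battery housing for a vehicle",
    "a method for charging a vehicle",
]
query_patent = "a ferrite coil for wireless charging, the coil having a ferrite core"

print(top_tfidf_terms(query_patent, collection, k=3))
```

Terms that occur in every document of the collection (here "a" and "for") receive a score of zero, so they drop out of the reduced query without needing a stopword list.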


4. Visualization Framework

• Query and Related Patents Selected from the Results: put in MongoDB
• MongoDB: used by the Django Web Framework
• Django Web Framework effects:
  • Highlight Common Area between query and its related patent.
  • Common Word Word-Net

Figure 4.1: Illustration of the process in a Query and Related Patent Visualization System
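One simple way to highlight the common area between a query and a related patent is to wrap the words they share in an HTML `<mark>` element before rendering. This is a minimal sketch under that assumption; the stopword list, function names, and sample texts are hypothetical, and the slides do not say how the highlighting is actually implemented.

```python
import html
import re

# Small illustrative stopword list (assumption, not from the slides).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "is", "with"}


def content_words(text):
    """Lowercased alphabetic words of the text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def common_words(query_text, patent_text):
    """Words that occur in both the query and the related patent."""
    return content_words(query_text) & content_words(patent_text)


def highlight(text, shared):
    """Wrap shared words in <mark>; HTML-escape everything else."""
    parts = re.split(r"([A-Za-z]+)", text)  # keeps words and separators
    out = []
    for part in parts:
        if part.lower() in shared:
            out.append(f"<mark>{part}</mark>")
        else:
            out.append(html.escape(part))
    return "".join(out)


query = "A ferrite coil for wireless charging"
patent = "Charging device with a coil & core"
shared = common_words(query, patent)

print(highlight(patent, shared))
```

In a Django template the returned string would need to be marked as safe (for example with `django.utils.safestring.mark_safe`) so that the `<mark>` tags are rendered rather than escaped; the function escapes all other text itself for that reason.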


5. Conclusion

• Explore how the results differ across query formulation methods and find the optimal one.

• Visualize the retrieval result in a more intuitive way.


Reference:
[1] Walid Magdy. (2012). Toward Higher Effectiveness for Recall-Oriented Information Retrieval: A Patent Retrieval Case Study. Retrieved from http://doras.dcu.ie/16814/1/WalidMagdyThesis.pdf


Q & A