Information Acquisition and Knowledge Graphs (信息获取与知识图谱) - bj.bcebos.com/cips-upload/kg/zxy.pdf
TRANSCRIPT
朱小燕 (Zhu Xiaoyan), Department of Computer Science, Tsinghua University, zxy-dcs@tsinghua.edu.cn, @朱小燕THU
Content
1. Information & Knowledge
2. Challenges
3. Achievements
Information & Knowledge
What do we use the Internet for?
• Access the Internet for getting news, URLs, etc.
• INFORMATION SERVICE: build a knowledge graph for generating answers
• News, Microblog, Picture, Video, Encyclopedia, Forum
Information Services
• Baidu Box Computing
• Google Knowledge Graph
Knowledge
• What is knowledge?
– Entity + Rule
– Triple (predicate, semantic network, rule, framework, ……)
Abstracted Information
Coded Information
Drawback: Lack of ability to communicate with machines DIRECTLY!
Role of knowledge: helping to compute, understand, and evaluate information
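To make the "Entity + Rule" and "Triple" views concrete, here is a minimal sketch. It stores knowledge as (subject, relation, object) triples and applies one hand-written rule to derive a new triple. The relation names `born_in`, `located_in`, and `born_in_country`, the example facts, and the helper `apply_rule` are illustrative assumptions and do not come from the slides.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def apply_rule(facts: Set[Triple]) -> Set[Triple]:
    """Hypothetical rule: born_in(x, city) and located_in(city, country)
    together imply born_in_country(x, country)."""
    located = {t.subject: t.obj for t in facts if t.relation == "located_in"}
    derived = set()
    for t in facts:
        if t.relation == "born_in" and t.obj in located:
            derived.add(Triple(t.subject, "born_in_country", located[t.obj]))
    return derived


# Illustrative facts only; not taken from the knowledge base in the talk.
facts = {
    Triple("Barack Obama", "born_in", "Honolulu"),
    Triple("Honolulu", "located_in", "United States of America"),
}
inferred = apply_rule(facts)
```

Because both facts and derived knowledge share the same triple shape, the output of a rule can be added back to the fact set and used by further rules.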
Challenges
Scientific Perspective
• Our Goal
– Information Measurable
– Knowledge Computable
• Information/Knowledge: Acquisition, Measure, Mining, Structurization
Application Perspective
Information acquisition, Knowledge graph
• How to build knowledge bases
– To build from original materials
– To extend and update
– To merge different databases
– To verify the knowledge
• How to use knowledge bases
– For answer generation
– For answer re-ranking
– For inference
– For vertical search
– ……
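Two of the construction tasks listed here, merging databases and verifying knowledge, can be sketched on triple sets. The snippet below is only an illustration under stated assumptions: the placeholder identifiers (`song_1`, `artist_a`), the relation names, and the idea of flagging conflicts on relations declared single-valued are all hypothetical and are not described in the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def merge(*sources: Iterable[Triple]) -> Set[Triple]:
    """Union of all sources; exact duplicate triples collapse automatically."""
    merged: Set[Triple] = set()
    for source in sources:
        merged.update(source)
    return merged


def find_conflicts(
    triples: Iterable[Triple], functional: Set[str]
) -> Dict[Tuple[str, str], List[str]]:
    """For relations assumed to be single-valued, report every
    (subject, relation) pair that ended up with more than one object."""
    values = defaultdict(set)
    for s, r, o in triples:
        if r in functional:
            values[(s, r)].add(o)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


# Made-up placeholder data from two hypothetical sources.
kb_a = {("song_1", "singer", "artist_a"), ("song_1", "release_year", "2004")}
kb_b = {("song_1", "singer", "artist_a"), ("song_1", "release_year", "2005")}

merged = merge(kb_a, kb_b)
conflicts = find_conflicts(merged, functional={"release_year"})
```

Conflicting entries are surfaced rather than resolved; deciding which source to trust is left to a separate verification step.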
Achievements
Achievements (1)
• Information acquisition
– Information metrics and their applications
– ……
Search with Key Words
2013/10/15
Inspect
[Figure: system architecture with three layers: Analysis Layer, Semantic Layer, Application Layer]
• Applications: Complex QA, Vertical Search, Enterprise Search, Computational Advertisement
• Content Understanding: Answer typing, Semantic tagging, Focus extraction, Concept extension, Semantic relatedness, Similarity metric, Question type classification
• User Understanding: User interest modeling, Authority/Expert modeling
• Sentiment Understanding: Emotion analysis, Opinion extraction, Opinion summarization, Sentiment classification
Information Distance
• Kolmogorov Complexity
• Dmax(x,y), dmax(x,y)
• Dmax(x,y|c), dmax(x,y|c)
• Dmin(x,y), dmin(x,y)
• Dmax(x1,x2,…)
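Kolmogorov complexity is not computable, so information distance is usually approximated in practice. One widely used stand-in, shown here only as an illustration and not necessarily the approximation used in the work behind these slides, replaces the complexity K with the length of a compressed string, which gives the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The choice of `zlib` as compressor and the sample strings are assumptions.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a rough proxy for K(data)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Two texts that share most of their wording, and one unrelated byte pattern.
text_a = b"knowledge graph for question answering " * 20
text_b = b"knowledge graph for answer re-ranking " * 20
noise = bytes(range(256)) * 3

similar = ncd(text_a, text_b)
dissimilar = ncd(text_a, noise)
```

Smaller values mean the two inputs share more compressible structure, so `similar` is expected to be smaller than `dissimilar`. The exact numbers depend on the compressor and should not be read as true information distances.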
Main publications
• Answer re-ranking
– KDD 2007
• Concepts measure, relatedness evaluation
– COLING 2010 BEST PAPER, IJCAI 2011
• Question answer pairs similarity measure
– ACL 2012 BEST STUDENT PAPER
• Multiple document summarization
– CIKM 2008, ICDM 2009
• Open-domain question answering platform: 趣答
Achievements (2)
• Knowledge base construction
– Chinese Knowledge Base Construction
– Information Organization for Multi-source UGCs via Topic Hierarchy Construction
Knowledge Base Construction
• Knowledge Extraction
– Semi-structured text
  • Tables, info-boxes, etc.
– Free text
• Knowledge Transfer
– Open knowledge bases
  • Freebase, Yago, DBPedia, etc.
– Inter-language transfer
  • Wikipedia multi-language links
  • Google translation

| Subject | Relation | Object |
| --- | --- | --- |
| …… | | |
| 巴拉克奥巴马 (Barack Obama) | 拥有国籍 (has nationality) | 美利坚合众国 (United States of America) |
| …… | | |
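Extraction from semi-structured text such as info-boxes can be pictured as turning each attribute/value pair into one triple about the page's entity. The sketch below assumes the info-box has already been parsed into a dictionary; the function name `infobox_to_triples` and the empty 职业 (occupation) attribute are hypothetical additions for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def infobox_to_triples(entity: str, infobox: Dict[str, str]) -> List[Triple]:
    """Turn every non-empty attribute/value pair of an info-box into a triple."""
    triples = []
    for attribute, value in infobox.items():
        value = value.strip()
        if value:  # skip attributes whose value is missing on the page
            triples.append((entity, attribute.strip(), value))
    return triples


# The first pair mirrors the example row on the slide; the second is a
# made-up attribute with a blank value to show that it is skipped.
infobox = {"拥有国籍": "美利坚合众国", "职业": " "}
triples = infobox_to_triples("巴拉克奥巴马", infobox)
```

Free text needs a different approach, since there is no attribute/value layout to read triples from.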
Knowledge Base Construction
• Knowledge Base
– Knowledge structure
  • Triple: entity, relation
– Size of repository
  • Resources: Baidu Baike, Freebase, Wikipedia
  • Contains: 250,000 entities, 1,660,000 triples
  • Extracted from Baidu Baike: 120,000 entities, 650,000 triples
  • Transferred from Freebase: 130,000 entities, 1,000,000 triples

| Subject | Relation | Object |
| --- | --- | --- |
| …… | | |
| 巴拉克奥巴马 (Barack Obama) | 拥有国籍 (has nationality) | 美利坚合众国 (United States of America) |
| …… | | |

[Figure: the same triple drawn as a graph edge: 巴拉克奥巴马 --拥有国籍--> 美利坚合众国]
Information Organization for Multi‐source UGCs via Topic Hierarchy Construction
• Multi-source UGCs (user-generated contents)
– Quality of Contents
– Power of Statistics
– Authority, timeliness, etc.
• Topic Hierarchy
– root node: user topic
– non-root node: sub-topics
– leaf node: link to UGCs
Can we organize multi-source UGCs by a unified structure, which can effectively lead users to their required knowledge?
[Figure: example topic hierarchy with nodes such as policy, poll, islam, medic care]
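The three node roles (root node as the user topic, non-root nodes as sub-topics, leaf nodes linking to UGCs) map naturally onto a small tree type. The following is a minimal sketch; the class name `TopicNode`, the placement of "medic care" under "policy", and the example.com links are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopicNode:
    topic: str
    children: List["TopicNode"] = field(default_factory=list)
    ugc_links: List[str] = field(default_factory=list)  # filled on leaf nodes

    def add_child(self, child: "TopicNode") -> "TopicNode":
        self.children.append(child)
        return child

    def leaves(self) -> List["TopicNode"]:
        """Collect leaf nodes left to right; these carry the links to UGCs."""
        if not self.children:
            return [self]
        found: List["TopicNode"] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# Hypothetical arrangement of topic labels that appear in the slide's figure.
root = TopicNode("barack obama")  # root node: the user topic
policy = root.add_child(TopicNode("policy"))  # non-root node: a sub-topic
policy.add_child(TopicNode("medic care", ugc_links=["https://example.com/post/1"]))
root.add_child(TopicNode("poll", ugc_links=["https://example.com/post/2"]))

leaf_topics = [node.topic for node in root.leaves()]
```

A user browsing from the root reaches the UGC links only at the leaves, which is what lets one structure cover content from several sources.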
Information Organization for Multi‐source UGCs via Topic Hierarchy Construction
• Topic extraction
– Keyword extraction
– Hyponym mining
• Sub-topic relation

[Figure: candidate sub-topic graph for "barack obama" with nodes policy, thing, medic care, islam, debate, politifact]

The relation score combines the evidences $e_{direction}$ and $e_{relatedness}$:

$$p(r(t_A, t_B)) = F\big(e_1(r(t_A, t_B)),\ e_2(r(t_A, t_B)),\ \ldots\big)$$

$$e_{direction}(r(t_A, t_B)) = \sum_{e_k \in E_{direct}} w_k\, e_k(t_A, t_B)$$

$$e_{relatedness}(r(t_A, t_B)) = \sum_{e_s \in E_{undirect}} e_s(t_A, t_B)$$

| directed evidences | Source |
| --- | --- |
| $e_{pattern0}$ ~ $e_{pattern5}$ | Search engine |
| $e_{wiki\_cate}$ | Wikipedia |
| $e_{wiki\_title}$ | Wikipedia |
| $e_{wnet}$ | WordNet |

| undirected evidences | Source |
| --- | --- |
| $e_{dis\_doc}$ | crawled UGCs |
| $e_{dis\_sen}$ | crawled UGCs |
| $e_{wiki\_pmi}$ | Wikipedia |
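A small sketch may help show how directed and undirected evidences could be combined into one score for a candidate sub-topic relation. It assumes that the direction evidence is a weighted sum over directed evidence functions, that the relatedness evidence is a plain sum over undirected ones, and that the combining function F is a simple product. The product, the weights, and the toy evidence functions are all assumptions; the slide does not spell out F.

```python
from typing import Callable, Dict, Sequence

Evidence = Callable[[str, str], float]


def direction_score(
    t_a: str, t_b: str, directed: Dict[str, Evidence], weights: Dict[str, float]
) -> float:
    """Weighted sum over directed evidences (order of t_a, t_b matters)."""
    return sum(weights[name] * e(t_a, t_b) for name, e in directed.items())


def relatedness_score(t_a: str, t_b: str, undirected: Sequence[Evidence]) -> float:
    """Plain sum over undirected evidences (symmetric in t_a, t_b)."""
    return sum(e(t_a, t_b) for e in undirected)


def relation_score(t_a, t_b, directed, weights, undirected) -> float:
    # Assumption: F is taken to be a product of the two aggregated evidences.
    return direction_score(t_a, t_b, directed, weights) * relatedness_score(
        t_a, t_b, undirected
    )


# Toy evidence functions standing in for pattern, Wikipedia and UGC statistics.
parent_child = {("barack obama", "policy")}
directed = {
    "pattern": lambda a, b: 1.0 if (a, b) in parent_child else 0.0,
    "wiki_cate": lambda a, b: 1.0 if (a, b) in parent_child else 0.0,
}
weights = {"pattern": 0.25, "wiki_cate": 0.5}
cooccurrence = {frozenset(("barack obama", "policy")): 0.8}
undirected = [lambda a, b: cooccurrence.get(frozenset((a, b)), 0.0)]

forward = relation_score("barack obama", "policy", directed, weights, undirected)
backward = relation_score("policy", "barack obama", directed, weights, undirected)
```

The undirected evidence says the two topics are related in either order, while only the directed evidence separates "policy is a sub-topic of barack obama" from the reverse.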
Information Organization for Multi‐source UGCs via Topic Hierarchy Construction
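• Topic Organization
– Depth vs. relatedness
– Real-time update
• Topic Hierarchy Construction
– Via iteration
– Each iteration:
  • Add a new topic $t$ to the current hierarchy:
    [Equation: $t = \arg\max$ over candidate topics $t_s$ of the relation weights $w(r(t_k, t_s))$ and $w(r(t_s, t_k))$, taken over topics $t_k$ in the hierarchy]
  • Update the weights of nodes and edges on the hierarchy:
    [Equation: node weight $w_t(t_k)$, computed over $t_g \in T_G$ from $w(r(t_{root}, t_g))$ and $w(r(t_k, t_g))$]
    [Equation: edge weight $w_r(t_s, t_k)$, a maximum over paths $L$ ending with $t_s, t_k$ of terms $w_t(t_u)$ and $w(r(t_u, t_{u+1}))$ for $u = 0, \ldots, |L| - 1$]
  • Remove the potential cycles on the hierarchy:
    $H = \mathrm{Optimum\_Branching}(H')$

[Figure: adding the topic "Tax" to a hierarchy rooted at "Barack Obama" with nodes "Policy" and "Debate"; candidate edges labeled p = 0.013, p = 0.009, p = 0.006, p = 0.015; final panel titled "Resultant Hierarchy"]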
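The iterative construction can be illustrated with a deliberately simplified sketch. It keeps the "add one topic per iteration" loop, assuming the next topic is the one with the largest total relation weight (both directions) to the topics already placed. It then replaces the weight update and the optimum-branching step with a greedy choice of the single best parent, which cannot create cycles because each new node only attaches to existing ones. The function `build_hierarchy` and all weights below are hypothetical, so this is a stand-in, not the algorithm from the talk.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]


def build_hierarchy(root: str, topics: List[str], w: Weights) -> Dict[str, str]:
    """Return a child -> parent map built by greedy insertion."""
    parent: Dict[str, str] = {}
    placed = [root]
    remaining = [t for t in topics if t != root]
    while remaining:
        # Pick the unplaced topic most strongly tied to the current hierarchy.
        def attachment(t: str) -> float:
            return sum(w.get((k, t), 0.0) + w.get((t, k), 0.0) for k in placed)

        best = max(remaining, key=attachment)
        # Greedy stand-in for optimum branching: strongest directed edge wins.
        parent[best] = max(placed, key=lambda k: w.get((k, best), 0.0))
        placed.append(best)
        remaining.remove(best)
    return parent


# Made-up relation weights w[(parent, child)]; not the p values on the slide.
weights: Weights = {
    ("barack obama", "policy"): 0.9,
    ("barack obama", "debate"): 0.7,
    ("barack obama", "tax"): 0.2,
    ("policy", "tax"): 0.8,
    ("debate", "tax"): 0.3,
}
hierarchy = build_hierarchy("barack obama", ["policy", "tax", "debate"], weights)
```

With these toy weights "tax" ends up under "policy" rather than directly under the root, because its edge from "policy" is the strongest one available when it is inserted.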
Contribution
• Internet information organization by topic trees
• Proposed an algorithm for topic hierarchy construction that outperforms state-of-the-art algorithms
• Published in SIGIR 2013
Achievements (3)
• Vertical Search Platform
– Unified Knowledge Representation: Triples (compatible with Freebase)
– General Information Processing Pipeline
  • High reusability of most modules
  • Easily and rapidly portable to different vertical domains, given enough domain data
– Accommodating Heterogeneous Sources of Data
  • Knowledge Base
  • CQA (community question answering)
  • FAQs
  • Query logs (scripts from the mobile company)
  • Encyclopedia (Wikipedia, Baidu Baike)
  • Free texts, books
  • APIs
Vertical Search Platform
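[Figure: question answering pipeline from query to answer]
• Query: 七里香是谁唱的? (Who sings "七里香"?)
• Structured Query:
  ?x <perform> ?y
  ?y <name> 七里香
• Knowledge Base:

| Subject | Relation | Object |
| --- | --- | --- |
| …… | | |
| 七里香 | 歌手 (singer) | 周杰伦 (Jay Chou) |
| …… | | |

• Answer: 周杰伦 (Jay Chou)
• Components: Semantic Parser, Pattern Generation, Domain-specific Grammar, Dialog Management, Precondition Completion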
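The path from query to answer can be sketched end to end with a toy parser and a one-row knowledge base. This is only an illustration: a single regular expression stands in for the domain-specific grammar and pattern generation, and the lookup uses the relation 歌手 from the slide's triple table rather than the `<perform>`/`<name>` form of the structured query. `PATTERNS`, `parse`, and `answer` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# One-row knowledge base taken from the example table on the slide.
KB: List[Triple] = [("七里香", "歌手", "周杰伦")]

# Hypothetical domain-specific pattern: "<song>是谁唱的" asks for the singer.
PATTERNS = [(re.compile(r"^(?P<subject>.+?)是谁唱的[?？]?$"), "歌手")]


def parse(query: str) -> Optional[Tuple[str, str]]:
    """Map a natural-language query to a (subject, relation) lookup."""
    for pattern, relation in PATTERNS:
        match = pattern.match(query.strip())
        if match:
            return match.group("subject"), relation
    return None


def answer(query: str) -> Optional[str]:
    """Return the object of the first matching triple, or None if unanswerable."""
    structured = parse(query)
    if structured is None:
        return None
    subject, relation = structured
    for s, r, o in KB:
        if s == subject and r == relation:
            return o
    return None


result = answer("七里香是谁唱的?")
```

Porting such a pipeline to another vertical domain would mainly mean swapping the pattern list and the triples, which matches the reusability claim made for the platform.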
Applications
– Public health care (Database + CQA): 公众健康问答 (Public Health QA)
– Music search (Database): 音乐问答 (Music QA)
– Mobile services (Query log): 业务助手 (Service Assistant)
– Weather (CQA + APIs): 天气自动问答THU (Automatic Weather QA THU)
– Open domain: 清华小智 (Tsinghua Xiaozhi)
– Leaks & Exploits (Free text + Domain knowledge/rules)
– College Recruitment (FAQ)
微信公共账号 (WeChat public accounts)
Examples of the Application (1)
• Music QA
– Resource: Domain Database, CQA
Examples of the Application (2)
• Health QA
– Resource: Domain Database, CQA, Baidu Baike
Main Works
• Information distance metrics, Xian Zhang, Chong Long, Fan Bu, et al., SIGKDD 2007, ICDM 2009, MI2009, COLING 2010 (best paper), ……
• Question classification, Fan Bu, et al., EMNLP 2010
• Question expansion, Zhicheng Zheng, et al., NAACL 2010
• Concept relatedness evaluation, Fan Bu, et al., IJCAI 2011
• Passage retrieval based concept attribute extraction, Chao Han, et al., CICLing 2010
• Question and answer pair mining, Shilin Ding, Fan Bu, et al., ACL 2008, ACL 2012 (best student paper)
• Text summarization, Minlie Huang, et al., ACL 2010, AAAI 2012, CIKM 2008, ICDM 2009
• Opinion mining, Fangtao Li, et al., AAAI 2010, IJCAI 2011, COLING 2010
• Information recommendation, Lijing Qin, Yang Tang, et al., IJCAI 2013, SIGIR 2011 workshop on "entertain me"
• Information extraction, Xingwei Zhu, et al., SIGIR 2013
Main Publications
• Natural language processing
– ACL 2008, 09, 10, 11, 12, ACL 2012 Best Student Paper Award, COLING 2010 Best Paper Award, EMNLP 2010, NAACL 2010, ……
• Artificial Intelligence
– AAAI 2010, 2012, IJCAI 2011, ……
• Data Mining
– SIGKDD 2007, ICDM 2008, 09, 10, PAKDD 2007, WI 2009, CIKM 2006, 08, 12, ……
Acknowledgement
• Prof. Ming Li and Minlie Huang
• Dr. Yu Hao
• Mr. Xian Zhang, Chong Long, Fan Bu, Hao Xiong, Chao Han, Zhicheng Zheng, Xingwei Zhu, Tanche Li, Yicheng Liu, Yipeng Jiang, and Yang Tang ……
Thanks