relation detection and recognition

Upload: quinn-valenzuela

Post on 01-Jan-2016

Page 1: Relation Detection And Recognition

Relation Detection And Recognition

*** *** ***

Page 2: Relation Detection And Recognition

Schema

• General Description
• Named Entity Recognition
• RDR Training Corpus Generation
• Relation Detection and Recognition
• Performance Analysis

Page 3: Relation Detection And Recognition

General Description-Algorithm

EDR (entity detection and recognition): CRF, character based

RDR (relation detection and recognition): SVM; POS tags are needed

Page 4: Relation Detection And Recognition

General Description-Workflow

Page 6: Relation Detection And Recognition

Named Entity Recognition-Algorithm

• CRF++
• Character based
• The most naive setup

Sample training data (one character and its tag per line):

北 nb
京 ne
1 non
月 non
1 non
日 non
讯 non
中 nb
华 nm
全 nm
国 nm
总 nm
工 nm
会 ne
今 non
日 non
发 non
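The character tagging in the sample above can be sketched as follows. This is a minimal illustration, not the authors' tool: the reading of the tags (`nb` = first character of an entity, `nm` = middle, `ne` = last, `non` = outside any entity) is inferred from the sample, and the function name and the span format are hypothetical.

```python
def tag_characters(sentence, entity_spans):
    """Tag each character with nb/nm/ne/non.

    entity_spans holds (start, end) character offsets, end exclusive.
    Assumption: a one-character entity gets only "nb"; the slides do not
    show that case.
    """
    tags = ["non"] * len(sentence)
    for start, end in entity_spans:
        tags[start] = "nb"
        for i in range(start + 1, end - 1):
            tags[i] = "nm"
        if end - start > 1:
            tags[end - 1] = "ne"
    return list(zip(sentence, tags))


# The slide's sample: 北京 covers offsets 0-2, 中华全国总工会 covers 7-14.
rows = tag_characters("北京1月1日讯中华全国总工会今日发", [(0, 2), (7, 14)])

# One "character tag" pair per line, the column layout CRF++ reads.
crf_lines = [f"{ch} {tag}" for ch, tag in rows]
```

Joining `crf_lines` with newlines, and separating sentences with an empty line, gives a file in the layout CRF++ expects for training data.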

Page 7: Relation Detection And Recognition

Named Entity Recognition-Accuracy

• nr: precision 100% (right: 88, error: 0)
• nt: precision 100% (right: 36, error: 0)
• ns: precision 100% (right: 56, error: 0)
• 180/181

• 海湾战争 (Gulf War) nz 9 22
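The figures above can be checked with a few lines of arithmetic. This sketch assumes precision is computed as right / (right + error); the slide does not state the formula, and the dictionary layout is only for illustration.

```python
# Per-type counts copied from the slide: (right, error).
counts = {"nr": (88, 0), "nt": (36, 0), "ns": (56, 0)}


def precision(right, error):
    """Share of predicted entities that are correct (assumed definition)."""
    return right / (right + error)


per_type = {tag: precision(r, e) for tag, (r, e) in counts.items()}
total_right = sum(r for r, _ in counts.values())
```

The three "right" counts sum to 180, which matches the numerator of the 180/181 figure on the slide.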

Page 9: Relation Detection And Recognition

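RDR Training Corpus Generation

• The vector the SVM needs:

e1.type, e2.type, order, dist,
w-2, w-1, w0, w1, w2, t-2, t-1, t0, t1, t2,
w-2, w-1, w0, w1, w2, t-2, t-1, t0, t1, t2,
relation

(The w/t context window is listed twice, presumably once for each entity.)

Example: 国家环保局局长解振华庄重宣布 ("Xie Zhenhua, director of the State Environmental Protection Administration, solemnly announced")

国家环保局,2,解振华,1,3,11,NULL,NULL,国家环保局,局,长,局,长,解振华,庄,重,null,null,null,NN,NR,NN,NN,null,VA,DEC,E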

Page 10: Relation Detection And Recognition

RDR Training Corpus Generation

1. POS-tag the sentence with an NLP tool:

国家/NN 环保局/NN 局长/NN 解振华/NR 庄重/VA 宣布:/DEC

2. Compare with the recognized entities:

国家环保局/nt, 解振华/nr

3. Find the POS tags before and after each entity:

null,null,null,NN,NR

NN,NN,null,VA,DEC
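Step 3 can be sketched as a small window lookup over the POS tags. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, entities are given as token index ranges, and the entity's own slot (t0) is always filled with `null`, which is what both example rows on the slide show.

```python
def pos_context(tags, start, end, width=2, pad="null"):
    """POS tags around the entity that covers tokens[start:end].

    Returns [t-2, t-1, t0, t1, t2] for width=2. Positions outside the
    sentence, and the entity slot t0 itself, are filled with `pad`.
    """
    before = [tags[i] if i >= 0 else pad for i in range(start - width, start)]
    after = [tags[i] if i < len(tags) else pad for i in range(end, end + width)]
    return before + [pad] + after


# POS-tagged sentence from step 1.
tagged = [("国家", "NN"), ("环保局", "NN"), ("局长", "NN"),
          ("解振华", "NR"), ("庄重", "VA"), ("宣布:", "DEC")]
tags = [tag for _, tag in tagged]

e1_context = pos_context(tags, 0, 2)  # 国家环保局 spans tokens 0 and 1
e2_context = pos_context(tags, 3, 4)  # 解振华 is token 3
```

With these inputs the two results reproduce the rows shown in step 3.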

Page 11: Relation Detection And Recognition

RDR Training Corpus Generation

• 4. Label the training corpus by hand

国家环保局,2,解振华,1,3,11,NULL,NULL,国家环保局,局,长,局,长,解振华,庄,重,null,null,null,NN,NR,NN,NN,null,VA,DEC,E

Page 12: Relation Detection And Recognition

RDR Training Corpus Generation

• Use an assistant program:

Tagged corpus:

602 sentences, 3000+ relations

Page 14: Relation Detection And Recognition

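Overview

• Treat relation recognition as a multi-class classification problem

Input: a set of entity-pair vectors X(x1, x2, ..., xn), where xi (f1, f2, ..., fn) represents an entity pair (E1, E2)

Output: the type yi that xi belongs to

• Build the classifier with an SVM: choose a suitable feature set to describe each entity pair, map it into a high-dimensional real-valued space, and classify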

Page 15: Relation Detection And Recognition

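SVM

• Support Vector Machine (SVM)

The main idea: for a two-class classification problem, find a hyperplane in a high-dimensional space that separates the two classes so that the classification error rate is as low as possible. Through learning, the method automatically finds the support vectors that discriminate well between the classes, and the resulting classifier maximizes the margin between the classes.

• Tool: LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/)

It ships executables for building multi-class classifiers, and for training and prediction.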
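For reference, the maximum-margin idea described above is usually written as the hard-margin problem below. This is the standard textbook form, not something taken from the slides; it assumes training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$ and a separating hyperplane $w^{\top}x + b = 0$.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

The margin between the two classes is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes it; the support vectors are the $x_i$ for which the constraint holds with equality.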

Page 16: Relation Detection And Recognition
Page 17: Relation Detection And Recognition

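Entity-Pair Filtering Module

• Relation definitions
C: chief (nr-nt)
E: employee (nr-nt)
L: located in (nt-ns)
N: no relation

• Entity-pair filtering: entity pairs other than (nr-nt) and (nt-ns) are filtered out; the pairs that remain are the candidates for labeling (train) or classification (test)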
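The filtering rule above can be sketched as follows. The names are hypothetical, and treating a pair as unordered is an assumption; it is suggested by the earlier example, where the nt entity 国家环保局 precedes the nr entity 解振华 and the pair is still labeled E.

```python
from itertools import combinations

# Type pairs that can carry one of the defined relations (C, E, L).
ALLOWED_TYPE_PAIRS = {frozenset(("nr", "nt")), frozenset(("nt", "ns"))}


def candidate_pairs(entities):
    """Keep only entity pairs whose types form (nr-nt) or (nt-ns).

    `entities` is a list of (text, type) tuples in sentence order.
    """
    return [
        (a, b)
        for a, b in combinations(entities, 2)
        if frozenset((a[1], b[1])) in ALLOWED_TYPE_PAIRS
    ]


entities = [("国家环保局", "nt"), ("解振华", "nr"), ("北京", "ns")]
pairs = candidate_pairs(entities)
```

Here the (nr, ns) pair is dropped, and the two remaining pairs go on to labeling or classification.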

Page 18: Relation Detection And Recognition

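Feature Selection and Vectorization Module

• The following features are selected to build the feature set:

e1.type, e2.type, contain, order, dist,
w-2, w-1, w1, w2, t-2, t-1, t1, t2,
w-2, w-1, w1, w2, t-2, t-1, t1, t2,
relation

• Adjusted during actual model training
• Mapped to vector form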
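One simple way to map such symbolic features to numbers is to give every distinct value of a field its own integer id. The slides do not say how the mapping was done, so the class below is only a sketch under that assumption, with hypothetical names.

```python
from collections import defaultdict


class FeatureIndexer:
    """Assigns integer ids (1, 2, 3, ...) to the values seen in each field."""

    def __init__(self):
        self._ids = defaultdict(dict)  # field name -> {value: id}

    def encode(self, field, value):
        table = self._ids[field]
        # A new value gets the next free id; a known value keeps its id.
        return table.setdefault(value, len(table) + 1)

    def vectorize(self, record):
        """Turn a {field: value} record into a list of ids, in field order."""
        return [self.encode(field, value) for field, value in record.items()]


indexer = FeatureIndexer()
v1 = indexer.vectorize({"e1.type": "nt", "e2.type": "nr", "t1": "NN"})
v2 = indexer.vectorize({"e1.type": "nr", "e2.type": "nr", "t1": "VA"})
```

Integer ids impose an arbitrary order on categories; a one-hot encoding per value is a common alternative when that matters.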

Page 19: Relation Detection And Recognition

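Vectorization and Scale Modules

Vector form: Label index1:value1 ...

1 1:2 2:3 3:4 ...

Scale (libsvm: svm-scale.exe): rescales the dataset to [-1, 1]

This eases computation and keeps the training and test sets on the same scale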
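The vector format and the scaling step can be sketched in plain Python. This is an illustration of the idea, not a reimplementation of svm-scale: the function names are hypothetical, and mapping a constant column to 0.0 is a choice made only for this sketch.

```python
def to_libsvm_line(label, values):
    """Format one sample as 'label 1:v1 2:v2 ...' (indices start at 1)."""
    return f"{label} " + " ".join(f"{i}:{v}" for i, v in enumerate(values, start=1))


def fit_ranges(rows):
    """Per-feature (min, max), computed on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature to [lower, upper] using the fitted ranges."""
    scaled = []
    for value, (lo, hi) in zip(row, ranges):
        if hi == lo:
            scaled.append(0.0)  # constant feature: sketch-only convention
        else:
            scaled.append(lower + (upper - lower) * (value - lo) / (hi - lo))
    return scaled


train = [[0, 10], [5, 20], [10, 30]]
ranges = fit_ranges(train)
scaled_train = [scale_row(r, ranges) for r in train]
scaled_test = scale_row([5, 30], ranges)  # reuses the training ranges
line = to_libsvm_line(1, [2, 3, 4])
```

Reusing the training ranges on the test rows is what keeps the two sets consistent; a test value outside the training range will land outside [-1, 1].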

Page 20: Relation Detection And Recognition

Training Module

• Manually label each candidate with its relation
• Libsvm: svm-train.exe
• Choose the feature set and the parameters (cross-validation)
• Build the model
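The cross-validation step can be sketched generically. The code below is a minimal k-fold loop with hypothetical names; `train_fn` and `predict_fn` stand in for whatever trainer is used (for example a wrapper around svm-train) and are assumptions, not part of the slides.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, test_idx


def cross_validate(samples, labels, k, train_fn, predict_fn):
    """Average accuracy over k folds; each sample is tested exactly once."""
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(samples), k):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
    return correct / len(samples)
```

Running `cross_validate` once per candidate feature set or parameter setting, and keeping the best score, is one way to make the selection named in the bullets.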

Page 21: Relation Detection And Recognition

Testing

• Sample test data
• P = 76%

Page 23: Relation Detection And Recognition

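Analysis and Improvements

• Errors introduced by the upstream steps
• The training corpus is not large enough
• Errors introduced by the manually labeled corpus
• Feature set selection (extract semantic features)
• Training parameter selection (grid search)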
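The grid search named in the last bullet can be sketched as below. Several things here are assumptions: that the parameters being tuned are C and gamma (which presumes an RBF kernel; the slides do not name the kernel), the exponentially spaced grids, and the `score` callback, which would normally return a cross-validation accuracy.

```python
from itertools import product


def grid_search(score, c_values, gamma_values):
    """Return (best_score, best_c, best_gamma) over the full grid."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        s = score(c, gamma)
        if best is None or s > best[0]:
            best = (s, c, gamma)
    return best


# Exponentially spaced candidate values (an assumed, commonly used layout).
c_grid = [2.0 ** e for e in range(-5, 16, 2)]
gamma_grid = [2.0 ** e for e in range(-15, 4, 2)]


def toy_score(c, gamma):
    """Stand-in for a cross-validation score; it peaks at C=8, gamma=0.125."""
    return -abs(c - 8.0) - abs(gamma - 0.125)


best_score, best_c, best_gamma = grid_search(toy_score, c_grid, gamma_grid)
```

In practice `toy_score` would be replaced by a function that trains with the given parameters and returns the cross-validated accuracy.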