Posted on 02-Jul-2015
ROSeAnn
Aggregating Semantic Annotators
Luying Chen, Stefano Ortona, Giorgio Orsi, and Michael Benedikt
name.surname@cs.ox.ac.uk
Department of Computer Science, University of Oxford
DIADEM: domain-centric, intelligent, automated data extraction methodology
Plenty of data on the web
But the web is also text: news feeds, posts, tweets, PDFs

[Example PDF excerpt: a page from a financial consultation paper on the decline of ABCPs and on credit derivatives markets, including Table 3, "European ABCP issuance" (quarterly volumes 2004–2008; source: Moody's, Dealogic, ESF) and Chart 5 (source: Société Générale Corporate & Investment Banking, market overview, 19 September 2008)]
Entity recognition ecosystem
Understanding their behaviour
Collect all original entity types:
- Company, Country, Movie, …
- Person, Organization, Location, …
- City, Company, StateOrCounty, …
Organise them into a taxonomy:
- Thing ⊒ Person, Organization, Location, Movie
- Organization ⊒ Company
- Location ⊒ Country, StateOrCounty, City
Add disjointness constraints:
- Organization disjointWith Person
- Organization disjointWith Location
- Movie disjointWith Person
- Person disjointWith Location
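The merged taxonomy and disjointness constraints above can be sketched in code. This is a minimal illustration, not ROSeAnn's actual data structures; the `SUBCLASS`/`DISJOINT` maps and helper names are hypothetical.

```python
# Toy encoding of the slides' taxonomy: child -> direct superclass.
SUBCLASS = {
    "Company": "Organization",
    "Country": "Location",
    "StateOrCounty": "Location",
    "City": "Location",
    "Person": "Thing",
    "Organization": "Thing",
    "Location": "Thing",
    "Movie": "Thing",
}

# The disjointness constraints from the slide.
DISJOINT = {
    frozenset({"Organization", "Person"}),
    frozenset({"Organization", "Location"}),
    frozenset({"Movie", "Person"}),
    frozenset({"Person", "Location"}),
}

def ancestors(c):
    """Return c together with all of its superclasses up to Thing."""
    out = [c]
    while c in SUBCLASS:
        c = SUBCLASS[c]
        out.append(c)
    return out

def conflict(c1, c2):
    """True if any superclass pair of c1/c2 is declared disjoint,
    i.e. disjointness is inherited by subclasses."""
    return any(frozenset({a, b}) in DISJOINT
               for a in ancestors(c1) for b in ancestors(c2))
```

For example, `conflict("Company", "City")` holds because Company ⊑ Organization, City ⊑ Location, and Organization is disjoint with Location.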
Entity extractors: observations
Observation 1: low accuracy
[Bar charts (0–1 scale): precision, recall, and F-score per entity type (Person, Date, Movie; Location, Sport, Movie)]
(*) Results obtained on Reuters http://about.reuters.com/researchandstandards/corpus/
Entity extractors: observations
Observation 2: vocabulary is limited and overlapping
[Venn diagram of annotator vocabularies (Saplo, Extractiv, AlchemyAPI, Lupedia, Zemanta): shared types such as Person and Country; annotator-specific types such as Region, Scientist, Planet, Museum, Brand, Product, Ocean, Company]
Analysis of conflicts
Conflicts are frequent → reconciliation is needed
Observation 3: annotators disagree on concepts and spans
ROSeAnn: Reconciling Opinions of Semantic Annotators
Goals:
- compute logically consistent annotations
- maximize the agreement among annotators
Supervised: MEMM
Train a MEMM sequence labeller.
Features (token-based):
- entity type
- subclass / disjointness
- span (B/I/O encoding)
Inference: most likely and logically consistent labelling for the sequence (Viterbi + pruning)
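The Viterbi-with-pruning inference can be sketched as follows. This is a hedged illustration, not the trained model: `LABELS` and `toy_score` are hypothetical stand-ins, and the pruning shown only enforces B/I/O consistency (an I-X tag must follow B-X or I-X), standing in for the full subclass/disjointness checks.

```python
LABELS = ["O", "B-Person", "I-Person", "B-Org", "I-Org"]

def consistent(prev, cur):
    """Prune transitions that violate the B/I/O encoding."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], cur)
    return True

def viterbi(score, n):
    """score(i, prev, cur) -> log-score for tagging token i with cur
    after prev. Returns the best consistent tag sequence for n tokens."""
    # best[label] = (log-score, tag sequence) for prefixes ending in label
    best = {lab: (score(0, "O", lab), [lab])
            for lab in LABELS if consistent("O", lab)}
    for i in range(1, n):
        nxt = {}
        for lab in LABELS:
            cands = [(s + score(i, prev, lab), seq + [lab])
                     for prev, (s, seq) in best.items()
                     if consistent(prev, lab)]  # pruning step
            if cands:
                nxt[lab] = max(cands)
        best = nxt
    return max(best.values())[1]

# Toy scorer: prefers "B-Person I-Person" for a two-token name.
def toy_score(i, prev, cur):
    gold = ["B-Person", "I-Person"]
    return 0.0 if cur == gold[i] else -1.0
```

With `toy_score`, `viterbi(toy_score, 2)` decodes the two-token span as `["B-Person", "I-Person"]`; an inconsistent sequence such as starting with `I-Person` is never even considered.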
Unsupervised: Weighted Repair
Judgement aggregation:
- experts give opinions about a set of (logical) statements
- compute a logically consistent, aggregated judgement
Database repairs / consistent query answering:
- database instance + constraints (schema, dependencies)
- answers computed on (minimal) repairs
Unsupervised: Weighted Repair
Propositions:
- ontological constraints Σ
- annotations (as facts)
Base support: annotator A annotates a span S with a concept C from Σ.

AtomicScore(C) =
  +1 for every Aᵢ annotating S with some C′ ⊑ C
  −1 for every Aᵢ annotating S with some C′ such that C′ ⊓ C ⊑ ⊥,
     or failing to annotate S with any C′ (in Aᵢ's vocabulary) with C ⊑ C′
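The base-support computation can be sketched directly from this definition. This is a minimal sketch, not ROSeAnn's implementation: the `SUB` map, the `subclass_of`/`disjoint_with` predicates, and the annotator/vocabulary names are toy stand-ins for the real ontology reasoning, here instantiated with the Σ from the example slide that follows.

```python
def atomic_score(C, span_annotations, vocabularies,
                 subclass_of, disjoint_with):
    """span_annotations: {annotator: concept or None (no annotation)}."""
    score = 0
    for ann, Cp in span_annotations.items():
        if Cp is not None and subclass_of(Cp, C):
            score += 1          # annotated with C' ⊑ C
        elif Cp is not None and disjoint_with(Cp, C):
            score -= 1          # annotated with C' where C' ⊓ C ⊑ ⊥
        elif Cp is None and any(subclass_of(C, V)
                                for V in vocabularies.get(ann, ())):
            score -= 1          # stayed silent despite knowing some C ⊑ C'
    return score

# Toy ontology: each concept mapped to its superclasses (incl. itself).
SUB = {
    "Chef": {"Chef", "Person", "LegalEntity"},
    "Person": {"Person", "LegalEntity"},
    "Organisation": {"Organisation", "LegalEntity"},
    "LegalEntity": {"LegalEntity"},
}

def subclass_of(a, b):
    return b in SUB[a]

def disjoint_with(a, b):
    # Person ⊓ Organisation ⊑ ⊥ propagates to their subclasses.
    return ("Person" in SUB[a] and "Organisation" in SUB[b]) or \
           ("Organisation" in SUB[a] and "Person" in SUB[b])

# The "Jamie Oliver" span from the example slide.
ANNOTATIONS = {"A1": "Person", "A2": "Organisation", "A3": "Chef"}
VOCAB = {a: set(SUB) for a in ANNOTATIONS}  # everyone knows every type
```

On this input the sketch reproduces the slide's scores: Person → +1, Organisation → −1, Chef → 0, LegalEntity → +3.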
Unsupervised: Weighted Repair
Initial solution: conjunction of all types: φ: C1 ∧ C2 ∧ … ∧ Cn
Repair operations (op):
- ins(Ci): insertion of a Ci not already in φ
- del(Ci): deletion of a Ci from φ (and all its subclasses), plus ins(¬Ci)
Solution (S):
- non-conflicting: operations in S do not "override" each other
- non-redundant: no insertion/deletion of non-implied types
- consistent with Σ
- maximally agreed: max( Σ_{ins(C)∈S} AtomicScore(C) − Σ_{del(C)∈S} AtomicScore(C) )
Weighted Repair: Example

Σ:
- Person ⊓ Organisation ⊑ ⊥
- Chef ⊑ Person
- Person ⊑ LegalEntity
- Organisation ⊑ LegalEntity

Text: "text text text, Jamie Oliver and some text here"
Annotations on the span "Jamie Oliver": A1 → Person, A2 → Organisation, A3 → Chef

φ: Person ∧ Organisation ∧ Chef
AtomicScore(Person) = +2 {A1, A3} − 1 {A2} = +1
AtomicScore(Organisation) = +1 {A2} − 2 {A1, A3} = −1
AtomicScore(Chef) = +1 {A3} − 1 {A2} = 0
AtomicScore(LegalEntity) = +3 {A1, A2, A3} = +3
Weighted Repair: Example
φ: Person ∧ Organisation ∧ Chef

S1 = { ins(LegalEntity), del(Person), del(Organisation), del(Chef) }
φ1: LegalEntity ∧ ¬Person ∧ ¬Organisation ∧ ¬Chef
Agr(S1) = +3 − 1 + 1 − 0 = +3

S2 = { ins(LegalEntity), del(Organisation) }
φ2: LegalEntity ∧ Person ∧ ¬Organisation ∧ Chef
Agr(S2) = +3 + 1 + 1 + 0 = +5
Weighted Repair: Breaking ties
φ: Person ∧ Organisation ∧ Chef

S2 = { ins(LegalEntity), del(Organisation) }
φ2: LegalEntity ∧ Person ∧ ¬Organisation ∧ Chef

S3 = { ins(LegalEntity), del(Organisation), del(Chef) }
φ3: LegalEntity ∧ Person ∧ ¬Organisation ∧ ¬Chef

Same agreement: prefer the solution with fewer operations (S2).
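The choice among candidate repairs can be sketched as follows, using the atomic scores from the "Jamie Oliver" example. This is a hedged illustration, not ROSeAnn's solver: it only ranks the three hand-written candidates from the slides (rather than enumerating all consistent repairs), scoring kept and inserted types positively, subtracting deleted ones, and breaking agreement ties by preferring fewer operations.

```python
# Atomic scores from the example slide.
SCORE = {"Person": 1, "Organisation": -1, "Chef": 0, "LegalEntity": 3}
INITIAL = {"Person", "Organisation", "Chef"}  # phi: conjunction of all types

def agreement(ins, dels):
    """Sum scores of inserted/kept types, subtract scores of deleted ones."""
    kept = (INITIAL - dels) | ins
    return sum(SCORE[c] for c in kept) - sum(SCORE[c] for c in dels)

# Candidate solutions from the slides, as (insert-set, delete-set) pairs.
S1 = ({"LegalEntity"}, {"Person", "Organisation", "Chef"})
S2 = ({"LegalEntity"}, {"Organisation"})
S3 = ({"LegalEntity"}, {"Organisation", "Chef"})

def rank(sol):
    """Higher agreement wins; on ties, fewer repair operations win."""
    ins, dels = sol
    return (agreement(ins, dels), -(len(ins) + len(dels)))

best = max([S1, S2, S3], key=rank)
```

This reproduces the slides: Agr(S1) = +3, Agr(S2) = Agr(S3) = +5, and the tie between S2 and S3 is broken in favour of S2, which uses two operations instead of three.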
Entity extractors: evaluation
Corpora:
- MUC7 NER task [300 docs, 7 types, ~18k entities]
- Reuters (sample) [250 docs, 215 types, ~50k entities]
- FOX (Leipzig) [100 docs, 3 types, 395 entities]
- Web [20 docs, 5 types, 624 entities]
Evaluation:
PrecisionΩ = |InstAN(C+) ∩ InstGS(C+)| / |InstAN(C+)|
RecallΩ = |InstAN(C+) ∩ InstGS(C+)| / |InstGS(C+)|
Micro- and macro-averages; 10-fold cross-validation.
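The instance-overlap metrics above can be sketched as set operations. This is a minimal illustration under the assumption that annotator (AN) and gold-standard (GS) instances are represented as (span, type) pairs; the example data is hypothetical.

```python
def prf(an, gs):
    """Precision, recall, F1 over instance sets an (annotator) and
    gs (gold standard), each a set of (span, type) pairs; this computes
    the per-type figures that micro/macro averages are built from."""
    tp = len(an & gs)                       # |Inst_AN ∩ Inst_GS|
    p = tp / len(an) if an else 0.0         # / |Inst_AN|
    r = tp / len(gs) if gs else 0.0         # / |Inst_GS|
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical example: one correct instance, one type mismatch.
AN_EXAMPLE = {("Jamie Oliver", "Person"), ("Oxford", "Organization")}
GS_EXAMPLE = {("Jamie Oliver", "Person"), ("Oxford", "Location")}
```

Here `prf(AN_EXAMPLE, GS_EXAMPLE)` gives precision, recall, and F1 of 0.5 each, since only the ("Jamie Oliver", "Person") instance matches exactly.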
Evaluation: Individual vs Aggregated
(*) Full comparison results at http://diadem.cs.ox.ac.uk/roseann
Evaluation: Aggregators
Evaluation: Performance
WR:∝ number of annotations insisting on a span
MEMM: ∝ number of concepts in the ontology
WR MEMM
Performance: MEMM Training
MEMM: ∝ number of entity types in the ontology
ROSeAnn @work
[Demo screenshots: ROSeAnn annotating plain text and Web pages]
Summary
Not discussed:
- resolution of conflicting spans
- relationships with consistent QA / argumentation frameworks
- WR with weights / bootstrapping
- Web and PDF structural NERs (SNER)
- MEMM vs CRF
Future:
- automatic maintenance of the ontology
- probabilistic and ontological querying of annotations
- relation, attribute, sentiment extraction
- entity disambiguation and linking

Get ROSeAnn at: http://diadem.cs.ox.ac.uk/roseann
Try out our REST endpoints:
http://163.1.88.61:9091/roseann/text
http://163.1.88.61:9091/roseann/web