Entity Queries Seminar by Pankaj Vanwari Under guidance of Dr. S. Sudarshan

Upload: orsin

Post on 23-Feb-2016



TRANSCRIPT

Page 1: Entity Queries

Entity Queries

Seminar by Pankaj Vanwari Under guidance of Dr. S. Sudarshan

Page 2: Entity Queries

Overview of Presentation

Introduction to Entity Queries

Keyword search on structured data

Querying over unstructured data

Entity queries using ontology based extraction

Entity-relationship queries

Conclusion and future work

Entity Queries by Pankaj Vanwari under guidance of Dr. S. Sudarshan

Page 3: Entity Queries

Introduction

Query on database using keyword search

Restricted to retrieving pages/documents

Entity search on World Wide Web

Annotations and semantic links to text

Wikipedia, WordNet, etc. as sources

Entity near queries, indexing and ranking

Entity-relationship search to find relationships between the entities


Page 4: Entity Queries

Keyword search over graph structured data

Simple searching and browsing of data.

The user types a few keywords and then follows hyperlinks interactively.

The database is modeled as a graph.

Uses proximity-based ranking, based on foreign key and other similar links.

Useful for searching an enterprise database for information without a query language.


Page 5: Entity Queries

BANKS (Browsing ANd Keyword Searching)

RDB tuples constitute nodes of the graph.

Each foreign key-primary key link is a directed edge (to avoid “hubs”).

A link with higher importance is given lower weight.

The query result is a rooted directed tree.

Backward edge (v, u) with weight based on the number of links to v from the nodes of the same type as u.


Page 6: Entity Queries

Formal database model of BANKS

s(R(u), R(v)) denotes the similarity between the relations R(u) and R(v) of nodes u and v.

If edge (u, v) exists but (v, u) does not, then the weight is w(u, v) = s(R(u), R(v)).

If (u, v) does not exist and (v, u) does, then w(u, v) = IN_v(u) * s(R(v), R(u)), where IN_v(u) is the number of links into u from tuples of relation R(v).

If both exist, then the weight is the minimum of the two values above.

Overall relevance score is obtained from the normalized edge and node scores.
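The three weight rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `banks_edge_weights`, the representation of the graph as a set of (u, v) pairs, and the constant similarity used in the example are all hypothetical and not taken from the BANKS implementation.

```python
from collections import defaultdict

def banks_edge_weights(forward_edges, relation_of, similarity):
    """Apply the three weight rules to every linked pair of nodes.

    forward_edges: set of (u, v) pairs, one per foreign key -> primary key link.
    relation_of:   dict mapping each node (tuple) to its relation name.
    similarity:    function s(rel_a, rel_b); hypothetical, supplied by the caller.
    """
    # in_from[(u, rel)] = number of forward edges into u from nodes of relation rel
    in_from = defaultdict(int)
    for a, b in forward_edges:
        in_from[(b, relation_of[a])] += 1

    weights = {}
    for a, b in forward_edges:
        # Every link yields a weight in both directions: (a, b) and (b, a).
        for u, v in ((a, b), (b, a)):
            candidates = []
            if (u, v) in forward_edges:
                # Rule 1: the edge itself exists.
                candidates.append(similarity(relation_of[u], relation_of[v]))
            if (v, u) in forward_edges:
                # Rule 2: only the reverse edge exists, so scale by IN_v(u).
                in_v_u = in_from[(u, relation_of[v])]
                candidates.append(in_v_u * similarity(relation_of[v], relation_of[u]))
            # Rule 3: if both rules apply, take the minimum.
            weights[(u, v)] = min(candidates)
    return weights
```

For example, if three "writes" tuples w1, w2, w3 all reference one "author" tuple a1 and the similarity is a constant 1.0, each forward edge (w_i, a1) gets weight 1.0, while each backward edge (a1, w_i) gets weight 3.0, so paths that pass through the hub a1 become more expensive.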


Page 7: Entity Queries

Querying over unstructured data

The World Wide Web supports keyword search but not entity search.

Entities as first class citizens as opposed to pages.

No schema information on web documents to browse as in BANKS.

Statistics from large corpus with scoring and ranking from IR can be useful.

Challenges: Indexing and Annotations.


Page 8: Entity Queries

CSAW

Scaling entity search to the World Wide Web.

Major components: Catalog, Corpus and Query Processor.

Data model of CSAW.

Indexes used in the CSAW system: the stem and full atype indexes, the reachability index and the forward index.

Scoring in CSAW: selector energy, gap and decay, and aggregation.
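The scoring terms named above (selector energy, gap, decay, aggregation) can be illustrated with a small proximity scorer. This is a rough sketch and not the CSAW scoring function: the name `proximity_score`, the exponential decay shape, the `decay_rate` value and the default `sum` aggregation are all assumptions chosen for illustration.

```python
import math

def proximity_score(candidate_pos, selector_hits, decay_rate=0.5, aggregate=sum):
    """Score a candidate entity position from nearby matched query tokens.

    candidate_pos: token position of the candidate entity mention.
    selector_hits: list of (position, energy) pairs, one per matched
                   query token (selector); energy is assumed given.
    decay_rate:    hypothetical parameter of the assumed exponential decay.
    aggregate:     how contributions are combined, e.g. sum or max.
    """
    contributions = []
    for pos, energy in selector_hits:
        gap = abs(pos - candidate_pos)  # token distance to the candidate
        contributions.append(energy * math.exp(-decay_rate * gap))
    return aggregate(contributions) if contributions else 0.0
```

Under these assumptions a candidate right next to a selector receives almost all of that selector's energy, and swapping `aggregate=sum` for `aggregate=max` changes whether several nearby selectors reinforce each other.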


Page 9: Entity Queries

Entity Search with Dual-Inversion Index

Dual-inversion index: a document inverted index and an entity inverted index.

Document inverted index: maps an entity type E to the documents in which entities of type E occur.

Entity inverted index: takes keywords as input and returns entity instances as output.

Comparison of document and entity inverted indices.
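To make the difference between the two indices concrete, here is a minimal sketch that builds both from a tiny annotated corpus. It assumes a hypothetical token format (`("kw", word)` or `("ent", entity_type, instance)`) and a hypothetical proximity `window`; the real dual-inversion index design is more involved.

```python
from collections import defaultdict

def build_dual_index(docs, window=5):
    """Build simplified document-inverted and entity-inverted indices.

    docs:   dict doc_id -> list of tokens; each token is ("kw", word)
            or ("ent", entity_type, instance). Hypothetical format.
    window: assumed maximum token distance between keyword and entity.
    """
    doc_index = defaultdict(set)     # entity type -> ids of documents containing it
    entity_index = defaultdict(set)  # keyword -> (entity type, instance) seen nearby
    for doc_id, tokens in docs.items():
        entities = [(i, t[1], t[2]) for i, t in enumerate(tokens) if t[0] == "ent"]
        for _, etype, _ in entities:
            doc_index[etype].add(doc_id)
        for i, tok in enumerate(tokens):
            if tok[0] != "kw":
                continue
            for pos, etype, inst in entities:
                if abs(pos - i) <= window:
                    entity_index[tok[1]].add((etype, inst))
    return doc_index, entity_index
```

With the document-inverted index a query still has to scan the matching documents to find entity instances, whereas the entity-inverted index returns instances directly for a keyword, at the cost of a larger index.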


Page 10: Entity Queries

EntityRank (Searching directly and holistically)

Integrates both local and global information in ranking.

Example query: ow(amazon customer service #phone)

Entity search needs to be contextual, holistic, uncertainty-aware, associative, and discriminative.

Three layer model: Access (Global), Recognition (Local) and Validation (Hypothesis Testing).


Page 11: Entity Queries

Entity queries using ontology based extraction

Knowledge representation models such as RDFS, with a general-purpose ontology on top of these representations.

Two ways of extracting knowledge structures automatically from text corpora: NLP/machine learning or human annotations.

YAGO, YAGO2 and ESTER are all based on the second approach, with differences.

Page 12: Entity Queries

YAGO (Yet Another Great Ontology)

YAGO combines Wikipedia categories with the WordNet ontology.

Extracts facts based on fixed relations.

A fact is a triple with a fact identifier: $y : I \to (I \cup C \cup R) \times R \times (I \cup C \cup R)$, where I is the set of fact identifiers, C the set of common entities and R the set of relation names.

Compatible with RDF.

Relations: Type, SubClassOf, Means, …

Other relations: BornInYear, PoliticianOf, …

Meta relations: Describes, Context, …
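The fact model on this slide, in which every triple has an identifier that can itself appear in another triple, can be sketched as a small in-memory store. The class names, the `#n` identifier format and the `ExtractedFrom` relation in the usage note are hypothetical and used only for illustration; they are not YAGO's storage format.

```python
from typing import Dict, List, NamedTuple, Tuple

class Fact(NamedTuple):
    subject: str   # an entity, a relation name, or a fact identifier
    relation: str
    obj: str       # an entity, a relation name, or a fact identifier

class Ontology:
    """Maps fact identifiers to (subject, relation, object) triples."""

    def __init__(self) -> None:
        self.facts: Dict[str, Fact] = {}

    def add(self, subject: str, relation: str, obj: str) -> str:
        # Hypothetical identifier scheme: "#1", "#2", ...
        fact_id = f"#{len(self.facts) + 1}"
        self.facts[fact_id] = Fact(subject, relation, obj)
        return fact_id

    def about(self, subject: str) -> List[Tuple[str, Fact]]:
        """Return all (identifier, fact) pairs with the given subject."""
        return [(fid, f) for fid, f in self.facts.items() if f.subject == subject]
```

For instance, `f1 = onto.add("AlbertEinstein", "BornInYear", "1879")` stores an ordinary fact, and `onto.add(f1, "ExtractedFrom", "SomePage")` stores a fact about that fact by using its identifier as the subject.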


Page 13: Entity Queries

YAGO2 (extension of YAGO)

Focus on temporal and spatial knowledge.

Declarative rules stored in text files.

Temporal dimension

Facts can only hold time points; time spans are represented by two relations.

4 entity types (people, groups, artifacts and events).

9 relations generalized to 2 relations (StartsExistingOn and EndsExistingOn).


Page 14: Entity Queries

YAGO2 continued…

Spatial dimension

Harvests geo-entities from two sources: Wikipedia and GeoNames.

class yagoGeoEntity groups all geo-entities related by hasGeoCoordinates to yagoGeoCoordinates.

3 entity types (events, groups & artifacts).

2 relations generalized to placedIn.

Relation occursIn holds fact and geo-entity.

Page 15: Entity Queries

ESTER (Efficient Search on Text, Entities and Relations)

Combined full-text and ontology search system. Input is a corpus and an ontology.

Three components: an entity recognizer, a query engine, and a user interface.

Entity recognition adds, at position 0, the artificial word <c>:<x> for each top-level category c of which x is an instance.

For a fact (x, r, y) from YAGO, add the following artificial words: at position 1, add <r>:<p>, and at position p, add entity:<y>.


Page 16: Entity Queries

ESTER continued…

The query engine produces lists of word-in-document occurrences; each item consists of a document id, a word id, a score, and a position within the document.

Two basic operations: prefix search and join.

Given two occurrence lists produced by prefix search, the join operation computes a single list of all items whose word ids occur in both lists, sorted by document id.

Proactive user interface.
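The two basic operations described on this slide can be sketched over plain Python lists. This is a simplified illustration, not ESTER's implementation: the `Occurrence` type, the use of strings as word ids and the linear scan in `prefix_search` are assumptions made to keep the example short.

```python
from typing import List, NamedTuple

class Occurrence(NamedTuple):
    doc_id: int
    word_id: str   # assumed to be a string here so prefixes are easy to show
    score: float
    position: int

def prefix_search(occurrences: List[Occurrence], prefix: str) -> List[Occurrence]:
    """Return all occurrences whose word starts with the prefix, by document id."""
    hits = [o for o in occurrences if o.word_id.startswith(prefix)]
    return sorted(hits, key=lambda o: (o.doc_id, o.position))

def join(left: List[Occurrence], right: List[Occurrence]) -> List[Occurrence]:
    """Keep the items whose word id occurs in both lists, sorted by document id."""
    common = {o.word_id for o in left} & {o.word_id for o in right}
    merged = [o for o in left + right if o.word_id in common]
    return sorted(merged, key=lambda o: (o.doc_id, o.position))
```

A query that combines text and ontology conditions would, under this sketch, run one prefix search per condition and then join the resulting lists on the shared artificial entity words.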

Page 17: Entity Queries

Entity relationship queries over annotated web

Example query: “Find cities and countries in Europe where cities are capitals of respective countries”.

ERQ to handle relationships among entities across several pages.

High algorithmic complexity.

Scoring entities individually and aggregating the scores.

Page 18: Entity Queries

WikiERQ: SSQ (Shallow Semantic Queries)

ERQ directly over text.

Example query: “Find cities and countries in Europe where cities are capitals of respective countries”.

Position-based BCM for ranking answers. Key components: proximity, ordering and mutual exclusion.

Single predicate scoring

Multiple predicate scoring


Page 19: Entity Queries

WikiBANKS

Extended graph model combines the graph model of BANKS with a document model.

Each Wikipedia page/document is represented by a node in the graph.

Near query model: find C near (K)

Query evaluation algorithm: selection predicates are evaluated individually as near queries, and the resulting entity lists are then used to evaluate the relation predicates (2 approaches).

Page 20: Entity Queries

WikiCSAW

ERQ over the highly scalable CSAW system.

Queries in master-slave configuration.

Category keyword mapping.

Optimizing ERQ over CSAW: entity-type and keyword pair postings to improve the merge step; compound Token-AND iterator.

Scoring based on entity, relation and node prestige with weights.

Page 21: Entity Queries

Conclusion and Future Work

Challenges faced by different approaches.

Adding artificial words to link other pages by enterprise (manually or defining rules).

Integration of data by standards like RDF.

Domain-centric concept search to handle scalability. Ontology based mapping of user keywords to domains for higher accuracy.

Need for annotation of relations.

Complex operations for ad hoc queries.


Page 22: Entity Queries


Questions ?

Page 23: Entity Queries


Thank You