Search Computing
Engineering SeCo: Liquid Queries
Marco Brambilla, Stefano Ceri
SeCo workshop, Como, June 17th-19th, 2009
Brambilla, CeriSearch Computing: LIQUID QUERIES
Agenda
1. Overview of the SeCo architecture
– Development and experimentation roadmap
2. Application development approach: LIQUID QUERIES
– Configurability of the interface, strong parameter typing, static mapping to services
– Continuous query processes
– Exploitation of user intelligence (interactive query process, user feedback), BPM
– Automatic code generation of user interface and interaction steps
– Adaptivity and customization of the query interaction
3. Support to the developer in the various design phases
– Service marts specification
– Query specification
– User interface specification
4. SeCo extensions
– High-level queries: general almost-NL query
• NLP, WordNet, query splitting, and mapping to services
1. Overview and roadmap of the SeCo architecture
Search Computing architecture: overall view
[Figure: architecture diagram showing the main query flow and the <Uses> relations. Components: Front End, Query Analysis, Query To Domain Mapper, Query Planner, Query Engine (operators OP 1, OP 2, ..., OP N), Result Transformation, Domain Framework, Domain Repository, WS-Framework, Service Repository, WS World; each component has a Cache. Data exchanged: High-Level Query, Sub-queries, Low-level queries, Concrete Query Plan, Merged Results, Final User Results.]
Example of the query flow:
High-level query: “Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?”
Sub-query 1: “Where can I attend a DB scientific conference?”
Sub-query 2: “place close to a beautiful beach?”
Sub-query 3: “place reachable with cheap flight?”
Low-level query 1: ConfSearch(“DB”, PlaceX, DateY)
Low-level query 2: TourSearch(“Beach”, PlaceX)
Low-level query 3: Flight(“cost<200”, PlaceX, DateY)
Query plan
Services invocations and operators execution
Results
Presented results:
MSVVEIS’08 - Barcelona - Iberia
LID’08 - Rome - Alitalia
RCIS’08 - Marrakech - AirFrance
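The decomposition in this example can be written down as plain data. The sketch below is only illustrative: the class names `SubQuery` and `LowLevelQuery`, their fields, and the helper `join_variables` are assumptions made for this example, not part of SeCo; the service names and arguments are the ones shown on the slide, with the variable spelling normalized to `PlaceX` and `DateY`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowLevelQuery:
    """A service invocation as written on the slide (hypothetical structure)."""
    service: str
    args: tuple

    def render(self) -> str:
        return f"{self.service}({', '.join(self.args)})"


@dataclass(frozen=True)
class SubQuery:
    """One sub-query and the low-level query it is mapped to."""
    text: str
    low_level: LowLevelQuery


SUB_QUERIES = [
    SubQuery("Where can I attend a DB scientific conference?",
             LowLevelQuery("ConfSearch", ('"DB"', "PlaceX", "DateY"))),
    SubQuery("place close to a beautiful beach?",
             LowLevelQuery("TourSearch", ('"Beach"', "PlaceX"))),
    SubQuery("place reachable with cheap flight?",
             LowLevelQuery("Flight", ('"cost<200"', "PlaceX", "DateY"))),
]


def join_variables(sub_queries):
    """Variables (unquoted arguments) shared by at least two low-level queries."""
    counts = {}
    for sq in sub_queries:
        for arg in set(sq.low_level.args):
            if not arg.startswith('"'):  # quoted arguments are constants
                counts[arg] = counts.get(arg, 0) + 1
    return sorted(v for v, n in counts.items() if n >= 2)


for sq in SUB_QUERIES:
    print(sq.low_level.render())
print("join variables:", join_variables(SUB_QUERIES))
```

The shared variables are what ties the three service calls together: the place links all three services, the date links the conference and the flight.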
Search Computing architecture: configurability of the implementation
[Figure: the architecture diagram with its main query flow and <Uses> relations (Front End, Query Analysis, Query To Domain Mapper, Query Planner, Query Engine, Result Transformation, Domain Framework, Domain Repository, WS-Framework, Service Repository, WS World, caches), annotated with an Admin Interface and the labels Low-level queries, Sub-queries, and Concrete Query Plan.]
Search Computing architecture: development roadmap
[Figure: the architecture diagram with the Admin Interface, annotated with the prototypes below.]
Prototype 1: Core behaviour of the system
• Engine-based execution of queries
• Domain repository
• Service repository
• Coarse result presentation
Prototype 2: Planning
• Automatic optimized query planning
Prototype 3: Mapping and presentation
• Mapping to domains
• Presentation of results
Prototype 4: High level queries
2. Application development approach:
LIQUID QUERIES
LIQUID QUERY
A level above the optimization:
– Forcing the query flow
LIQUID QUERY: A query with flexible boundaries
Control is
– on the user
– at query time
– on the evolution
Contextual/recommended direction could be proposed
In line with current trends in search (and others!)
Forward-looking and (a little bit) far-fetched ideas
Open to discussion
Microsoft Bing
Contextual step-by-step evolution of the query
Google Squared
Multi-content, resizable, reshapeable query
Not a search: Hunch
Just a big decision tree
Perceived as great value by today's users
Yahoo! research
Web of pages vs. Web of objects
Understand the need behind the user query
Exploiting user intelligence
– Tags
– Folksonomies
Multi-step queries
Multi-technology queries
– Annotations
– Content-based
LIQUID QUERY
Moving from "one time query" to a process-based approach
Continuation of queries based on exploitation of relations between service marts
A query with flexible boundaries, that can be
– Reshaped/refined: asking for different information on the results
– Expanded: asking for additional information on the results, adding new domains
– Extended: asking for more results
by the user at runtime
Contextual/recommended direction could be proposed
Relies on the SeCo query machine
– Every user interaction could trigger recalculations
Liquid query navigation
Liquid what?
– Liquid data (BEA & Co., @ San Diego)
– Liquid publications and docs (Fabio & Co., @ Trento)
– (old-style) Liquid queries (Heer & Co., @ Berkeley)
Somehow similar to Google Squared, but:
– Multi-domain
– Multi-purpose
– More flexible
Upon first query cycle, various options to the user:
– Refinement of the query
– Extension of the query results (give me more)
– Expansion of the query (add more domains)
– Choosing a different connection between services (i.e., changing the adopted access pattern)
– Clustering, re-ranking, ...
What does it imply at the query machine level?
Liquid query navigation
[Figure: result table; column headers: Conference, Photo, Description, Date, Hotel, Photo, Description, Address, Services]
Liquid query: clustering/unclustering
At the query machine level?
Probably nothing, just a presentation issue, if...
Liquid query: ranking/reranking
For unclustered data or cluster representatives
At the query machine level?
If multi-ranking service available, recompute the query.
If not, just re-sort the query result at presentation level.
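The two cases on this slide amount to a small decision. The sketch below is a hedged illustration: the function `rerank`, its parameters, and the `recompute` callback are hypothetical names, and results are assumed to be dictionaries already held by the presentation layer.

```python
def rerank(results, key, available_rankings, recompute=None):
    """Re-rank `results` by `key`.

    If the underlying service offers a ranking on `key` (assumption: this is
    known from `available_rankings`) and a `recompute` callback is given,
    the query is recomputed; otherwise the current results are re-sorted
    locally, at presentation level.
    Returns (new_results, where_the_work_happened).
    """
    if key in available_rankings and recompute is not None:
        return recompute(key), "query machine"
    return sorted(results, key=lambda r: r[key]), "presentation"


hotels = [{"name": "A", "price": 120}, {"name": "B", "price": 80}]

# No multi-ranking service: the existing result set is just re-sorted.
print(rerank(hotels, "price", available_rankings=set()))
```

Note that a local re-sort only reorders the results already retrieved, whereas a recomputation may return a different result set.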
Liquid query: refinement
Adding additional constraints
– E.g., on this timeframe
... More or less results...
Search again
Refined search...
At the query machine level?
Rebuild the plan, possibly. And re-execute the query.
If pieces can be reused... (caching)
Liquid query: extend the query
At the query machine level?
Run again the machine on further data
Gimme more
Asking for more results
... More results...
Liquid query: zooming in (service-wise)
Asking for more results on a specific service
... More results...
At the query machine level?
Run again the machine on that service. Or:
Change the throughput of the machine
Clock branches
Gimme more
Liquid query: zooming in
Asking for more results on a specific item
... More results...
At the query machine level?
Run again the machine on services joined to that item. Or:
Change the throughput of the machine
Clock branches
Gimme more
Liquid query: expand (shrink) the query
At the query machine level?
Changing the plan.
If something can be reused ... (caching)
Additional subquery
Asking for more columns (or remove existing ones)
... Results... ?
Liquid query: change join conditions
Changing the used access paths
At the query machine level?
Changing the plan.
If something can be reused ... (caching)
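The machine-level consequences stated on the preceding slides can be collected in one lookup table. This is one reading of the slides, not a specification: the `Effect` class and its three-valued fields are assumptions, and the "yes" on re-execution after a plan change is inferred rather than stated on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    changes_plan: str    # "no", "possibly", or "yes"
    reruns_machine: str  # "no", "possibly", or "yes"
    note: str


# Interaction -> effect at the query machine level (a reading of the slides).
EFFECTS = {
    "cluster/uncluster": Effect("no", "no", "probably just a presentation issue"),
    "rerank": Effect("no", "possibly",
                     "recompute if a multi-ranking service is available, "
                     "else re-sort at presentation level"),
    "refine": Effect("possibly", "yes", "reuse cached pieces if possible"),
    "extend": Effect("no", "yes", "run the machine on further data"),
    "zoom in (service)": Effect("no", "yes",
                                "rerun on that service, or change throughput"),
    "zoom in (item)": Effect("no", "yes",
                             "rerun on services joined to that item"),
    # Re-execution after a plan change is an assumption of this sketch.
    "expand/shrink": Effect("yes", "yes", "reuse cached pieces if possible"),
    "change join conditions": Effect("yes", "yes",
                                     "reuse cached pieces if possible"),
}


def interactions_needing_replan():
    """Interactions that always force a new query plan."""
    return sorted(name for name, e in EFFECTS.items() if e.changes_plan == "yes")


print(interactions_needing_replan())
```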
Horizontal and vertical multi-domain search
Structure of the interface automatically generated based on the structure of the access plan
Additional feature: save the resulting interface (for query and results) for canned vertical applications
Apply a stylesheet for making the application real
– Set of default stylesheets that can be painted upon the interface
– Possibility of defining custom stylesheets
3. Support to the designer.
SETTING UP THE LIQUID QUERY ENVIRONMENT
Registration Time
The role of the designer is at registration time!
Low development cost
Higher cost of registration
– Description of services
– Description of default interfaces for services inputs and results
The tools
Strong parameter typing
– UI fields are typed
Static mapping to services
– UI fields are directly mapped to search services
BPM-like modeling of the user interaction and query processing steps
Automatic generation of UI
Adaptivity and customization of the query interaction
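Strong typing and static mapping can be pictured as a table of UI fields, each carrying a type and the service parameter it feeds. The sketch below is a minimal illustration under assumptions: `UIField`, `build_calls`, the field labels, and the parameter names (`topic`, `max_cost`, `date`) are all hypothetical; only the service names ConfSearch and Flight come from the earlier example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UIField:
    label: str       # text shown in the form
    py_type: type    # declared type of the field
    service: str     # search service the field is statically mapped to
    parameter: str   # parameter of that service (hypothetical name)

    def parse(self, raw: str):
        """Convert the raw form input to the declared type (raises ValueError)."""
        if self.py_type is date:
            return date.fromisoformat(raw)
        return self.py_type(raw)


FIELDS = [
    UIField("Topic", str, "ConfSearch", "topic"),
    UIField("Max price", float, "Flight", "max_cost"),
    UIField("Departure", date, "Flight", "date"),
]


def build_calls(fields, raw_values):
    """Group typed values by target service, following the static mapping."""
    calls = {}
    for f in fields:
        calls.setdefault(f.service, {})[f.parameter] = f.parse(raw_values[f.label])
    return calls


print(build_calls(FIELDS, {"Topic": "DB", "Max price": "200",
                           "Departure": "2009-06-17"}))
```

Because each field declares its type, a malformed input such as a non-numeric price is rejected before any service is invoked.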
The hard task: Registration time
Building access patterns
Building binding
Defining the (lightweight) semantics
– Domains
– Keywords
Defining the (default) presentation
– Forms
– Results
4. SeCo Extensions:
High level queries
High level queries
Almost NL-specified queries
– Conjunctive noun phrases
Need to be decomposed and mapped to semantic domains: need of domain repository
Require NLP and “semantization” of phrase contents: need of NL analysis
Domain repository
Storage of
– domain definitions taxonomy (e.g., Dewey classification)
– mappings of NL words to domains
– mappings of services to domains
Shallow approach based on
– WordNet (synsets Sx)
– WordNet-Domains (domains Dx)
[Figure: synsets S1, S2, ..., S6 and a service ss1 from the Service Repository connected to domains D1, D2, D3; connections are annotated with pairs of numbers, e.g. S6 (.2, .8) and ss1 (.4, .6)]
Domain Repository: API
Three main interfaces:
Domain query: used to extract a domain (or a list of domains), and their corresponding properties, that relate to a specific string
Service extraction: used to extract the list of services associated to the domain
Domain hierarchy update: used to update the domain hierarchy
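The three interfaces can be sketched as an in-memory class. This is an assumption-laden illustration, not the SeCo API: the class and method names, the weighted word-to-domain and service-to-domain mappings, the domain names, and the weights are all made up for the example.

```python
class DomainRepository:
    """Toy in-memory domain repository (hypothetical API)."""

    def __init__(self):
        self._parent = {}    # domain -> parent domain (None for a root)
        self._words = {}     # word -> {domain: weight}
        self._services = {}  # domain -> {service: weight}

    # Domain hierarchy update
    def update_hierarchy(self, domain, parent=None):
        self._parent[domain] = parent

    def map_word(self, word, domain, weight):
        self._words.setdefault(word.lower(), {})[domain] = weight

    def map_service(self, service, domain, weight):
        self._services.setdefault(domain, {})[service] = weight

    # Domain query
    def domain_query(self, text):
        """Domains related to the words of `text`, best score first."""
        scores = {}
        for word in text.lower().split():
            for domain, weight in self._words.get(word, {}).items():
                scores[domain] = scores.get(domain, 0.0) + weight
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

    # Service extraction
    def service_extraction(self, domain):
        """Services associated to `domain`, best weight first."""
        services = self._services.get(domain, {})
        return sorted(services.items(), key=lambda kv: (-kv[1], kv[0]))


repo = DomainRepository()
repo.update_hierarchy("Travel")
repo.update_hierarchy("Flights", parent="Travel")
repo.map_word("flight", "Flights", 0.8)
repo.map_word("flight", "Travel", 0.2)
repo.map_service("FlightSearch", "Flights", 0.6)

print(repo.domain_query("cheap flight"))
print(repo.service_extraction("Flights"))
```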
Query Analyser
Starts from an almost-natural language specification of the user request
Tries to determine a decomposition into subqueries, each of which can be mapped on a domain
E.g.: scientific conference reachable with a cheap flight, with a beautiful beach nearby
Target splitting
– q1=“scientific conference”,
– q2=“reachable with a cheap flight”, and
– q3=“with a beautiful beach nearby”.
Query Analyser
For NLP, we exploit an open source tool developed by the Stanford Natural Language Processing Group
The outcome is a tree representation of the query
Definition of heuristics for query splitting
To optimize the recognition of query-subquery relations:
– iterative invocation of the NLP tool based on various arguments (feedback from user, feedforward/feedback from other components, ...);
– exploitation of knowledge/services available in other components. E.g.:
• knowledge about the available services, domains, and so on;
• syntax/logic analysis results on the sentence.
Back to the example - 2
scientific conference reachable with a cheap flight, with a beautiful beach nearby
Very coarse heuristics:
– Subqueries = first level subtrees
Obtained splitting
– q1=“scientific conference reachable”,
– q2=“with a cheap flight”, and
– q3=“with a beautiful beach nearby”.
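The coarse heuristic can be sketched on a parse tree represented as nested tuples. The tree below is hand-written so that its first-level subtrees reproduce the splitting reported on the slide; it is an assumption, not actual output of the Stanford parser, and the function names are hypothetical.

```python
# A node is (label, child, child, ...); a leaf is a plain string (a word).
# Hand-written tree (assumption), shaped to match the splitting on the slide.
TREE = (
    "ROOT",
    ("NP", ("JJ", "scientific"), ("NN", "conference"),
     ("ADJP", ("JJ", "reachable"))),
    ("PP", ("IN", "with"),
     ("NP", ("DT", "a"), ("JJ", "cheap"), ("NN", "flight"))),
    ("PP", ("IN", "with"),
     ("NP", ("DT", "a"), ("JJ", "beautiful"), ("NN", "beach")),
     ("ADVP", ("RB", "nearby"))),
)


def leaves(node):
    """Words under `node`, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the label
        words.extend(leaves(child))
    return words


def split_first_level(tree):
    """Coarse heuristic: one subquery per first-level subtree."""
    return [" ".join(leaves(child)) for child in tree[1:]]


for i, q in enumerate(split_first_level(TREE), start=1):
    print(f"q{i} = {q!r}")
```

The misplaced word “reachable” in q1 shows why the heuristic is called very coarse: the split follows the tree shape, not the meaning of the phrases.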
Back to the example - 3
Still not exact, but rather close (being the first shot :)
Further information can be extracted
– Association between words: e.g., cheap_flight
– Meaning of phrase connectives
And ...
– What about negation?
– What about join attributes between phrases?
– ...
Query Analyser
Tasklist
– Extraction of a corpus of queries from Yahoo! Answers
– Definition of concrete options for optimization of the extractor
– Training?
– Validation of the approach on the corpus
– Mapping: currently could be trivial on keywords of domains
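A trivial keyword-based mapping, as mentioned in the last task, could look like the sketch below. Everything in it is hypothetical: the domain names, their keyword sets, and the function `map_to_domain` are chosen only to fit the running example.

```python
# Hypothetical domains and keywords, for illustration only.
DOMAIN_KEYWORDS = {
    "Conferences": {"conference", "scientific"},
    "Flights": {"flight", "flights", "cheap"},
    "Tourism": {"beach", "beautiful"},
}


def map_to_domain(subquery, domain_keywords=DOMAIN_KEYWORDS):
    """Return the domain sharing most keywords with the subquery, or None."""
    words = set(subquery.lower().replace(",", " ").split())
    best, best_hits = None, 0
    for domain in sorted(domain_keywords):  # sorted: ties resolved by name
        hits = len(words & domain_keywords[domain])
        if hits > best_hits:
            best, best_hits = domain, hits
    return best


for q in ["scientific conference reachable",
          "with a cheap flight",
          "with a beautiful beach nearby"]:
    print(q, "->", map_to_domain(q))
```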