Graphs, graphs everywhere
Lucene powered relation exploration
Zbyszko Papierski, Senior Dev @ JIRA Cloud, T: @ZPapierski
Agenda
1. Introduction to Lucene and friends
2. Evolution of data analysis by Solr and Elasticsearch
3. Graph capabilities of Elasticsearch (briefly)
4. Solr - QueryParserPlugin
5. Solr - Streaming Expressions
6. Examples
Lucene
Provides a mechanism for fast searching of text data: both full-text search (analyzed data) and exact match (non-analyzed fields, or docValues)
Step one - indexing
{kitty|kitten|cat|cats|kittens|pussy} -> cat
{is} -> (nothing)
{GORGEOUS!!!} -> gorgeous, pretty, nice, etc.
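The analysis step above can be sketched in a few lines of Python. This is only an illustration of the idea (normalize tokens, drop some, expand others, then record which documents contain each term); it is not how Lucene is implemented. The `EXPANSIONS` and `STOP_WORDS` tables, the function names and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical analysis tables modelled on the slide: several surface forms
# collapse to "cat", "gorgeous" expands to related terms, "is" is dropped.
EXPANSIONS = {
    "kitty": ["cat"], "kitten": ["cat"], "kittens": ["cat"], "cats": ["cat"],
    "gorgeous": ["gorgeous", "pretty", "nice"],
}
STOP_WORDS = {"is"}


def analyze(text):
    """Lowercase, strip punctuation, drop stop words, apply expansions."""
    tokens = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalnum())
        if not word or word in STOP_WORDS:
            continue
        tokens.extend(EXPANSIONS.get(word, [word]))
    return tokens


def build_index(docs):
    """Inverted index: map each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


docs = {1: "Kitty is GORGEOUS!!!", 2: "Very ugly cats"}
index = build_index(docs)
print(analyze(docs[1]))
print(sorted(index["cat"]))
```

Because "kitty" and "cats" both end up as the term cat, a later search for either form can find both documents with a single dictionary lookup.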
Step two - searching
Query: {very} -> very, {nice} -> nice, {kitty} -> cat
Documents:
{nice, cat, …}
{very, ugly, cat, …}
{very, nice, dog, …}
{very, nice, bear, …}
Step three - scoring
Query: {very} -> very, {nice} -> nice, {kitty} -> cat
Documents:
{nice, cat, …}
{very, ugly, cat, …}
{very, nice, dog, …}
{very, nice, bear, …}
Winner!
{nice, cat, …} wins: nice and cat score higher than very and nice, or very and cat, because cat is rarer than very.
This is only an example, all cats are nice…
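The ranking can be reproduced with a toy scorer. The sketch below is a simplification, not Lucene's actual similarity: it assumes each document holds only the terms shown on the slide (the "…" is ignored), uses a smoothed IDF of the form 1 + ln(N / (df + 1)), and divides by the square root of the document length. Note that very and nice are equally frequent here, so in this sketch it is the length normalization, an assumption of the example, that separates {nice, cat} from {very, ugly, cat}.

```python
import math

# The four documents from the slide; names d1..d4 are made up for the example.
DOCS = {
    "d1": ["nice", "cat"],
    "d2": ["very", "ugly", "cat"],
    "d3": ["very", "nice", "dog"],
    "d4": ["very", "nice", "bear"],
}
QUERY = ["very", "nice", "cat"]  # "very nice kitty" after analysis


def idf(term):
    """Smoothed inverse document frequency: rarer terms weigh more."""
    df = sum(1 for terms in DOCS.values() if term in terms)
    return 1.0 + math.log(len(DOCS) / (df + 1))


def score(terms):
    """Sum the IDF of matched query terms, damped by document length."""
    matched = sum(idf(t) for t in QUERY if t in terms)
    return matched / math.sqrt(len(terms))


ranking = sorted(DOCS, key=lambda d: score(DOCS[d]), reverse=True)
for doc_id in ranking:
    print(doc_id, round(score(DOCS[doc_id]), 3))
```

With these assumptions d1 ranks first and the two documents without cat rank last, which matches the slide's point that the rarer term carries more weight.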
How did we get here?
• full text searching
• faceting/aggregation
• statistical analysis
• relationship exploration
Elasticsearch+Kibana
Plugin for Elasticsearch and Kibana - Graph
• From Elasticsearch 2.3
• REST API - /_graph/explore
• visualization for Kibana
• Part of the Elastic commercial offering (named X-Pack from 5.0)
picture from: https://www.elastic.co/guide/en/graph/current/graph-introduction.html
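To give a feel for the REST API, here is a rough sketch that builds an explore request without sending it. The index name `clicklogs` and the field names are hypothetical, the endpoint suffix is taken from the slide as written (the exact path may differ between Elasticsearch releases), and only the basic `query`, `vertices` and `connections` sections of the request body are shown.

```python
import json

# Hypothetical index name, chosen only for illustration.
INDEX = "clicklogs"


def explore_request(seed_text, seed_field, vertex_field, connected_field):
    """Return (path, body) for a Graph explore call; nothing is sent."""
    path = f"/{INDEX}/_graph/explore"
    body = {
        # Seed query selecting the documents the exploration starts from.
        "query": {"match": {seed_field: seed_text}},
        # Terms of this field become the first set of vertices.
        "vertices": [{"field": vertex_field}],
        # Terms of this field are the vertices connected to the first set.
        "connections": {"vertices": [{"field": connected_field}]},
    }
    return path, body


path, body = explore_request("midi", "query.raw", "product", "query.raw")
print("POST", path)
print(json.dumps(body, indent=2))
```

The printed path and JSON can be pasted into any HTTP client pointed at a cluster that has the Graph plugin installed.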
Solr
built-in GraphQueryParser
• Available from Solr 6.0
• experimental feature
• currently works for single-node, single-core applications (due to change)
• no 1st party visualization
• does not track edges of the traversal
picture from: http://solr.pl/2016/04/25/wizualizacja-grafow-przy-pomocy-solr-6/
Solr
built-in Streaming Expressions
• Available from Solr 5.5
• experimental feature
• no 1st party visualization
• tracks both the edges and the level of the traversal
picture from: http://solr.pl/2016/04/25/wizualizacja-grafow-przy-pomocy-solr-6/
GraphQueryParser - params
returnRoot
Whether the root set of documents (found by the initial query) should be returned. Default: true
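As a small illustration, the helper below assembles a graph query in Solr's local-params syntax with returnRoot switched off. The helper itself, the field names `in_edge` and `out_edge`, and the root query `id:A` are assumptions for the example; only the `{!graph ...}` form and the `from`, `to` and `returnRoot` parameters come from Solr.

```python
def graph_query(root_query, from_field, to_field, return_root=True):
    """Build a Solr graph query string in local-params syntax."""
    flag = "true" if return_root else "false"
    return (
        f"{{!graph from={from_field} to={to_field} "
        f"returnRoot={flag}}}{root_query}"
    )


# Hypothetical edge fields and root query.
q = graph_query("id:A", "in_edge", "out_edge", return_root=False)
print(q)
```

The resulting string goes into the ordinary `q` parameter of a select request; with `returnRoot=false` the document matching `id:A` is left out of the result and only the documents reached from it are returned.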
Streaming Expressions
• New, alternative way of creating and processing queries
• allows chaining functions
• also experimental
• graph functions - shortestPath, gatherNodes, scoreNodes
shortestPath
• one of the source functions - a function producing a tuple stream
• returns the shortest path between two given nodes, using an iterative breadth-first search of the graph
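The breadth-first idea behind shortestPath can be sketched in memory. This is a generic BFS over a list of directed edges, not Solr's implementation; the function name, the edge list and the `max_depth` argument are assumptions for illustration.

```python
from collections import deque


def shortest_path(edges, start, goal, max_depth=None):
    """Breadth-first search over directed (source, target) edge pairs."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, []).append(target)

    parent = {start: None}          # also serves as the visited set
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        if max_depth is not None and depth >= max_depth:
            continue                 # do not expand beyond the depth limit
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append((nxt, depth + 1))
    return None


edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e"), ("e", "c")]
print(shortest_path(edges, "a", "c"))
```

Because nodes are expanded level by level, the first time the goal is dequeued the path found has the fewest possible edges.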
shortestPath - params
• collection - collection in which to perform the search
• from - starting node
• to - ending node
• edge - definition of the edge, in the format <from_field>=<to_field>
• fq - filter query restricting which nodes are taken into account
• maxDepth - maximum depth of the traversal
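Put together, the parameters form an expression like the one rendered below. The helper is only a string builder written for this example; the collection name `emails`, the node ids and the field names `from_address` and `to_address` are hypothetical.

```python
def shortest_path_expr(collection, start, end, edge, fq=None, max_depth=None):
    """Render a shortestPath streaming expression as text."""
    params = [f'from="{start}"', f'to="{end}"', f'edge="{edge}"']
    if fq is not None:
        params.append(f'fq="{fq}"')
    if max_depth is not None:
        params.append(f'maxDepth="{max_depth}"')
    return f"shortestPath({collection}, " + ", ".join(params) + ")"


# Hypothetical collection, nodes and edge fields.
expr = shortest_path_expr(
    "emails", "alice", "bob", "from_address=to_address", max_depth=4
)
print(expr)
```

The resulting text would be sent as the `expr` parameter to the collection's streaming handler.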
gatherNodes
• transforms an input document stream into a stream of the documents reachable from it by graph traversal
• can return edges
• allows nesting functions
• works for multi-collection streams, regardless of the number of cluster nodes
• is also a source function
• currently does not support multivalued fields
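One traversal step can be pictured with plain Python dictionaries. The sketch below only mimics the idea (match the starting nodes against one field, collect the values of another, remember where each came from); the record layout and the output keys `node`, `ancestors` and `level` are assumptions modelled loosely on the function's description, not Solr's actual tuple format.

```python
def gather_nodes(records, roots, walk_field, gather_field, level=1):
    """One traversal step: match roots on walk_field, collect gather_field."""
    found = {}
    for record in records:
        if record[walk_field] in roots:
            node = record[gather_field]
            # Remember which root led here, i.e. the traversed edge.
            found.setdefault(node, set()).add(record[walk_field])
    return [
        {"node": node, "ancestors": sorted(sources), "level": level}
        for node, sources in sorted(found.items())
    ]


# Hypothetical e-mail records: one document per message.
emails = [
    {"from": "ann", "to": "bob"},
    {"from": "ann", "to": "cid"},
    {"from": "bob", "to": "cid"},
    {"from": "dan", "to": "ann"},
]
first_hop = gather_nodes(emails, {"ann"}, "from", "to")
# Nesting: the output of one step feeds the next one.
second_hop = gather_nodes(
    emails, {t["node"] for t in first_hop}, "from", "to", level=2
)
print(first_hop)
print(second_hop)
```

Feeding the first result into a second call is the in-memory counterpart of nesting one gatherNodes inside another.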
gatherNodes - params
• collection - collection on which the function will be performed
• walk - defines the starting nodes and the field to match them against, e.g. "[email protected]->from"
• gather - defines which fields are gathered
• scatter - parameter that can take one or both of the values:
  • leaves - emits only leaf nodes (the outermost ones)
  • branches - emits nodes leading up to the leaves (the root node is a branch)
• fq - filter query that filters out nodes
• maxDocFreq - every node in the result that appears in more documents than this number is filtered out
Aggregations, cross-collection gathering and combining with other streaming expressions are possible
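A sketch of how these parameters combine into an expression, including nesting, is given below. The helper is a plain string builder invented for this example; the collection name `emails`, the starting node `ann` and the field names `from` and `to` are hypothetical.

```python
def gather_nodes_expr(collection, walk, gather, inner=None, scatter=None,
                      fq=None, max_doc_freq=None):
    """Render a gatherNodes streaming expression as text."""
    parts = [collection]
    if inner is not None:
        parts.append(inner)  # nested stream supplying the starting nodes
    parts.append(f'walk="{walk}"')
    parts.append(f'gather="{gather}"')
    if scatter:
        parts.append('scatter="' + ", ".join(scatter) + '"')
    if fq is not None:
        parts.append(f'fq="{fq}"')
    if max_doc_freq is not None:
        parts.append(f'maxDocFreq="{max_doc_freq}"')
    return "gatherNodes(" + ", ".join(parts) + ")"


# One hop from a hypothetical starting node, then a second hop that walks
# from the "node" field of the inner stream and emits branches and leaves.
one_hop = gather_nodes_expr("emails", "ann->from", "to")
two_hops = gather_nodes_expr(
    "emails", "node->from", "to", inner=one_hop,
    scatter=["branches", "leaves"],
)
print(one_hop)
print(two_hops)
```

In the nested form the outer walk starts from the `node` field of the tuples produced by the inner expression rather than from a literal value.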
scoreNodes
• Function used only with the output of gatherNodes
• Scores document relevancy using the TF-IDF formula
  • TF - how often the document appeared during the graph traversal
  • IDF - fetched from the document's original collection
• Adds an additional field, nodeScore, to the output stream
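The scoring idea can be sketched as follows. The slides do not give the exact formula, so this example assumes nodeScore = count × ln(N / (docFreq + 1)); the tuple keys `count` and `docFreq`, the sample numbers and the function name are likewise assumptions for illustration.

```python
import math


def score_nodes(tuples, num_docs):
    """Return copies of the tuples with a nodeScore field, best first."""
    scored = []
    for t in tuples:
        tf = t["count"]  # how often the node was reached during traversal
        idf = math.log(num_docs / (t["docFreq"] + 1))
        scored.append({**t, "nodeScore": tf * idf})
    return sorted(scored, key=lambda t: t["nodeScore"], reverse=True)


# Hypothetical gatherNodes output: a frequent node and a rare one.
gathered = [
    {"node": "common", "count": 5, "docFreq": 900},
    {"node": "rare", "count": 3, "docFreq": 9},
]
ranked = score_nodes(gathered, num_docs=1000)
for t in ranked:
    print(t["node"], round(t["nodeScore"], 3))
```

Although the common node was reached more often, the rare node ranks first because its low document frequency gives it a much larger IDF.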