Payloads in Solr - Erik Hatcher, Lucidworks

Payloads in Solr Erik Hatcher Senior Solutions Architect / co-founder, Lucidworks

Upload: lucidworks

Post on 21-Jan-2018


TRANSCRIPT

Page 1: Payloads in Solr - Erik Hatcher, Lucidworks

Payloads in Solr Erik Hatcher

Senior Solutions Architect / co-founder, Lucidworks

Page 2: Payloads in Solr - Erik Hatcher, Lucidworks

Solr now smoothly integrates with Lucene-level payloads. Payloads provide optional per-term metadata, numeric or otherwise. Payloads help solve challenging use cases such as per-store product pricing and per-term confidence/weighting. This session will present the payload feature from the Lucene layer up to the Solr integration, including per-store pricing, per-term weighting, and more.

Page 3: Payloads in Solr - Erik Hatcher, Lucidworks

tl;dr

• Solr 6.6+ via SOLR-1485
• per-term position metadata
• Use cases:
  • per-store pricing
  • weighting terms: e.g. confidence of term, or importance/relevance of term
  • weighting term types (synonyms factor lower, verbs factor higher)

Page 4: Payloads in Solr - Erik Hatcher, Lucidworks

Lucene’s Payloads

• Token: PayloadAttribute
• byte[] per term position, optional
• Several components set payloads
• Similarity.SimScorer#computePayloadFactor
  • Before SOLR-1485, no built-in components (outside Lucene’s test cases) implemented this
• PostingsEnum#getPayload

Page 5: Payloads in Solr - Erik Hatcher, Lucidworks

Postings Format

http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/codecs/lucene50/Lucene50PostingsFormat.html

Page 6: Payloads in Solr - Erik Hatcher, Lucidworks

Lucene’s Token

• Field
• Attributes:
  • CharTerm: term text
  • … Keyword, Type, Offset, …
  • and Payload!

Page 7: Payloads in Solr - Erik Hatcher, Lucidworks

setPayload(bytes)

• DelimitedPayloadTokenFilter
• NumericPayloadTokenFilter
• TokenOffsetPayloadTokenFilter
• TypeAsPayloadTokenFilter
• pre-analyzed field (Solr)

Page 8: Payloads in Solr - Erik Hatcher, Lucidworks

DelimitedPayloadTokenFilter

Page 9: Payloads in Solr - Erik Hatcher, Lucidworks

DelimitedPayloadTokenFilter

• term1|payload1 term2|payload2
• encodes payloads as:
  • float,
  • int,
  • or string / raw bytes
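As a rough illustration of the term|payload convention above, the following Python sketch splits a delimited field value into terms and payload bytes. It is a toy model, not Lucene's implementation: the function names are hypothetical, and the byte layouts (4-byte big-endian float and int, UTF-8 for the identity case) and the choice to split at the first delimiter are assumptions made for this sketch.

```python
import struct


def encode_payload(raw, encoder):
    """Encode the text after the delimiter as payload bytes (assumed layouts)."""
    if encoder == "float":
        return struct.pack(">f", float(raw))  # 4 bytes, big-endian IEEE 754
    if encoder == "integer":
        return struct.pack(">i", int(raw))    # 4 bytes, big-endian signed int
    if encoder == "identity":
        return raw.encode("utf-8")            # the string's raw bytes
    raise ValueError(f"unknown encoder: {encoder}")


def split_delimited(text, encoder="float", delimiter="|"):
    """Return (term, position, payload bytes or None) per whitespace-separated token."""
    tokens = []
    for position, token in enumerate(text.split()):
        # Assumption: split at the first delimiter; the payload is optional.
        term, sep, raw = token.partition(delimiter)
        payload = encode_payload(raw, encoder) if sep else None
        tokens.append((term, position, payload))
    return tokens


if __name__ == "__main__":
    for entry in split_delimited("one|1.0 two|2.0 three|3.0"):
        print(entry)
```

A token without a delimiter simply carries no payload, which matches the "optional" nature of payloads noted earlier.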

Page 10: Payloads in Solr - Erik Hatcher, Lucidworks

field weighted_terms_dps
  term one
    doc 0
      freq 1
      pos 0
        payload 1.0
  term three
    doc 0
      freq 1
      pos 2
        payload 3.0
  term two
    doc 0
      freq 1
      pos 1
        payload 2.0
  term weighted
    doc 1
      freq 2
      pos 0
        payload 50.0
      pos 1
        payload 100.0

Page 11: Payloads in Solr - Erik Hatcher, Lucidworks

Use Cases

• products with per-store pricing
• boosting by weighted terms
• down-boosting synonyms

Page 12: Payloads in Solr - Erik Hatcher, Lucidworks

Traditional per-store pricing strategies

• Explode docs:
  • num_docs = products * stores (1M products * 5000 stores could be up to 5B docs!)
  • query-time collapsing (by product id)
• Explode fields:
  • default_price
  • store_price_0001
  • store_price_0002
  • … store_price_NNNN
  • query-time field choice
  • e.g. up to 5000 fields per document

Page 13: Payloads in Solr - Erik Hatcher, Lucidworks

Payload-based per-store pricing

• default_price
• store_prices:
  • terms: STORE_0001 … STORE_NNNN
  • per-term payload of price
• One additional field
  • with up to num_stores terms/payloads
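To make the trade-off concrete, here is a small Python sketch of the arithmetic behind exploding documents and of what a single store_prices field value could look like in term|payload form. The helper names, the two-decimal price formatting, and the fallback to default_price when a store has no entry are assumptions for illustration, not something the slides specify.

```python
def exploded_doc_count(products, stores):
    """Documents needed when every product is indexed once per store."""
    return products * stores


def store_prices_value(prices_by_store):
    """Build one delimited field value: one STORE_NNNN|price token per store."""
    return " ".join(
        f"{store}|{price:.2f}" for store, price in sorted(prices_by_store.items())
    )


def price_for(doc, store):
    """Assumed lookup: the store's payload price, else the document's default_price."""
    return doc["store_prices"].get(store, doc["default_price"])


if __name__ == "__main__":
    print(exploded_doc_count(1_000_000, 5000))
    print(store_prices_value({"STORE_0002": 10.49, "STORE_0001": 9.99}))
```

With payloads, the document count stays at the number of products; only the number of terms in the one extra field grows with the number of stores.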

Page 14: Payloads in Solr - Erik Hatcher, Lucidworks
Page 15: Payloads in Solr - Erik Hatcher, Lucidworks

Down-boosting synonyms

id,synonyms_with_payloads
99,tv

synonyms.txt:
Television, Televisions, TV, TVs

/select?wt=csv&fl=id,score&q={!payload_score f=synonyms_with_payloads v=$payload_term func=max}

&payload_term=television
id,score
99,0.1

&payload_term=tv
id,score
99,1.0

Page 16: Payloads in Solr - Erik Hatcher, Lucidworks

{
  "add-field-type": {
    "name": "synonyms_with_payloads",
    "stored": "true",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "indexAnalyzer": {
      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
      "filters": [
        { "class": "solr.SynonymGraphFilterFactory", "expand": "true", "ignoreCase": "true", "synonyms": "synonyms.txt" },
        { "class": "solr.LowerCaseFilterFactory" },
        { "class": "solr.NumericPayloadTokenFilterFactory", "payload": "0.1", "typeMatch": "SYNONYM" }
      ]
    },
    "queryAnalyzer": {
      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
      "filters": [
        { "class": "solr.LowerCaseFilterFactory" }
      ]
    }
  }
}
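The effect of this analysis chain can be approximated with a toy Python model: tokens injected by synonym expansion receive the payload 0.1, while the originally indexed token carries no payload. The names below are hypothetical, and treating a payload-less token as a factor of 1.0 is an assumption chosen to reproduce the scores shown on the previous slide, not a statement about Lucene internals.

```python
SYNONYM_GROUP = ["television", "televisions", "tv", "tvs"]  # synonyms.txt, lowercased
SYNONYM_PAYLOAD = 0.1  # the "payload" value applied to SYNONYM-type tokens


def index_tokens(text):
    """Map each indexed term to the payloads of its occurrences (None = no payload)."""
    postings = {}
    for word in text.lower().split():
        postings.setdefault(word, []).append(None)  # original token, no payload
        if word in SYNONYM_GROUP:
            for synonym in SYNONYM_GROUP:
                if synonym != word:
                    # Injected synonym token: gets the numeric payload.
                    postings.setdefault(synonym, []).append(SYNONYM_PAYLOAD)
    return postings


def payload_score_max(postings, term, no_payload_factor=1.0):
    """Toy func=max scoring; returns None when the term does not match."""
    payloads = postings.get(term)
    if not payloads:
        return None
    return max(no_payload_factor if p is None else p for p in payloads)


if __name__ == "__main__":
    doc_99 = index_tokens("tv")
    print(payload_score_max(doc_99, "television"))
    print(payload_score_max(doc_99, "tv"))
```

In this model, a query for the literal indexed word outranks a query that only matches through a synonym.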

Page 17: Payloads in Solr - Erik Hatcher, Lucidworks

Solr Integration

• Schema-aware
  • DelimitedPayloadTokenFilter: float, integer, identity
  • NumericPayloadTokenFilter: float
• Function / Value Source
  • payload()
• Query parsers
  • {!payload_score}
  • {!payload_check}
• Default (data_driven) schema has built-in payload-enabled dynamic field mappings:
  • *_dpf, *_dpi, and *_dps

Page 18: Payloads in Solr - Erik Hatcher, Lucidworks

Solr features with payloads

• searching (scoring by payload):
  q={!payload_score…}
• searching (filtering by payload):
  fq={!frange cost=999 l=0 u=100}payload(…)
• sorting:
  sort=payload(…) desc
• faceting:
  facet.query={!frange l=0 u=100 v=$payload_func}&payload_func=payload(…)

Page 19: Payloads in Solr - Erik Hatcher, Lucidworks

payload()

• payload(field, term [,default_value [,min|max|average|first]])
• Operates on float or integer encoded payloads
• Value source, returning a single float per document
• Multiple term matches are possible, returning the min, max, or average; first is a special short-circuit that returns the first matching payload
• If no term matches for a document, returns the default value, or zero if none is given
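The semantics listed above can be sketched in a few lines of Python. This is a simplified model rather than Solr's code: a document's field is represented as a hypothetical dict from term to the list of decoded payloads at its positions, and using average when no function is passed is an assumption of this sketch.

```python
def payload(term_payloads, term, default_value=0.0, func="average"):
    """Return one float per document for `term`, mirroring payload(field, term, ...)."""
    values = term_payloads.get(term, [])
    if not values:
        return float(default_value)  # no term match: default value, or zero
    if func == "first":
        return float(values[0])      # short-circuit: first matching position only
    if func == "min":
        return float(min(values))
    if func == "max":
        return float(max(values))
    if func == "average":
        return sum(values) / len(values)
    raise ValueError(f"unknown function: {func}")


if __name__ == "__main__":
    # Doc 1 from the earlier postings listing: "weighted" at two positions.
    doc_1 = {"weighted": [50.0, 100.0]}
    for name in ("min", "max", "average", "first"):
        print(name, payload(doc_1, "weighted", func=name))
```

The example data reuses the postings shown earlier, where the term weighted occurs twice in doc 1 with payloads 50.0 and 100.0.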

Page 20: Payloads in Solr - Erik Hatcher, Lucidworks

payload() uses

• &payload_function=payload(…)
• Returning:
  fl=payload_result:${payload_function}
• Sorting:
  sort=${payload_function} desc
• Range faceting:
  facet.query={!frange key=up_to_one_hundred l=0 u=100 v=$payload_function}
• Matching:
  • without payload considered: term query, e.g. {!term}
  • with payloads factored: {!payload_check}
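One way to assemble these parameters into a request is sketched below in Python, defining the payload function once and referencing it from fl, sort, and facet.query. The field vals_dpf and term weighted are borrowed from other slides purely as placeholders, and q=*:* and facet=true are additions assumed for a complete request.

```python
from urllib.parse import urlencode


def payload_request_params(field, term):
    """Build Solr request parameters that reuse one payload() definition."""
    return {
        "q": "*:*",  # assumption: match all documents
        "payload_function": f"payload({field},{term})",
        "fl": "id,payload_result:${payload_function}",
        "sort": "${payload_function} desc",
        "facet": "true",
        "facet.query": "{!frange key=up_to_one_hundred l=0 u=100 v=$payload_function}",
    }


if __name__ == "__main__":
    # Placeholder field and term; substitute your own payload-enabled field.
    print("/select?" + urlencode(payload_request_params("vals_dpf", "weighted")))
```

Keeping the function in a single parameter means the field or term only has to change in one place.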

Page 21: Payloads in Solr - Erik Hatcher, Lucidworks

{!payload_score}

• SpanQuery wrapping, payload-based scoring
• SpanQuery support: currently SpanNearQuery of SpanTermQuery clauses
• scoring:
  • payload function: min, max, or average
  • includeSpanScore=true: multiplies payload function result by base query scoring
  • with a simple term query, payload() function is equivalent (with includeSpanScore=false)

Page 22: Payloads in Solr - Erik Hatcher, Lucidworks

{!payload_score} examples

{!payload_score f=payloaded_field_name v=term_value func=min|max|average [includeSpanScore=false] }

{!payload_score f=vals_dpf func=average v=weighted includeSpanScore=true}
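The scoring rule behind these examples can be modelled with a short Python sketch: apply the chosen function to the matching payloads, then optionally multiply by the base span score. The function name and the span score value are hypothetical; in Solr the span score comes from the wrapped query's own scoring.

```python
def payload_score(payloads, func="average", span_score=1.0, include_span_score=False):
    """Toy model of {!payload_score}: payload factor, optionally times the span score."""
    functions = {
        "min": min,
        "max": max,
        "average": lambda values: sum(values) / len(values),
    }
    factor = functions[func](payloads)
    return factor * span_score if include_span_score else factor


if __name__ == "__main__":
    weighted = [50.0, 100.0]  # payloads of "weighted" in the earlier postings listing
    print(payload_score(weighted, func="average"))
    # span_score=0.5 is a made-up base score, used only to show the multiplication.
    print(payload_score(weighted, func="average", span_score=0.5, include_span_score=True))
```

With include_span_score left off, the result depends only on the payloads, which is why payload() is equivalent for a simple term query.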

Page 23: Payloads in Solr - Erik Hatcher, Lucidworks

{!xmlparser}

• {!xmlparser} <BoostingTermQuery fieldName="weighted_terms_dpf"> weighted </BoostingTermQuery>

• == {!payload_score f=weighted_terms_dpf v=weighted func=average includeSpanScore=true}

Page 24: Payloads in Solr - Erik Hatcher, Lucidworks

{!payload_check}

• SpanQuery wrapping, phrase relevancy scoring
• SpanQuery support: currently SpanNearQuery of SpanTermQuery clauses
• matching:
  • matches when all terms match all corresponding payloads, in order
• scoring:
  • uses SpanNearQuery’s score

Page 25: Payloads in Solr - Erik Hatcher, Lucidworks

{!payload_check}

id,words_dps
99,taking|VERB the|ARTICLE train|NOUN

q={!payload_check f=words_dps v=train payloads=NOUN}

q={!payload_check f=words_dps v='the train' payloads='ARTICLE NOUN'}
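The matching rule in these queries can be imitated with a small Python sketch that checks whether the query terms occur in order, each with its corresponding payload. It is a simplification with hypothetical names: it requires the terms to be adjacent and compares payloads as plain strings, whereas Solr delegates matching to span queries over encoded payload bytes.

```python
def parse_doc(text):
    """Turn 'term|PAYLOAD ...' into a list of (term, payload) tuples."""
    return [tuple(token.split("|", 1)) for token in text.split()]


def payload_check(doc_tokens, terms, payloads):
    """True when all terms appear in order with their corresponding payloads."""
    term_list, payload_list = terms.split(), payloads.split()
    if not term_list or len(term_list) != len(payload_list):
        raise ValueError("need one payload per term")
    wanted = list(zip(term_list, payload_list))
    width = len(wanted)
    # Assumption: terms must be adjacent (a zero-slop, in-order phrase).
    return any(
        doc_tokens[start:start + width] == wanted
        for start in range(len(doc_tokens) - width + 1)
    )


if __name__ == "__main__":
    doc_99 = parse_doc("taking|VERB the|ARTICLE train|NOUN")
    print(payload_check(doc_99, "train", "NOUN"))
    print(payload_check(doc_99, "the train", "ARTICLE NOUN"))
    print(payload_check(doc_99, "train", "VERB"))
```

The last call shows the filtering effect: the term matches, but its payload does not, so the document is rejected.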

Page 26: Payloads in Solr - Erik Hatcher, Lucidworks

Payload Cons

• payload(): if used as a {!func} q or facet.query, it computes a value for ALL documents in the index. To compute the payload function for just the matching documents, use it in an fq as a PostFilter: {!frange} with payload()
• Updating values
  • Atomic field update (could multivalue and delete/add a single term|value)?
  • could mean updating all inventory for all stores for a single change
• no current range faceting support (of functions in general)

Page 27: Payloads in Solr - Erik Hatcher, Lucidworks

What’s next

• SOLR-10541 - “Range facet by function”
  • solves range faceting by payload
• LUCENE-7854: term frequency “payload”
  • coming soon, see SOLR-11358
• OpenNLP types => payloads
• Pluggable encoders/decoders?

Page 28: Payloads in Solr - Erik Hatcher, Lucidworks

Further reading

https://lucidworks.com/2017/09/14/solr-payloads/

Page 29: Payloads in Solr - Erik Hatcher, Lucidworks

Payloads in Solr