Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Symposium North America...
TRANSCRIPT
#LSNA17
SEO                      | Relevance
Pages                    | Liferay assets
Whole text is indexed    | Key/value docs are indexed
Opaque ranking criteria  | Scored queries, filters, field types
Reverse engineer         | Fine tune
Third party algorithms   | Search engine that you control
#LSNA17
GET /_search?explain
{ "query" : { "term" : { "tag" : "LSNA17" } } }
GET /index/type/0/_explain?q=user_id:2
"value" : 2.7051764,
"description" : "score(doc=0,freq=1.0), product of:",
"details" : [
  {
    "value" : 0.66422296,
    "description" : "queryWeight, product of:",
    "details" : [
      { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)" },
      { "value" : 0.16309182, "description" : "queryNorm" }
    ]
  },
  {
    "value" : 4.0726933,
    "description" : "fieldWeight in 0, product of:",
    "details" : [
      {
        "value" : 1.0,
        "description" : "tf(freq=1.0), with freq of:",
        "details" : [
          { "value" : 1.0, "description" : "termFreq=1.0" }
        ]
      },
      { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)" },
      { "value" : 1.0, "description" : "fieldNorm(doc=0)"
"failure to match filter: cache(user_id:[2 TO 2])"
#LSNA17
query = apple eclipse zzz yyy xxx qqq kkk ttt rrr
2.345 doc1: apple banana
2.345 doc2: eclipse moon sun
16.415 doc3: zzz yyy xxx qqq kkk ttt rrr 111
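The doc3 score on this slide is exactly seven times the single-term score (7 x 2.345 = 16.415), which suggests each matching query term adds its own contribution. The sketch below reproduces that arithmetic. The class name, the flat per-term score, and the whitespace tokenizing are assumptions made for illustration; a real engine scores each term separately.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ClauseSumSketch {

    // Assumption for illustration: each matching term contributes the same
    // score (2.345, the single-term score shown on the slide).
    static final double PER_TERM_SCORE = 2.345;

    // Counts the distinct query terms that also occur in the document.
    static int matchingTerms(String query, String doc) {
        Set<String> matched = new HashSet<>(Arrays.asList(query.split("\\s+")));
        matched.retainAll(new HashSet<>(Arrays.asList(doc.split("\\s+"))));
        return matched.size();
    }

    // Adds one per-term score for every matching term.
    static double score(String query, String doc) {
        return matchingTerms(query, doc) * PER_TERM_SCORE;
    }

    public static void main(String[] args) {
        String query = "apple eclipse zzz yyy xxx qqq kkk ttt rrr";
        String[] docs = {
            "apple banana",
            "eclipse moon sun",
            "zzz yyy xxx qqq kkk ttt rrr 111"
        };
        for (int i = 0; i < docs.length; i++) {
            System.out.printf("%.3f doc%d: %s%n", score(query, docs[i]), i + 1, docs[i]);
        }
    }
}
```

A document that matches many of the query's rare terms can outrank the documents matching the terms the user probably cared about.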
#LSNA17
TF/IDF (Term Frequency/Inverse Document Frequency)
Factor | In question form... | Score increases...
Term frequency | How often does the term appear in the field? | + When the term appears many times in the text
Inverse document frequency | How rare is the term in the whole index? | + When the term is found in this document and not many others
Field-length norm | How short is the field where the term is? | + When there isn't much else in the same field (like a title)
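The numbers in the explain output earlier in the deck can be recomputed by hand. The sketch below assumes Lucene's classic TF/IDF similarity, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). The class and method names are made up for illustration; queryNorm and fieldNorm are copied from the explain output rather than derived.

```java
final class TfIdfSketch {

    // Classic Lucene tf: square root of the term frequency in the field.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Classic Lucene idf: rarer terms (low docFreq) get a higher value.
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // score = queryWeight * fieldWeight, as in the explain output.
    static double score(double freq, long docFreq, long maxDocs,
                        double queryNorm, double fieldNorm) {
        double idf = idf(docFreq, maxDocs);
        double queryWeight = idf * queryNorm;             // idf * queryNorm
        double fieldWeight = tf(freq) * idf * fieldNorm;  // tf * idf * fieldNorm
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Values taken from the explain output: freq=1.0, docFreq=4,
        // maxDocs=108, queryNorm=0.16309182, fieldNorm=1.0.
        System.out.println("idf   = " + idf(4, 108));
        System.out.println("score = " + score(1.0, 4, 108, 0.16309182, 1.0));
    }
}
```

With those inputs the idf comes out at about 4.0727 and the score at about 2.7052, in line with the values reported by explain.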
#LSNA17
{ "must" : { "bool" : { "should" : [
    { "match" : { "content_en_US" : { "query" : "pigeon", "type" : "boolean" } } },
    { "match" : { "content_en_US" : { "query" : "pigeon", "type" : "phrase_prefix" } } } ] } },
  "should" : { "match" : { "content_en_US" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } } } },
{ "bool" : {
  "must" : { "bool" : { "should" : [
    { "match" : { "description_en_US" : { "query" : "pigeon", "type" : "boolean" } } },
    { "match" : { "description_en_US" : { "query" : "pigeon", "type" : "phrase_prefix" } } } ] } },
  "should" : { "match" : { "description_en_US" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } } } },
{ "bool" : {
  "must" : { "bool" : { "should" : [
    { "match" : { "entryClassPK" : { "query" : "pigeon", "type" : "boolean" } } },
    { "match" : { "entryClassPK" : { "query" : "pigeon", "type" : "phrase_prefix" } } } ] } },
  "should" : { "match" : { "entryClassPK" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } } } },
{ "bool" : {
  "must" : { "bool" : { "should" : [
    { "match" : { "title_en_US" : { "query" : "pigeon", "type" : "boolean" } } },
    { "match" : { "title_en_US" : { "query" : "pigeon", "type" : "phrase_prefix" } } } ] } },
  "should" : { "match" : { "title_en_US" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } } } }
#LSNA17
● → FacetedSearcher →
● Indexer
● fields
● score
{ "match" : { "content_en_US" : { "query" : "pigeon", "type" : "phrase_prefix" } } }
{ "match" : { "description_en_US" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } }
{ "match" : { "entryClassPK" : { "query" : "pigeon", "type" : "boolean" } } }
#LSNA17
Natural language?
string:text
● TF/IDF
● case insensitive
Score!
IDs and Serials?
string:keyword
● not_analyzed
● case sensitive
● match | no match
No score!
Non string data?
integer, date, geo_point...
● match | no match
No score!
(... "no score" is really a constant score of 1)
#LSNA17
// IndexSettingsContributor
typeMappingsHelper.addTypeMappings(indexName, myCustomFieldMappings);

liferay-type-mappings.json
"content": {
    "index": "analyzed",
    "type": "string"
},
"organizationId": {
    "index": "not_analyzed",
    "type": "string"
},
"publishDate": {
    "format": "yyyyMMddHHmmss",
    "type": "date"
}
#LSNA17
• Analyzed human searches
• query types
• combinations
• best relevance
Favor text fields over keyword fields.
#LSNA17
"*ubstrin*"
• lowercase
• * → "full scan" ↓↓↓
• don't score
#LSNA17
1. Full text search
2. Prefix
3. n-grams
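To make the n-gram option concrete: instead of scanning every term for "*ubstrin*" at query time, the substrings are produced at index time, so a plain term lookup finds them. The sketch below is a simplified stand-in for an n-gram tokenizer, not the search engine's implementation; the class name and the 3..7 gram sizes are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class NGramSketch {

    // Emits every lowercase substring of the token whose length lies
    // between minGram and maxGram (inclusive).
    static Set<String> ngrams(String token, int minGram, int maxGram) {
        String lower = token.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < lower.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= lower.length(); len++) {
                grams.add(lower.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        Set<String> indexed = ngrams("Substring", 3, 7);
        // A plain lookup replaces the leading/trailing wildcard scan.
        System.out.println(indexed.contains("ubstrin"));
        System.out.println(indexed);
    }
}
```

The trade-off is a larger index: every token is stored as many grams, so gram sizes are usually kept within a narrow range.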
#LSNA17
• Match →
• Prefix →
• Phrase →
Know your field, use the right queries.
#LSNA17
Write a field-specific query builder
@Component(service = FieldQueryBuilder.class, immediate = true)
public class MyFieldQueryBuilder implements FieldQueryBuilder {
    public Query build(String field, String keywords) {
Fine tune the right queries for your field
myBooleanQuery.add(q1, MUST);
myBooleanQuery.add(q2, SHOULD);
...
#LSNA17
Multilingual search
• Map
• suffix →
• "b" "a" "d"
• Stemming, stopwords (https://www.elastic.co/guide/en/elasticsearch/guide/current/using-language-analyzers.html)
Pick the right language analyzer.
#LSNA17
document.addText("myField_ja_JP", japanese);
document.addText("myField_en_US", english);

Locale defaultLocale = portal.getSiteDefaultLocale(groupId);
document.addText(getLocalizedName("myField", defaultLocale), translation);

addSearchLocalizedTerm(searchQuery, searchContext, "myField");

searchContext.setLocale(themeDisplay.getLocale());

liferay-type-mappings.json
"template_ja": {
    "mapping": {
        "analyzer": "kuromoji"
    },
    "match": "\\w+_ja_[A-Z]{2}\\b"
}
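A quick way to check which field names the template_ja pattern would pick up is to run the same regular expression in Java. This is only a sketch: it assumes the pattern is applied to the whole field name, and the class and helper names are hypothetical (localizedName merely mimics the "field_languageId" naming used on the slide).

```java
import java.util.regex.Pattern;

final class LocalizedFieldPatternSketch {

    // Same expression as the "match" value in the template_ja mapping.
    static final Pattern JA_FIELD = Pattern.compile("\\w+_ja_[A-Z]{2}\\b");

    // True when the whole field name fits the Japanese template pattern.
    static boolean usesJapaneseAnalyzer(String fieldName) {
        return JA_FIELD.matcher(fieldName).matches();
    }

    // Hypothetical helper: builds a name such as "myField_ja_JP".
    static String localizedName(String field, String languageId) {
        return field + "_" + languageId;
    }

    public static void main(String[] args) {
        String[] names = {
            localizedName("myField", "ja_JP"),
            localizedName("myField", "en_US"),
            "myField"
        };
        for (String name : names) {
            System.out.println(name + " -> " + usesJapaneseAnalyzer(name));
        }
    }
}
```

Only the _ja_ suffixed name fits the pattern, so under this assumption the kuromoji analyzer would apply to Japanese fields and leave the English and unlocalized ones alone.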
#LSNA17
• description, content
• title, title_en_US
• content
2x matching query clauses = inflated relevance.
Match once and only once.
#LSNA17
If already indexing once...
document.addText(getLocalizedName("myField", languageId), translation);
… no need to index twice...
// DON'T
// document.addText("myField", content);
… match once and only once.
addSearchLocalizedTerm(searchQuery, searchContext, "myField");
// DON'T
// addSearchTerm(searchQuery, searchContext, "myField");
#LSNA17
• docs
• value
• display
• highlight
Index for rendering, render from doc.
#LSNA17
analyzed
✔
✗
[30] Liferay
[15] DXP
[15] Symposium
#LSNA17
not_analyzed
✔
✗
[15] Liferay DXP
[15] Liferay Symposium
#LSNA17
• Aggregate not_analyzed
– [15] Liferay DXP
– [15] Liferay Symposium
• Match analyzed
–
2 fields, 1 analyzed, 1 not_analyzed.
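The bucket counts from the analyzed and not_analyzed slides can be reproduced with a small counting sketch. It assumes 15 documents valued "Liferay DXP" and 15 valued "Liferay Symposium", and it approximates analysis with whitespace splitting (a real analyzer would usually lowercase too). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FacetCountSketch {

    // Analyzed field: each distinct token of a value counts toward its own bucket.
    static Map<String, Integer> analyzedBuckets(List<String> values) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (String value : values) {
            for (String token : new HashSet<>(Arrays.asList(value.trim().split("\\s+")))) {
                buckets.merge(token, 1, Integer::sum);
            }
        }
        return buckets;
    }

    // not_analyzed field: the whole value is a single bucket key.
    static Map<String, Integer> notAnalyzedBuckets(List<String> values) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (String value : values) {
            buckets.merge(value, 1, Integer::sum);
        }
        return buckets;
    }

    // Assumed sample data: 15 documents per value.
    static List<String> sampleValues() {
        List<String> values = new ArrayList<>(Collections.nCopies(15, "Liferay DXP"));
        values.addAll(Collections.nCopies(15, "Liferay Symposium"));
        return values;
    }

    public static void main(String[] args) {
        System.out.println("analyzed:     " + analyzedBuckets(sampleValues()));
        System.out.println("not_analyzed: " + notAnalyzedBuckets(sampleValues()));
    }
}
```

Aggregating on the analyzed field splits the values into word buckets, while the not_analyzed field keeps the facet labels users expect to see.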
#LSNA17
Search on the text field
new MatchQuery("myfield", keywords);
Aggregate on the keyword field
myFacet.setFieldName("myfield.raw");
#LSNA17
• multifields (https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html)
• Copy Fields (https://wiki.apache.org/solr/SchemaXml#Copy_Fields)
• analyzed
• not_analyzed
#LSNA17
• elasticsearch-head
• Solr Admin
• query string
• explain
Tweak clauses, re-run query, repeat.
#LSNA17
Thank you! And lots of relevant content at #LSNA17