[grabec] solr - searching made easy
SEARCHING IN JAVA: QUICK OVERVIEW
• Technologies
• Techniques
• Approaches
BASIC APPROACH
• REGEX
• SQL LIKE
BASIC APPROACH
• Slow
• Ineffective
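As a rough illustration of why the basic approach is slow and ineffective, here is a minimal Java sketch of a regex scan, roughly what `LIKE '%term%'` amounts to in SQL. The class `NaiveRegexSearch` and its method are hypothetical names invented for this example. Every query walks every document, and a literal match misses spelling variants of the same term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only: scans every document on every query.
class NaiveRegexSearch {

    // Returns the indices of the documents containing the literal term, ignoring case.
    static List<Integer> search(List<String> documents, String term) {
        Pattern pattern = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) { // cost grows with the total amount of text
            if (pattern.matcher(documents.get(i)).find()) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "saw the BC-500 fix",
                "noticed that the BC500 is fixed");
        System.out.println(search(docs, "BC-500")); // [0]    : "BC500" is missed
        System.out.println(search(docs, "fix"));    // [0, 1] : "fixed" contains "fix"
    }
}
```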
THE 3 F'S OF SEARCH ENGINES
• Fast
• Flexible
• Fit
FAST – INVERTED INDEX
• Query a 10 GB index or bigger in under 100 ms
INDEX STRUCTURE
Relational database vs. search engine index
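The idea behind an inverted index can be sketched in a few lines of Java: instead of scanning documents for a term, the index maps each term to the ids of the documents that contain it. This is a toy in-memory version under simple assumptions (lower-casing, splitting on non-word characters), not how Lucene stores its index on disk. `InvertedIndexSketch` and its methods are names made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory inverted index (illustrative names, not a Lucene API).
class InvertedIndexSketch {

    // term -> sorted ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Single-term lookup: one map access instead of a scan over all documents.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    // AND query: intersect the posting lists of all terms.
    Set<Integer> lookupAll(String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = lookup(term);
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Political debate on TV");
        index.add(2, "Political news");
        index.add(3, "Debate club");
        System.out.println(index.lookup("political"));              // [1, 2]
        System.out.println(index.lookupAll("political", "debate")); // [1]
    }
}
```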
FLEXIBLE
• Create / update / delete index via:
– XML
– JSON
– API
– Annotations in entity classes (Hibernate)
– ...
FIT
saw the BC-500 fix
noticed that the BC500 is fixed
?
FIT – FILTERS & ANALYSERS
• WordDelimiterAnalyser
saw the BC-500 fixed
saw the BC 500 fixed
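The word-delimiter step can be approximated in plain Java with a single regular expression. The sketch below assumes two rules only: split on any non-alphanumeric character, and split where a letter meets a digit. `WordDelimiterSketch` is a hypothetical class for this example; the real Solr filter has more options than shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative word-delimiter step (not the Solr implementation).
class WordDelimiterSketch {

    // Alternatives: runs of non-alphanumerics | letter-to-digit boundary | digit-to-letter boundary
    private static final String DELIMITERS =
            "[^\\p{L}\\p{N}]+|(?<=\\p{L})(?=\\p{N})|(?<=\\p{N})(?=\\p{L})";

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split(DELIMITERS)) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("saw the BC-500 fixed")); // [saw, the, BC, 500, fixed]
        System.out.println(split("BC500"));                // [BC, 500]
    }
}
```

With both rules in place, "BC-500" and "BC500" produce the same two tokens.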
FIT – FILTERS & ANALYSERS
• StopwordFilter
saw the BC 500 fixed
saw BC 500 fixed
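A stopword filter simply drops very common words that carry little meaning for search. Here is a minimal Java sketch, assuming a tiny hard-coded list. `StopwordFilterSketch` and its `STOPWORDS` set are invented for this example, and a real list would be much longer and language-specific.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative stopword filter with a deliberately tiny word list.
class StopwordFilterSketch {

    static final Set<String> STOPWORDS = Set.of("a", "an", "the", "is", "that");

    // Keeps only the tokens that are not stopwords (case-insensitive check).
    static List<String> filter(List<String> tokens) {
        return tokens.stream()
                .filter(token -> !STOPWORDS.contains(token.toLowerCase()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of("saw", "the", "BC", "500", "fixed")));
        // [saw, BC, 500, fixed]
    }
}
```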
FIT – FILTERS & ANALYSERS
• SynonymAnalyser
saw BC 500 fixed
saw/visualise/notice BC 500 fixed
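Synonym expansion can be pictured as putting several alternatives at the same token position. The Java sketch below assumes a hand-written synonym table with a single entry. `SynonymExpansionSketch`, `SYNONYMS`, `expand` and `render` are hypothetical names for this example, not a Solr API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative synonym expansion: each position holds every alternative for that token.
class SynonymExpansionSketch {

    // Hypothetical synonym table; keys are lower-case.
    static final Map<String, List<String>> SYNONYMS = Map.of(
            "saw", List.of("saw", "visualise", "notice"));

    static List<List<String>> expand(List<String> tokens) {
        List<List<String>> positions = new ArrayList<>();
        for (String token : tokens) {
            positions.add(SYNONYMS.getOrDefault(token.toLowerCase(), List.of(token)));
        }
        return positions;
    }

    // Renders alternatives joined by "/" and positions joined by spaces.
    static String render(List<List<String>> positions) {
        List<String> parts = new ArrayList<>();
        for (List<String> alternatives : positions) {
            parts.add(String.join("/", alternatives));
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(render(expand(List.of("saw", "BC", "500", "fixed"))));
        // saw/visualise/notice BC 500 fixed
    }
}
```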
FIT – FILTERS & ANALYSERS
• Lemmatiser/stemmer
saw/visualise/notice BC 500 fixed
see/visualise/notice BC 500 fix
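The last step reduces each word to a base form. The sketch below is a toy, assuming a small dictionary for irregular forms plus one naive suffix rule. Real stemmers (for example Porter-style ones) use far more rules. `LemmatiserSketch` and its `LEMMAS` table are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy lemmatiser: dictionary lookup first, then a single suffix-stripping rule.
class LemmatiserSketch {

    // Hypothetical dictionary of forms the suffix rule would get wrong.
    static final Map<String, String> LEMMAS = Map.of(
            "saw", "see",
            "noticed", "notice");

    static String lemma(String token) {
        String lower = token.toLowerCase();
        String known = LEMMAS.get(lower);
        if (known != null) {
            return known;
        }
        if (lower.length() > 4 && lower.endsWith("ed")) {
            return lower.substring(0, lower.length() - 2); // "fixed" -> "fix"
        }
        return token; // unknown tokens pass through unchanged
    }

    static List<String> lemmatise(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(lemma(token));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lemmatise(List.of("saw", "BC", "500", "fixed")));
        // [see, BC, 500, fix]
    }
}
```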
FIT – RESULT
saw the BC-500 fix -> analyse/filter -> see/visualise/notice BC 500 fix
noticed that the BC500 is fixed -> analyse/filter -> see/visualise/notice BC 500 fix
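To make the end result concrete, here is a compact Java sketch that chains the steps and checks that both phrases end up as the same token sequence. It rests on several assumptions: the tables are tiny and hand-written, tokens are lower-cased, and synonyms are collapsed to one canonical word after lemmatisation (a simplification of the order on the slides). `AnalysisPipelineSketch` and all of its tables are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative analysis chain: split -> lower-case -> stopwords -> lemma -> canonical synonym.
class AnalysisPipelineSketch {

    static final Set<String> STOPWORDS = Set.of("the", "that", "is");
    static final Map<String, String> LEMMAS =
            Map.of("saw", "see", "noticed", "notice", "fixed", "fix");
    // Assumption: every synonym group is represented by a single canonical word.
    static final Map<String, String> CANONICAL =
            Map.of("notice", "see", "visualise", "see");

    private static final String DELIMITERS =
            "[^\\p{L}\\p{N}]+|(?<=\\p{L})(?=\\p{N})|(?<=\\p{N})(?=\\p{L})";

    static List<String> analyse(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase().split(DELIMITERS)) {
            if (raw.isEmpty() || STOPWORDS.contains(raw)) {
                continue;
            }
            String lemma = LEMMAS.getOrDefault(raw, raw);
            tokens.add(CANONICAL.getOrDefault(lemma, lemma));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both lines print [see, bc, 500, fix]
        System.out.println(analyse("saw the BC-500 fix"));
        System.out.println(analyse("noticed that the BC500 is fixed"));
    }
}
```

Because the same chain runs at index time and at query time, a query phrased one way can match a document phrased the other way.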
FIT – QUERY EXAMPLE
• Political debate
all:"political debate"^2 OR (all:political AND all:debate) OR
title:("political debate"~5)^4 OR title:"political"^2 OR
title:"debate"^2
SPELL CHECKER
HIT HIGHLIGHTING
REPLICATION
[Diagram: MASTER, two SLAVEs, REQUESTS]
JAVA SEARCH ENGINES BUILT ON LUCENE
• Solr
• Nutch
• Compass
• Hibernate Search
?