Making Your Domain Objects Searchable with Hibernate Search
DESCRIPTION
Presentation about Hibernate Search given at Apache Lucene EuroCon in Prague, Czech Republic, on 20 May 2010.
TRANSCRIPT
Making Your Domain Objects Searchable with Hibernate Search
Gustavo Fernandes
Sunday, 23 May 2010
Apache Lucene EuroCon 20 May 2010
Agenda
Motivations and Goals
Indexing
Retrieval
Scalability
Hibernate in a nutshell
Introduction ◆ Indexing ◆ Retrieval ◆ Scaling
Hibernate in a nutshell
    @Entity
    public class Author {
        @Id @GeneratedValue
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
    }

    @Entity
    public class Book {
        // Every entity needs an identifier
        @Id @GeneratedValue
        private Integer id;
        private String title;
    }
Hibernate in a nutshell
    Author author = new Author("Stephen King");
    Book aBook = new Book("Blaze");
    HashSet<Book> books = new HashSet<Book>();
    books.add(aBook);
    author.setBooks(books);

    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    session.save(author);
    tx.commit();
Hibernate in a nutshell
    Select * from Author;
    +----+--------------+
    | id | name         |
    +----+--------------+
    |  1 | Stephen King |
    +----+--------------+

    Select * from Book;
    +----+----------+
    | id | title    |
    +----+----------+
    |  1 | Blaze    |
    +----+----------+

    Select * from Book_Author;
    +---------+------------+
    | Book_id | authors_id |
    +---------+------------+
    |       1 |          1 |
    +---------+------------+
Meet Hibernate Search
Hibernate extension which uses Lucene internally
Bring full-text search capabilities to Hibernate
Object-Document mapping
Take care of the plumbing
Keep database and index in sync
Convention over configuration
Flexible
Meet Hibernate Search
Current version: 3.2.0-Final (May 2010)
LGPL License
Lucene version supported: 2.9.2
Solr version supported: 1.4
Meet Hibernate Search
Dependencies:
    <dependency>
        <groupId>org.hibernate</groupId>
        <artifactId>hibernate-search</artifactId>
        <version>3.2.0.Final</version>
    </dependency>
Indexing
Mapping Objects <-> Documents
Support for types
Analyzers/Boost
Transparent/Manual Indexing
Indexing - Mapping Entities
    @Entity
    public class Author {
        @Id @GeneratedValue
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Entities
    @Indexed
    @Entity
    public class Author {
        @Id @GeneratedValue
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Entities
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Entities
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Fields
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Field
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Fields
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        // The field name is a string literal
        @Field(name = "name_field", store = Store.YES, index = Index.TOKENIZED)
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Fields
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Fields({
            @Field(index = Index.TOKENIZED),
            @Field(name = "nameForSort", index = Index.UN_TOKENIZED)
        })
        private String name;
        @OneToMany
        private Set<Book> books;
    }
Indexing - Mapping Relationships
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Field(index = Index.TOKENIZED)
        private String name;
        @OneToMany @IndexEmbedded
        private Set<Book> books;
    }
Indexing - Types
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Field(index = Index.TOKENIZED)
        private String name;
        @OneToMany @IndexEmbedded
        private Set<Book> books;
        // A custom bridge converts the non-String type into indexable text
        @Field(bridge = @FieldBridge(impl = AddressBridge.class))
        private Address address;
    }
Indexing - Boost
    @Indexed(index = "Author_Index")
    @Entity
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        @Field(index = Index.TOKENIZED)
        @Boost(1.5f)
        private String name;
        @OneToMany @IndexEmbedded
        private Set<Book> books;
        @Field(bridge = @FieldBridge(impl = AddressBridge.class))
        @Boost(0.75f)
        private Address address;
    }
Indexing - Analyzers
    @Entity
    @Indexed
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String bio;
        ...
    }
Indexing - Analyzers
    @Entity
    @Indexed
    @AnalyzerDef(name = "combinedAnalyzers",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class))
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String bio;
        ...
    }
Indexing - Analyzers
    @Entity
    @Indexed
    @AnalyzerDef(name = "combinedAnalyzers",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class) })
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String bio;
        ...
    }
Indexing - Analyzers
    @Entity
    @Indexed
    @AnalyzerDef(name = "combinedAnalyzers",
        charFilters = { @CharFilterDef(factory = MappingCharFilterFactory.class) },
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class) })
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        private String bio;
        ...
    }
Indexing - Analyzers
    @Entity
    @Indexed
    @AnalyzerDef(name = "combinedAnalyzers",
        charFilters = { @CharFilterDef(factory = MappingCharFilterFactory.class) },
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class) })
    public class Author {
        @Id @GeneratedValue @DocumentId
        private Integer id;
        // The analyzer only takes effect on an indexed field, hence @Field
        @Field
        @Analyzer(definition = "combinedAnalyzers")
        private String bio;
        ...
    }
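The order of the three stages in the analyzer definition (char filters, then the tokenizer, then token filters) can be illustrated with a toy chain in plain Java. This is only a sketch of the idea: the class `AnalyzerChainSketch`, its methods, and the sample character mapping are hypothetical and are not the Lucene or Solr factories named on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy analysis chain: char filter -> tokenizer -> token filter (hypothetical, not Lucene). */
final class AnalyzerChainSketch {

    /** Char filter stage: replaces mapped characters before any tokenizing happens. */
    static String mapChars(String text, Map<Character, String> mapping) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            String replacement = mapping.get(c);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        return out.toString();
    }

    /** Tokenizer stage: splits on every run of characters that are not letters or digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Token filter stage: lower-cases each token. */
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Runs the three stages in the same order as the analyzer definition. */
    static List<String> analyze(String text, Map<Character, String> mapping) {
        return lowerCase(tokenize(mapChars(text, mapping)));
    }

    public static void main(String[] args) {
        // Assumed sample mapping: fold an accented e to a plain e.
        Map<Character, String> mapping = Map.of('\u00e9', "e");
        System.out.println(analyze("Born in Caf\u00e9 Society, 1947", mapping));
    }
}
```

The same analyzer has to be applied at query time as well, otherwise the query terms will not match the normalized terms stored in the index.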
Index - Fluent API
    SearchMapping mapping = new SearchMapping();

    mapping
        .analyzerDef("customAnalyzer", StandardTokenizerFactory.class)
            .filter(LowerCaseFilterFactory.class)
            .filter(SnowballPorterFilterFactory.class)
                .param("language", "English")
        .entity(Author.class)
            .indexed()
            .property("id", ElementType.FIELD).documentId()
            .property("address", ElementType.FIELD)
                .field().bridge(AddressBridge.class).store(Store.YES)
            .property("books", ElementType.FIELD).indexEmbedded()
            .property("name", ElementType.METHOD).field().store(Store.YES)
        .entity(Book.class)
            .indexed()
            .property("id", ElementType.METHOD).documentId()
            .property("title", ElementType.METHOD)
                .field().analyzer("customAnalyzer");
Indexing - Backend
Source: Hibernate Search in Action
Indexing - Backend
    hibernate.search.worker.execution async
    hibernate.search.worker.thread_pool.size 10
Source: Hibernate Search in Action
Indexing - JMS backend
    hibernate.search.worker.backend jms
    hibernate.search.worker.jms.connection_factory /ConnectionFactory
    hibernate.search.worker.jms.queue queue/hsearch
Source: Hibernate Search in Action
Manual Indexing
Use case: non-exclusive database
Manual indexing types:
    Single entity
    Mass indexer
Manual Indexing - Single Entity
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    Transaction tx = fullTextSession.beginTransaction();
    Object author = fullTextSession.load(Author.class, 1);
    fullTextSession.index(author);
    tx.commit();
Mass Indexing
    // Blocks until the whole index has been rebuilt
    fullTextSession.createIndexer().startAndWait();
    // Starts the rebuild and returns immediately
    fullTextSession.createIndexer().start();
Retrieval - Lucene Queries + Hibernate API
    // Wraps the Hibernate Session object
    org.hibernate.search.FullTextSession fullTextSession =
        org.hibernate.search.Search.getFullTextSession(session);

    // Lucene query
    Version v = Version.LUCENE_29;
    org.apache.lucene.queryParser.QueryParser queryParser =
        new org.apache.lucene.queryParser.QueryParser(v, "name", new StandardAnalyzer(v));
    org.apache.lucene.search.Query query = queryParser.parse("+King");

    // Hibernate Search query
    org.hibernate.Query textQuery = fullTextSession.createFullTextQuery(query, Author.class);
    // list() returns all matching entities, so the result is a List, not a single Author
    List<Author> loadedAuthors = textQuery.list();
Retrieval - Hibernate Search
1. Executes the Lucene query and gets the results
2. Retrieves the document ids from the index
3. Loads the objects from the database
4. Returns domain objects
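The four steps can be mimicked with two in-memory maps, one standing in for the Lucene index and one for the database. This is a toy model under that assumption, not Hibernate Search code; `RetrievalFlowSketch` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model of the retrieval flow: the index yields ids, the database yields objects. */
final class RetrievalFlowSketch {

    static final class Author {
        final int id;
        final String name;

        Author(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Stand-in for the Lucene index: term -> ids of the documents containing it.
    private final Map<String, List<Integer>> index = new HashMap<>();
    // Stand-in for the database: id -> entity.
    private final Map<Integer, Author> database = new HashMap<>();

    /** Stores the entity and indexes each lower-cased word of its name. */
    void save(Author author) {
        database.put(author.id, author);
        for (String term : author.name.toLowerCase(Locale.ROOT).split("\\s+")) {
            index.computeIfAbsent(term, k -> new ArrayList<>()).add(author.id);
        }
    }

    List<Author> search(String term) {
        // Steps 1 and 2: run the query against the index and collect the document ids.
        List<Integer> ids = index.getOrDefault(term.toLowerCase(Locale.ROOT), List.of());
        // Step 3: load the matching objects from the database by id.
        List<Author> results = new ArrayList<>();
        for (Integer id : ids) {
            Author loaded = database.get(id);
            if (loaded != null) {
                results.add(loaded);
            }
        }
        // Step 4: hand back domain objects rather than index documents.
        return results;
    }

    public static void main(String[] args) {
        RetrievalFlowSketch sketch = new RetrievalFlowSketch();
        sketch.save(new Author(1, "Stephen King"));
        for (Author hit : sketch.search("king")) {
            System.out.println(hit.id + " " + hit.name);
        }
    }
}
```

Because only ids come from the index, the returned objects reflect the current database state even when the index stores no field values.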
Retrieval - Results Manipulation
Pagination
Type restriction
Projection
Result mapping
Retrieval - IndexReader
shared strategy: shared IndexReader (default)
    hibernate.search.reader.strategy = shared
not-shared strategy: opens an IndexReader for every query
    hibernate.search.reader.strategy = not-shared
Extensible by implementing the ReaderProvider interface
    hibernate.search.reader.strategy = com.mycompany.CoolReaderProvider
Scalability
Sharding
Clustering
Scalability - Sharding
• Default: one index per entity type
• Shard: two or more indexes per entity type
• Use cases
    • Performance
    • Maintenance
[Diagram: without sharding, the application queries and indexes a single index covering A - Z; with sharding, the data is split across Shard A (A - H), Shard B (I - N), and Shard C (O - Z).]
Scalability - Sharding
Indexes separated physically
Virtual Index
[Diagram: the application queries and indexes a virtual index that spans Shard A (A - H), Shard B (I - N), and Shard C (O - Z).]
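One way to picture the virtual index is a facade that routes each write to a single shard and fans each query out to all of them. The sketch below assumes the three alphabetical ranges from the diagram; `VirtualIndexSketch` and its methods are hypothetical and do not reflect how Hibernate Search implements sharding internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy virtual index: one facade over three range-based shards (A-H, I-N, O-Z). */
final class VirtualIndexSketch {

    // Each shard is just a list of indexed names in this sketch.
    private final List<List<String>> shards = new ArrayList<>();

    VirtualIndexSketch() {
        for (int i = 0; i < 3; i++) {
            shards.add(new ArrayList<>());
        }
    }

    /** Picks a shard from the first letter: A-H -> 0, I-N -> 1, anything else -> 2. */
    static int shardFor(String name) {
        if (name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        char first = Character.toUpperCase(name.charAt(0));
        if (first >= 'A' && first <= 'H') {
            return 0;
        }
        if (first >= 'I' && first <= 'N') {
            return 1;
        }
        return 2;
    }

    /** A write touches exactly one physical shard. */
    void index(String name) {
        shards.get(shardFor(name)).add(name);
    }

    /** A query visits every shard in order and merges the hits. */
    List<String> search(String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (List<String> shard : shards) {
            for (String name : shard) {
                if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        VirtualIndexSketch index = new VirtualIndexSketch();
        index.index("King");
        index.index("Asimov");
        index.index("Tolkien");
        System.out.println(index.search("o"));
    }
}
```

The caller never names a shard, which is the point of the virtual index: the physical split stays an implementation detail.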
Scalability - Sharding
Configuration
    hibernate.search.com.sourcesense.Author.sharding_strategy.nbr_of_shards 2
Scalability - Shard Strategy
Default algorithm: ID Hash
[Diagram: document ids 1, 2, 3, 4, 5 are distributed across Shard 1 and Shard 2 by f(x) = x % N, where N is the number of shards.]
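The slide's formula f(x) = x % N can be written out directly. This is a minimal sketch of that formula only, assuming integer ids; it is not the library's own hashing code, and `IdHashShardingSketch` is a hypothetical name. The result is a zero-based shard index, so with N = 2 the values 0 and 1 correspond to the two shards in the diagram.

```java
/** Sketch of the slide's formula f(x) = x % N for choosing a shard from an integer id. */
final class IdHashShardingSketch {

    /**
     * Returns a zero-based shard index in the range [0, numberOfShards).
     * Math.floorMod keeps the result non-negative even for negative ids.
     */
    static int shardFor(int id, int numberOfShards) {
        if (numberOfShards <= 0) {
            throw new IllegalArgumentException("numberOfShards must be positive");
        }
        return Math.floorMod(id, numberOfShards);
    }

    public static void main(String[] args) {
        int numberOfShards = 2;
        for (int id = 1; id <= 5; id++) {
            System.out.println("id " + id + " -> shard index " + shardFor(id, numberOfShards));
        }
    }
}
```

Because the shard is a pure function of the id, updates and deletes for a given entity always reach the shard that holds it.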
Custom Sharding Strategy
Implement IndexShardingStrategy
hibernate.search.com.sourcesense.Author.sharding_strategy BookTitleStrategy
Synchronous Clustering
Every node can read and write to the index
Pessimistic locking prevents corruption
Single index shared among every node
Choose your flavour: NFS, Database, distributed caches
Clustering
Read-write synchronous cluster
[Diagram: Node 1, Node 2, Node 3, and Node 4 each run an IndexWriter against a single shared index.]
Asynchronous Clustering
Asynchronous Cluster
Advantages
    Only the master writes
    No indexing in slaves -> no waiting for locks
Downside
    Data is not immediately visible to the slaves
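The trade-off can be shown with a toy model in which the master owns the only writable index and each slave reads from a local copy. The sketch assumes slaves pick up changes by copying the master's index from time to time; the classes and methods are hypothetical and the copy mechanism is an assumption, not something stated on the slide.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an asynchronous cluster: one writing master, slaves reading local copies. */
final class AsyncClusterSketch {

    /** The master is the only node that writes to the index. */
    static final class Master {
        private final List<String> index = new ArrayList<>();

        void write(String document) {
            index.add(document);
        }

        /** Returns a copy so that slaves never touch the live index. */
        List<String> snapshot() {
            return new ArrayList<>(index);
        }
    }

    /** A slave answers queries from its own copy and never takes a write lock. */
    static final class Slave {
        private List<String> localCopy = new ArrayList<>();

        /** Assumed to run periodically; until it does, new documents stay invisible here. */
        void refreshFrom(Master master) {
            localCopy = master.snapshot();
        }

        boolean contains(String document) {
            return localCopy.contains(document);
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        Slave slave = new Slave();
        master.write("Blaze");
        System.out.println("before refresh: " + slave.contains("Blaze"));
        slave.refreshFrom(master);
        System.out.println("after refresh: " + slave.contains("Blaze"));
    }
}
```

The window between a write on the master and the next refresh on a slave is the visibility delay listed as the downside.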
To learn more...
hibernate.org/subprojects/search.html
anonsvn.jboss.org/repos/hibernate/search/
Thank you
twitter: @gustavonalle