The Power of Faceted Search in Alfresco
Roxana Angheluta, Willem Van den Eynde

Upload: xenit-solutions-nv

Post on 20-Aug-2015

TRANSCRIPT

Page 1: The power of faceted search in alfresco

The Power of Faceted Search in Alfresco

Roxana Angheluta
Willem Van den Eynde

Page 2: The power of faceted search in alfresco

This presentation

● Who are we?
● Alfresco
● Faceted Search
● Demo

Page 3: The power of faceted search in alfresco

Willem Van den Eynde

● 2004 - 2007
○ Bachelor in applied informatics, KHLeuven
● 2006 - 2007
○ LiU Erasmus, Sweden
● 2007
○ Internship in Paris
● 2007 - 2010
○ Master in applied informatics, KULeuven
● 2010 - current
○ Software Engineer, XeniT, Leuven

Page 4: The power of faceted search in alfresco

Roxana Angheluta

● 1995 - 1999
○ Bachelor in informatics, University of Bucharest
● 2000 - 2001
○ Erasmus student, KULeuven
● 2001 - 2004
○ Assistant researcher, KULeuven
● 2003 - 2004
○ Master in Artificial Intelligence, KULeuven
● 2004 - 2012
○ Software Engineer, Attentio, Brussels
● 2012 - 2013
○ Software Engineer, XeniT, Leuven

Page 5: The power of faceted search in alfresco

2009 - Proprietary and Confidential Information of Xenit Solutions

Introducing XeniT

Managing content in a smart way

Page 6: The power of faceted search in alfresco

From our home base

In collaboration with our customers

With an enthusiastic and experienced team

Page 7: The power of faceted search in alfresco

The corporate story of XeniT

[Timeline figure, 2007 to 2013. Labels shown: IWT project, concurrent collaboration, 3.5 M docs, 8 M docs, Alfresco-As-A-Service]

Page 8: The power of faceted search in alfresco

Alfresco is the largest private, pure-play open source software company in the world.

● 4 million+ downloads of Alfresco community
● 75,000+ sites running community
● 2000+ Enterprise customers from 43+ countries
● 200+ channel partners
● 20 consecutive quarters of revenue growth
● Founded in 2005

Maidenhead, UK: Global Headquarters
Atlanta: US Headquarters

Page 9: The power of faceted search in alfresco

What is Alfresco

Alfresco is an open source enterprise content management system

Page 10: The power of faceted search in alfresco

What is Alfresco ?

● Enterprise Content Management (ECM) is a formalized means of organizing and storing an organization's documents, and other content, that relate to the organization's processes. The term encompasses strategies, methods, and tools used throughout the lifecycle of the content.

Page 11: The power of faceted search in alfresco

Classification and Retrieval

● Classification
● Retrieval

Page 12: The power of faceted search in alfresco

FAQ

● How does an open-source company like Alfresco generate revenue ?

● Alfresco vs Microsoft SharePoint

Page 13: The power of faceted search in alfresco

Alfresco demo

Page 14: The power of faceted search in alfresco

Search in Alfresco

● Many search engines exist; few are really good, and fewer still are open source
● Requirements:
○ accurate
○ performant
○ flexible
○ cross-platform
○ scalable
○ mature
● Lucene
○ https://lucene.apache.org/
● Starting with Alfresco 4.0 => Solr
○ http://lucene.apache.org/solr/

Page 15: The power of faceted search in alfresco

Lucene

● Java-based indexing and search library, with spellchecking, hit highlighting and advanced analysis/tokenization capabilities

● History: Doug Cutting originally wrote Lucene in 1999. It was initially available for download from its home at the SourceForge web site. It joined the Apache Software Foundation's Jakarta family of open-source Java products in September 2001 and became its own top-level Apache project in February 2005.

● Many projects based on Lucene: Solr, Nutch, Elasticsearch

Page 16: The power of faceted search in alfresco

Lucene

Indexing
● over 150GB/hour on modern hardware
● small RAM requirements -- only 1MB heap
● incremental indexing as fast as batch indexing
● index size roughly 20-30% the size of text indexed

Searching
● ranked searching -- best results returned first
● many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
● fielded searching (e.g. title, author, contents)
● sorting by any field
● multiple-index searching with merged results
● allows simultaneous update and searching
● flexible faceting, highlighting, joins and result grouping
● fast, memory-efficient and typo-tolerant suggesters
● pluggable ranking models, including the Vector Space Model and Okapi BM25
● configurable storage engine (codecs)

Page 17: The power of faceted search in alfresco

Lucene in Alfresco
Source: http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation

Page 18: The power of faceted search in alfresco

Lucene in Alfresco
Source: http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation

Page 19: The power of faceted search in alfresco

Lucene in Alfresco
Source: http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation

Page 20: The power of faceted search in alfresco

Lucene in Alfresco
Source: http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation

Page 21: The power of faceted search in alfresco

Lucene in Alfresco
Source: http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation

Page 22: The power of faceted search in alfresco

Lucene in Alfresco
Source: http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation

How information is stored in the Lucene index is specified in Alfresco's data models.
Main concept: tokenization

Page 23: The power of faceted search in alfresco

Lucene in Alfresco
Source: http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation

[Figure: the same value indexed without tokenization and with tokenization]
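
The difference between the two settings can be illustrated with a minimal Python sketch. This is not Alfresco or Lucene code: `index_value` is a hypothetical helper, and the word-splitting rule is an assumed stand-in for a real analyzer.

```python
import re


def index_value(value, tokenised):
    """Return the terms a simplified index would store for one property value."""
    if tokenised:
        # Tokenized: split into lowercase word tokens, as a simple analyzer might.
        return [token.lower() for token in re.findall(r"\w+", value)]
    # Untokenized: the whole value is kept as a single term.
    return [value]


title = "Annual Report 2012"
print(index_value(title, tokenised=True))   # ['annual', 'report', '2012']
print(index_value(title, tokenised=False))  # ['Annual Report 2012']
```

In this sketch a search for "report" can only match the tokenized form, while the untokenized form keeps the exact value intact, which is what exact-match lookups and value counts rely on.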

Page 24: The power of faceted search in alfresco

Lucene in Alfresco

● Out of the box search:
○ search in all items, in a certain property or in the content (full text search)
○ additionally: PATH, ASPECT, CATEGORY searches
○ Lucene syntax allowed:
■ boolean queries
■ wildcard queries
■ range queries
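
A few query strings may make these options concrete. The sketch below builds them in Python; the strings follow the Alfresco Lucene query syntax as commonly documented, but the property names, values, and the `prop` helper are illustrative assumptions, and the exact syntax should be checked against the Alfresco search documentation for the version in use. CATEGORY searches are left out because their form depends on the classification in use.

```python
def prop(qname, expression):
    """Build a property clause such as @cm\\:name:"x" (hypothetical helper).

    The colon inside the prefixed property name is escaped with a backslash.
    """
    prefix, local_name = qname.split(":")
    return "@{}\\:{}:{}".format(prefix, local_name, expression)


examples = {
    "full text": 'TEXT:"faceted search"',
    "property": prop("cm:name", '"report.pdf"'),
    "boolean": 'TEXT:"alfresco" AND NOT TEXT:"sharepoint"',
    "wildcard": prop("cm:name", "rep*"),
    "range": prop("cm:created", "[2012-01-01T00:00:00 TO 2012-12-31T23:59:59]"),
    "path": 'PATH:"/app:company_home//*"',
    "aspect": 'ASPECT:"cm:versionable"',
}

for label, query in examples.items():
    print(label, "->", query)
```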

Page 25: The power of faceted search in alfresco

Solr

● Standalone full-text search server that runs within a servlet container such as Tomcat. Uses the Lucene library and has REST-like HTTP/XML and JSON APIs. Has an extensive plugin architecture.

● In 2004, Solr was created by Yonik Seeley at CNET Networks, and in January 2006 the source code was donated to the Apache Software Foundation under the Lucene top-level project. In March 2010, the Lucene and Solr projects merged, and consequently in 2011 the Solr version number scheme was changed to match that of Lucene.

● Many users:
○ http://wiki.apache.org/solr/PublicServers

Page 26: The power of faceted search in alfresco

Solr

● Uses the Lucene library for full-text search
● Faceted navigation
● Hit highlighting
● Query language supports structured as well as textual search
● JSON, XML, PHP, Ruby, Python, XSLT, Velocity and custom Java binary output formats over HTTP
● HTML administration interface
● Replication to other Solr servers - enables scaling QPS
● Distributed Search through Sharding - enables scaling content volume
● Search results clustering based on Carrot2
● Extensible through plugins
● Pluggable relevance - boost through formula
● Caching
● Embeddable in a Java Application
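
As an illustration of faceted navigation over the HTTP API, the sketch below builds a Solr select URL with the standard faceting parameters (`facet`, `facet.field`, `facet.mincount`, `fq`). The host, core name and field names are assumptions made for the example, and `facet_query_url` is a hypothetical helper; the Solr instance bundled with Alfresco may be exposed and secured differently.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields, filters=()):
    """Build a Solr /select URL that asks for facet counts on the given fields."""
    params = [
        ("q", query),
        ("wt", "json"),           # ask for a JSON response
        ("rows", "10"),
        ("facet", "true"),        # switch faceting on
        ("facet.mincount", "1"),  # hide values that match no document
    ]
    params += [("facet.field", field) for field in facet_fields]
    # Each filter query narrows the result set, e.g. after a facet value is clicked.
    params += [("fq", flt) for flt in filters]
    return base_url + "/select?" + urlencode(params)


url = facet_query_url(
    "http://localhost:8983/solr/documents",  # assumed host and core name
    "report",
    ["author", "mimetype"],                  # assumed field names
    filters=['author:"jdoe"'],
)
print(url)
```

Repeating `facet.field` once per field and adding one `fq` per selected value is how a drill-down is typically expressed: the main query stays the same and each click adds a filter.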

Page 27: The power of faceted search in alfresco

Faceted Search in Alfresco

● A way to navigate through the documents, showing counts per property value and offering the possibility to drill down in the data

● Faceted search supported by Lucene/Solr, not yet supported by Alfresco

● Implemented by Xenit in Fred
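
The idea of counts per property value plus drill-down can be sketched on an in-memory list of documents. This is a minimal Python illustration only, not the Fred implementation; the documents, field names and both functions are made up for the example.

```python
from collections import Counter

# Hypothetical documents with a few metadata properties.
DOCS = [
    {"name": "offer.pdf", "author": "bob", "mimetype": "application/pdf"},
    {"name": "notes.txt", "author": "alice", "mimetype": "text/plain"},
    {"name": "report.pdf", "author": "alice", "mimetype": "application/pdf"},
    {"name": "slides.pdf", "author": "alice", "mimetype": "application/pdf"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of the given property."""
    return Counter(doc[field] for doc in docs if field in doc)


def drill_down(docs, field, value):
    """Keep only the documents whose property equals the selected facet value."""
    return [doc for doc in docs if doc.get(field) == value]


print(facet_counts(DOCS, "author"))        # Counter({'alice': 3, 'bob': 1})
selected = drill_down(DOCS, "author", "alice")
print(facet_counts(selected, "mimetype"))  # Counter({'application/pdf': 2, 'text/plain': 1})
```

After each drill-down the counts are recomputed on the narrowed set, so the facet panel always reflects what remains. A search server computes these counts from the index rather than by scanning documents.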

Page 28: The power of faceted search in alfresco

Faceted Search in Alfresco

Page 29: The power of faceted search in alfresco

Faceted Search in Alfresco

● Questions
○ which fields should be facetable?
■ only the ones with a limited set of possible values
■ only the ones which are untokenized
■ plus ranges: dates and numbers
○ how to navigate inside facets?

● Current implementation
○ facetable fields configurable in a file
○ date ranges and number ranges not supported yet
○ drilling down into a single value possible
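
The selection criteria above can be sketched as a small check over configured fields. The configuration layout, the threshold and the function name below are assumptions made for illustration; the actual format of the configuration file is not shown in this presentation.

```python
# Hypothetical configuration: candidate fields and a cap on distinct values.
FACET_CONFIG = {"max_distinct_values": 3, "fields": ["author", "mimetype", "title"]}

# Hypothetical set of properties that are indexed with tokenization.
TOKENISED_FIELDS = {"title"}

SAMPLE_DOCS = [
    {"author": "alice", "mimetype": "application/pdf", "title": "Annual report"},
    {"author": "bob", "mimetype": "text/plain", "title": "Meeting notes"},
    {"author": "alice", "mimetype": "application/pdf", "title": "Project offer"},
]


def facetable_fields(docs, config, tokenised_fields):
    """Return configured fields that are untokenized and have few distinct values."""
    result = []
    for field in config["fields"]:
        if field in tokenised_fields:
            continue  # tokenized values are split into terms, so counts would be per word
        distinct = {doc[field] for doc in docs if field in doc}
        if 0 < len(distinct) <= config["max_distinct_values"]:
            result.append(field)
    return result


print(facetable_fields(SAMPLE_DOCS, FACET_CONFIG, TOKENISED_FIELDS))  # ['author', 'mimetype']
```

Date and number ranges would need an extra step that maps each value to a bucket before counting, which this sketch leaves out.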

Page 30: The power of faceted search in alfresco

Faceted Search in Fred: mockup

Page 31: The power of faceted search in alfresco

Demo