BP-8 Global Federation and Search

Global Federation and Search, Robin Bramley, Ixxus

Upload: alfresco-software

Post on 05-Jul-2015

Category:

Technology


DESCRIPTION

Many global organizations face similar challenges around sharing information in a timely fashion between regions; for publishers this is often exacerbated by the size of some of their assets, such as print-quality images or video. Alfresco, with its open, extensible architecture, makes a great basis for a global enterprise content or digital asset management system, yet there are still numerous challenges to tackle when implementing on a global scale. Federation is one approach that can be used successfully when the regions are generally independent in the production of content but are producing assets that can be consumed and re-used globally. Alfresco 4.0 uses Solr, which can be leveraged to provide federated search across multiple, disparate Alfresco repositories. This session will cover how: federated search provided remote content discovery; Share was customized to handle federated search; intelligent storage provided eventual consistency of files; and users could request content migration on demand.

TRANSCRIPT

Page 1: BP-8 Global Federation and Search

Global Federation and Search

Robin Bramley, Ixxus

Page 2: BP-8 Global Federation and Search

Agenda

•  Who I am
•  Setting the scene
•  The business challenge
•  Alfresco
•  Solr
•  Big Content
•  Global considerations
•  Scaling strategies
•  Alfresco 4
•  Federation approaches
•  'Intelligent' storage
•  Challenges

Page 3: BP-8 Global Federation and Search

My Background

•  Senior Architect @ Ixxus
   •  The UK Alfresco Platinum Partner
   •  Lucid Imagination partner
•  Worked at consultancies for 13 years
   •  Developing solutions with Alfresco since 0.6
   •  First UK Alfresco Gold partner
•  Around the edges I also write
   •  GroovyMag author – inc. 4 hands-on Grails articles
   •  DZone Most Valuable Blogger
      •  Re-published posts include Event Driven indexing with Solr
•  Open source contributions include
   •  OpenID support for Acegi / Spring Security
   •  CodeNarc support for Hudson / Jenkins CI Violations plugin

Page 4: BP-8 Global Federation and Search

The challenge

•  Many global organisations face similar challenges around sharing information in a timely fashion between regions.

•  For publishers this is often exacerbated by the size of some of their assets, such as print-quality images or video.

Page 5: BP-8 Global Federation and Search

Alfresco

Hopefully this needs little introduction.
•  Clue: it's an ECM

Page 6: BP-8 Global Federation and Search

Apache Solr

RESTful Search Service
•  POST it documents
•  GET query results
•  Built on top of Lucene
•  Originated from CNET (created by Yonik Seeley)
•  Features
   •  Schema
   •  Request handlers
   •  Query types
   •  Response Writers
   •  Admin pages
   •  Replication
   •  Sharding
•  Professional support available from Lucid Imagination
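To make the "POST it documents" bullet concrete, here is a minimal sketch of building the XML update message that Solr accepts on its update handler. The function name and the sample field names (`id`, `title`) are illustrative assumptions, not taken from the talk, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(document):
    """Build a Solr XML <add> message for one document.

    `document` is a plain dict of field name -> value; the field
    names used below are hypothetical examples.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in document.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(solr_add_xml({"id": "doc-1", "title": "Global Federation"}))
```

The resulting string would be sent as the body of an HTTP POST to the update handler, followed by a commit, before the document shows up in GET query results.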

Page 7: BP-8 Global Federation and Search

Big Content

Page 8: BP-8 Global Federation and Search

Going global

Page 9: BP-8 Global Federation and Search

Going global

Global systems can pose additional challenges

•  Infrastructure
•  Network
   •  Bandwidth
   •  Latency
   •  Reliability
•  Languages
•  Timezones
•  Collaboration
•  Workflow
•  Security permissions

Page 10: BP-8 Global Federation and Search

Scaling strategies

You can scale / divide & conquer systems in a number of ways:

•  Scale up (vertical)

Page 11: BP-8 Global Federation and Search

Scaling strategies

•  Scale out (horizontal)
   •  Typically clustering
   •  But could also be
      •  Replication
      •  Separation of responsibilities

Page 12: BP-8 Global Federation and Search

Scaling strategies

•  Partitioning
   •  Data Sharding
   •  Silos
      •  Divisional / departmental
      •  Regional

Page 13: BP-8 Global Federation and Search

Alfresco 4

What's new in Alfresco 4.0?
•  Won't repeat the full press release here…
•  'Cloud-scale performance'
   •  Alfresco Index Server based on Apache Solr
   •  Enhanced clustering

Page 14: BP-8 Global Federation and Search

Alfresco 4 Solr

•  Based on Solr 1.4.1
•  Uses a custom alfrescoDataType fieldType
•  Leverages dynamic schema fields heavily
   •  Only statically defined field is 'id'
   •  Everything else (*) is a multi-valued dynamic field
      •  Though it uses the Alfresco model dictionary under the hood
•  Analysis chain (same for index/query)
   •  Whitespace tokenized
   •  Word Delimited
      •  Breaks up camelCase etc.
   •  Converted to lower case
•  Adds a cmis request handler
•  Uses SSL client certificate authentication
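The analysis chain on this slide (whitespace tokenizing, word delimiting, lower-casing) can be approximated in a few lines. This is a simplified sketch of the described behaviour only, not Solr's actual filter implementation; the function name and the splitting rules are assumptions.

```python
import re


def analyse(text):
    """Approximate the slide's analysis chain on a string.

    1. Split on whitespace.
    2. Split each chunk on non-alphanumeric delimiters and on
       lower-to-upper case transitions (camelCase).
    3. Convert every resulting token to lower case.
    """
    tokens = []
    for chunk in text.split():
        for part in re.split(r"[^A-Za-z0-9]+", chunk):
            # Insert a space at each camelCase boundary, then split on it.
            spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", part)
            tokens.extend(sub.lower() for sub in spaced.split())
    return tokens


if __name__ == "__main__":
    print(analyse("contentStore_Selector v4"))
```

Because the same chain runs at index and query time, a query for `contentstore` or `ContentStore` would be broken into the same tokens as the indexed text.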

Page 15: BP-8 Global Federation and Search

Federating

Page 16: BP-8 Global Federation and Search

Federation Approaches

Build an index with a crawler

Pros
•  Can index many different data sources
   •  File systems
   •  Databases

Cons
•  Timeliness
•  Pull model not suitable for all scenarios
•  Additional storage requirements
•  Indexing can be inefficient in a global scenario
•  Permissions

Page 17: BP-8 Global Federation and Search

Federation Approaches

Federated Search using OpenSearch
•  A collection of simple formats for sharing search results
•  Can use an Atom response format
•  Elements such as totalResults used in CMIS Atom binding
•  Was a big deal in Alfresco 2.0 (2007)
   •  Alfresco Explorer has an OpenSearch client
   •  Alfresco has an OpenSearch server
      •  Provided keyword search
      •  Wiki stated: 'Note: Advanced Web Client Search and Query Language searches will be OpenSearch enabled some time in the future, probably in line with up-and-coming CM standards.'
•  Client not in Share
•  CMIS a better bet for complex queries
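As an illustration of the Atom response format mentioned above, the sketch below reads `totalResults` and the entry titles from an OpenSearch-style Atom feed. The sample feed is made up for the example; only the two namespace URIs come from the Atom and OpenSearch 1.1 specifications.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "os": "http://a9.com/-/spec/opensearch/1.1/",
}

# Hypothetical response body, not captured from a real Alfresco server.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Search results</title>
  <opensearch:totalResults>2</opensearch:totalResults>
  <entry><title>Cover image</title></entry>
  <entry><title>Promo video</title></entry>
</feed>"""


def parse_opensearch_atom(xml_text):
    """Return (total_results, [entry titles]) from an Atom feed."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("os:totalResults", default="0", namespaces=NS))
    titles = [
        entry.findtext("atom:title", default="", namespaces=NS)
        for entry in root.findall("atom:entry", NS)
    ]
    return total, titles


if __name__ == "__main__":
    print(parse_opensearch_atom(SAMPLE_FEED))
```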

Page 18: BP-8 Global Federation and Search

Federation Approaches

Build a meta-search service

Pros
•  Can work across heterogeneous search engines
•  Can implement asynchronous results

Cons
•  Rebuilding the wheel?
•  Authentication is a challenge (without SAML or OAuth)
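A meta-search service essentially fans a query out to each engine and merges whatever comes back. The sketch below shows that shape with stub backends; the backend names, the hit structure (`title`, `score`) and the merge-by-score policy are all assumptions for illustration, and scores from heterogeneous engines are not generally comparable without normalisation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def search_all(backends, query):
    """Query every backend concurrently and merge the hits.

    `backends` maps a source name to a callable taking the query
    and returning a list of dicts with "title" and "score" keys.
    A backend that raises is skipped, so one unreachable region
    does not fail the whole search.
    """
    merged = []
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in backends.items()}
        # as_completed yields results in arrival order, which is where
        # an asynchronous UI could start rendering partial results.
        for future in as_completed(futures):
            source = futures[future]
            try:
                hits = future.result()
            except Exception:
                continue
            merged.extend({"source": source, **hit} for hit in hits)
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)


if __name__ == "__main__":
    stubs = {
        "emea": lambda q: [{"title": "Annual report", "score": 0.9}],
        "apac": lambda q: [{"title": "Press kit", "score": 0.5}],
    }
    for hit in search_all(stubs, "report"):
        print(hit)
```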

Page 19: BP-8 Global Federation and Search

Federation Approaches

Solr shards
•  Treat the Solr cores of separate Alfresco repositories as separate shards

Pros
•  Distributed queries are a standard Solr feature

Cons
•  The repositories need to be backed by a single authentication source
   •  E.g. LDAP
•  Asynchronous results aren't supported OOTB
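Solr's distributed search is driven by the standard `shards` request parameter, a comma-separated list of `host:port/path` entries. The sketch below only builds such a query URL; the host names and core paths are hypothetical, and the SSL client certificate handling that Alfresco's Solr requires is not shown.

```python
from urllib.parse import urlencode


def sharded_query_url(coordinator, shards, query, rows=10):
    """Build a Solr select URL that fans out to the given shards.

    `coordinator` is the base URL of the core that receives the
    request; `shards` are host:port/path entries without a scheme.
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    return f"{coordinator}/select?{urlencode(params, safe=':/,')}"


if __name__ == "__main__":
    # Hypothetical regional repositories, one Solr core each.
    regional_cores = [
        "emea.example.com:8443/solr/alfresco",
        "apac.example.com:8443/solr/alfresco",
    ]
    print(sharded_query_url("https://emea.example.com:8443/solr/alfresco",
                            regional_cores, "report"))
```

The core that receives the request merges the per-shard results into a single ranked list, which is why no custom merge logic is needed with this approach.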

Page 20: BP-8 Global Federation and Search

'Intelligent' storage

Storage Cloud Technology
•  Underpinning for the repository is a storage cloud technology
   •  Uses a Content Store Selector
•  Base layer built on commodity hardware
   •  Keeps multiple replicas of the content
•  Management layer
   •  Cost-based routing
   •  Knows where content resides
•  On-demand content migration between repositories
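The slide does not say how the management layer computes costs, so the following is only a toy sketch of what cost-based routing could look like: given the regions that hold a replica and a table of link costs, pick the cheapest source for the requesting region. The region names and cost values are invented.

```python
def cheapest_replica(replica_regions, link_cost, requester):
    """Pick the replica region with the lowest cost to the requester.

    `link_cost` maps (requester, region) pairs to a numeric cost,
    for example a latency or bandwidth-based figure. A local copy
    should carry a cost of 0 so it always wins.
    """
    if not replica_regions:
        raise ValueError("content has no known replicas")
    return min(replica_regions, key=lambda region: link_cost[(requester, region)])


if __name__ == "__main__":
    # Hypothetical cost table as seen from the London office.
    costs = {
        ("london", "london"): 0,
        ("london", "new-york"): 40,
        ("london", "sydney"): 120,
    }
    print(cheapest_replica(["new-york", "sydney"], costs, "london"))
```

An on-demand migration request would then copy the content from the chosen region into the requester's local store.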

Page 21: BP-8 Global Federation and Search

Challenges

•  Large file size
   •  Has to work with streaming
   •  Beware of anything that attempts to buffer a full file into memory
      •  E.g. to POST it
   •  Watch out for processes that need to copy a file
•  User expectations
   •  Need training on asynchronous behaviour
   •  Search results and their appearance
      •  Grouping / sort
      •  Pagination (of distinct result sets)
•  Time to migrate large content
   •  Can be lengthy if there isn't a 'near' copy
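The streaming point can be illustrated with a chunked copy that never holds more than one chunk in memory. This is a generic sketch, not code from the project; the function name and the 1 MiB default chunk size are assumptions.

```python
import hashlib
import io


def stream_copy(src, dst, chunk_size=1024 * 1024):
    """Copy a binary stream in fixed-size chunks.

    Returns the number of bytes copied and a SHA-256 hex digest,
    which can be compared at the far end to verify the transfer.
    Memory use is bounded by `chunk_size`, not by the file size.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 3_000_000)  # stand-in for a large asset
    target = io.BytesIO()
    copied, checksum = stream_copy(source, target)
    print(copied, checksum)
```

The same loop shape applies when the destination is an HTTP request body, provided the client library accepts a stream rather than a fully buffered payload.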

Page 22: BP-8 Global Federation and Search

Twitter: @rbramley

Blog: http://leanjavaengineering.com/

Web: http://www.ixxus.com