bp-8 global federation and search
Post on 05-Jul-2015
Global Federation and Search
Robin Bramley, Ixxus
Agenda
• Who I am
• Setting the scene
• The business challenge
• Alfresco
• Solr
• Big Content
• Global considerations
• Scaling strategies
• Alfresco 4
• Federation approaches
• ‘Intelligent’ storage
• Challenges
My Background
• Senior Architect @ Ixxus
  • The UK Alfresco Platinum Partner
  • Lucid Imagination partner
• Worked at consultancies for 13 years
  • Developing solutions with Alfresco since 0.6
  • First UK Alfresco Gold partner
• Around the edges I also write
  • GroovyMag author – inc. 4 hands-on Grails articles
  • DZone Most Valuable Blogger
  • Re-published posts include Event Driven indexing with Solr
• Open source contributions include
  • OpenID support for Acegi / Spring Security
  • CodeNarc support for Hudson / Jenkins CI Violations plugin
The challenge
• Many global organisations face similar challenges around sharing information in a timely fashion between regions.
• For publishers this is often exacerbated by the size of some of their assets, such as print-quality images or video.
Alfresco
Hopefully this needs little introduction.
• Clue: it’s an ECM
Apache Solr
RESTful Search Service
• POST it documents
• GET query results
• Built on top of Lucene
• Originated from CNET (created by Yonik Seeley)
• Features
  • Schema
  • Request handlers
  • Query types
  • Response Writers
  • Admin pages
  • Replication
  • Sharding
• Professional support available from Lucid Imagination
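The "POST it documents, GET query results" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the base URL, the field names and the choice of Solr's XML update message are mine, not taken from the slides, and the requests are built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.etree import ElementTree as ET

SOLR_BASE = "http://localhost:8983/solr"  # assumed URL of a local Solr instance


def build_add_request(doc):
    """Build (but do not send) a POST that adds one document as an XML update message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    body = ET.tostring(add, encoding="utf-8")
    return Request(
        SOLR_BASE + "/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def build_query_url(query, rows=10):
    """Build the GET URL for a keyword query, asking for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_BASE + "/select?" + urlencode(params)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a Solr server.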
Big Content
Going global
Global systems can pose additional challenges
• Infrastructure
• Network
  • Bandwidth
  • Latency
  • Reliability
• Languages
• Timezones
• Collaboration
• Workflow
• Security permissions
Scaling strategies
You can scale / divide & conquer systems in a number of ways:
• Scale up (vertical)
• Scale out (horizontal)
  • Typically clustering
  • But could also be
    • Replication
    • Separation of responsibilities
• Partitioning
  • Data sharding
  • Silos
    • Divisional / departmental
    • Regional
Alfresco 4
What’s new in Alfresco 4.0?
• Won’t repeat the full press release here…
• ‘Cloud-scale performance’
• Alfresco Index Server based on Apache Solr
• Enhanced clustering
• Based on Solr 1.4.1
• Uses a custom alfrescoDataType fieldType
• Leverages dynamic schema fields heavily
• Only statically defined field is ‘id’
• Everything else (*) is a multi-valued dynamic field
• Though it uses the Alfresco model dictionary under the hood
• Analysis chain (same for index/query)
  • Whitespace tokenized
  • Word delimited
    • Breaks up camelCase etc.
  • Converted to lower case
• Adds a cmis request handler
• Uses SSL client certificate authentication
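To make the analysis chain concrete, here is a rough Python approximation of its three steps: whitespace tokenizing, word delimiting (including camelCase splits) and lower-casing. It is not Alfresco's or Solr's implementation; the regular expression is my own simplification, handles ASCII only, and ignores the options a real word-delimiter filter offers.

```python
import re

# Simplified word-delimiter rule (an assumption, ASCII only):
#   runs of capitals not followed by a lowercase letter (acronyms),
#   an optional capital followed by lowercase letters (words),
#   runs of digits.
_WORD_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def analyse(text):
    """Approximate the chain: whitespace tokenize, word-delimit, lower-case."""
    tokens = []
    for chunk in text.split():                # 1. whitespace tokenizer
        parts = _WORD_PARTS.findall(chunk)    # 2. split on case changes, digits, punctuation
        tokens.extend(part.lower() for part in parts)  # 3. lower-case each part
    return tokens
```

Because the same chain runs at index and query time, a query for `powershot` and an indexed value of `PowerShot500` both produce the token `power`, which is what lets them match.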
Alfresco 4 Solr
Federating
Federation Approaches
Build an index with a crawler
Pros
• Can index many different data sources
  • File systems
  • Databases
Cons
• Timeliness
• Pull model not suitable for all scenarios
• Additional storage requirements
• Indexing can be inefficient in a global scenario
• Permissions
Federated Search using OpenSearch
• A collection of simple formats for sharing search results
• Can use an Atom response format
• Elements such as totalResults used in CMIS Atom binding
• Was a big deal in Alfresco 2.0 (2007)
• Alfresco Explorer has an OpenSearch client
• Alfresco has an OpenSearch server
• Provided keyword search
• Wiki stated: ‘Note: Advanced Web Client Search and Query Language searches will be OpenSearch enabled some time in the future, probably in line with up-and-coming CM standards.’
• Client not in Share
• CMIS a better bet for complex queries
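As an illustration of the Atom response format mentioned above, the following Python sketch reads the OpenSearch paging elements (including totalResults) from a feed. The sample feed is invented for the example; only the Atom and OpenSearch 1.1 namespace URIs and element names come from the published formats.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "os": "http://a9.com/-/spec/opensearch/1.1/",
}

# Invented sample response, not captured from a real Alfresco server.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Search results</title>
  <opensearch:totalResults>42</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>2</opensearch:itemsPerPage>
  <entry><title>Cover image</title></entry>
  <entry><title>Promo video</title></entry>
</feed>
"""


def parse_opensearch_feed(xml_text):
    """Extract paging metadata and entry titles from an OpenSearch Atom feed."""
    root = ET.fromstring(xml_text.strip())
    return {
        "total": int(root.findtext("os:totalResults", default="0", namespaces=NS)),
        "start": int(root.findtext("os:startIndex", default="1", namespaces=NS)),
        "per_page": int(root.findtext("os:itemsPerPage", default="0", namespaces=NS)),
        "titles": [
            entry.findtext("atom:title", default="", namespaces=NS)
            for entry in root.findall("atom:entry", NS)
        ],
    }
```

A federating client would call something like this once per repository and then combine the per-source totals and entries.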
Build a meta-search service
Pros
• Can work across heterogeneous search engines
• Can implement asynchronous results
Cons
• Rebuilding the wheel?
• Authentication is a challenge (without SAML or OAuth)
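One way to picture the asynchronous-results advantage of a meta-search service is a fan-out that yields each back end's hits as soon as they arrive. The sketch below uses Python's standard thread pool; the `backends` mapping and its search callables are hypothetical stand-ins for real engine clients, and authentication is deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def federated_search(query, backends):
    """Query every back end in parallel and yield (name, hits, error) as each finishes.

    `backends` maps a source name to a callable taking the query string and
    returning a list of hits (a hypothetical interface for this sketch).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {pool.submit(search, query): name for name, search in backends.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                yield name, future.result(), None
            except Exception as exc:  # one failing source must not sink the rest
                yield name, [], str(exc)
```

A caller can render each source's hits as they are yielded, so a slow or unreachable region delays only its own section of the results page.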
Solr shards
• Treat separate Alfresco repositories’ Solr cores as separate shards
Pros
• Distributed queries are a standard Solr feature
Cons
• The repositories need to be backed by a single authentication source
  • E.g. LDAP
• Asynchronous results aren’t supported OOTB
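Solr's distributed search is driven by a `shards` request parameter listing the cores to query. The Python sketch below builds such a URL; the host names and the `alfresco` core name are assumptions for illustration, and a real Alfresco 4 deployment would also need the SSL client certificates mentioned earlier.

```python
from urllib.parse import urlencode


def build_distributed_query(coordinator, shards, query, rows=10):
    """Build a Solr select URL that fans the query out to every listed shard.

    `coordinator` is the base URL of the core receiving the request;
    `shards` are host:port/path entries without a scheme, as Solr expects.
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    return coordinator.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical regional repositories, each exposing its own Solr core.
EXAMPLE_URL = build_distributed_query(
    "https://emea.example.com:8443/solr/alfresco",
    ["emea.example.com:8443/solr/alfresco", "apac.example.com:8443/solr/alfresco"],
    "print quality images",
)
```

The coordinating core merges the per-shard results into one ranked list, which is why the response is synchronous: it returns only once every shard has answered.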
‘Intelligent’ storage
Storage Cloud Technology
• Underpinning for the repository is a storage cloud technology
• Uses a Content Store Selector
• Base layer built on commodity hardware
  • Keeps multiple replicas of the content
• Management layer
  • Cost-based routing
  • Knows where content resides
• On-demand content migration between repositories
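The slides do not say how the management layer's cost-based routing works, so the following Python sketch is purely hypothetical: given a record of which sites hold a replica and a table of link costs between sites, it picks the cheapest source for a request. All names and data structures are invented for the illustration.

```python
def cheapest_source(content_id, requesting_site, replicas, link_cost):
    """Pick the replica site that is cheapest to read from (hypothetical model).

    `replicas` maps a content id to the set of sites holding a copy.
    `link_cost` maps (from_site, to_site) to a relative transfer cost.
    Returns (site, cost); a local copy always costs 0.
    """
    sites = replicas.get(content_id)
    if not sites:
        raise KeyError("no replica known for " + content_id)
    if requesting_site in sites:
        return requesting_site, 0
    # Break ties on site name so the choice is deterministic.
    best = min(sites, key=lambda site: (link_cost[(site, requesting_site)], site))
    return best, link_cost[(best, requesting_site)]
```

In this model, on-demand migration would amount to copying the content from the chosen site and then adding the requesting site to `replicas[content_id]`.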
Challenges
• Large file size
  • Has to work with streaming
  • Beware of anything that attempts to buffer a full file into memory
    • E.g. to POST it
  • Watch out for processes that need to copy a file
• User expectations
  • Need training on asynchronous behaviour
  • Search results and their appearance
    • Grouping / sort
    • Pagination (of distinct result sets)
• Time to migrate large content
  • Can be lengthy if there isn’t a ‘near’ copy
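The warning about buffering a full file into memory can be illustrated with a chunked copy. This Python sketch moves data between two binary streams in fixed-size pieces and computes a checksum on the way; the 1 MiB chunk size and the use of SHA-256 are arbitrary choices for the example, not something the slides prescribe.

```python
import hashlib


def stream_copy(src, dst, chunk_size=1024 * 1024):
    """Copy from one binary stream to another without holding the whole file in memory.

    Returns (bytes_copied, sha256_hex) so the receiver can verify the transfer.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = src.read(chunk_size)  # at most chunk_size bytes are in memory at once
        if not chunk:                 # empty read means end of stream
            break
        dst.write(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()
```

The same loop works whether `src` is a local file or an HTTP response body, and memory use stays bounded by `chunk_size` regardless of asset size.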
Twitter: @rbramley
Blog: http://leanjavaengineering.com/
Web: http://www.ixxus.com