Enterprise Search Sizing, HA, and Migration Path
Hosted by:
Vikram Rajkondawar
Architect Advisor
Microsoft Corporation
Presented by:
Ashvini Shahane (Head Strategic Service Unit - Synergetics)
Discussion Points
• SharePoint 2010 Search/FAST Search
– Capabilities
– Architecture
– Search First Migration
– High Availability and Sizing considerations
• Migration options for migrating MOSS 2007 to SPS 2010
SharePoint 2010 Search
Enterprise Search Product Portfolio
SharePoint Server for Internet Sites
FAST Search for SharePoint Internet Sites
SharePoint Server
FAST Search for SharePoint
FAST Search for Internal Applications
FAST Search For Internet Business
Solutions for Internet Business
Solutions for Business Productivity
Integrated with
SharePoint
Stand-alone
Entry-Level Solutions
Search Server
Search Server Express
SHAREPOINT SEARCH: CAPABILITIES
End-User UI
• Out-of-box refinement
– Refine over key results properties
– Metadata, taxonomy and social tags based results refinement
– Easy to extend over custom properties
• One-stop Search Center
– Scopes, web parts, best bets, top answers, advanced search
– Query federation brings together results from all over, with native support for OpenSearch
• Core search experience
– Improved “did you mean” suggestions
– New pre-query and post-query related query suggestions
– “View in browser” link (for most Office docs)
– Improved query syntax
End-User UI
New Query Syntax
• Support for Boolean operators for free-text queries and property queries
– (“SharePoint Search” OR “Live Search”) AND (title:“Keyword Syntax” OR title:“Query Syntax”)
• Prefix matching support for keywords and properties
– Micro* author:bill*
• Improved operator support for property restrictions
– =, >, <, <=, >=
– Can create range refinements
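The syntax rules above can be sketched as a few string-building helpers. This is only an illustration of how the operators compose; the helper names (phrase, prop, restrict, any_of, all_of) are hypothetical and not part of any SharePoint API, and the "size" property in the last example is an assumed property name chosen for illustration.

```python
def phrase(text: str) -> str:
    """Wrap a multi-word term in double quotes so it is matched as a phrase."""
    return f'"{text}"'


def prop(name: str, value: str) -> str:
    """Property query such as title:"Keyword Syntax" or author:bill*."""
    return f"{name}:{value}"


def restrict(name: str, op: str, value: str) -> str:
    """Property restriction using one of the operators listed on the slide."""
    if op not in {"=", ">", "<", "<=", ">="}:
        raise ValueError(f"unsupported operator: {op}")
    return f"{name}{op}{value}"


def any_of(*terms: str) -> str:
    """Join terms with OR and parenthesise the group."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with AND."""
    return " AND ".join(terms)


# The Boolean example from the slide.
boolean_query = all_of(
    any_of(phrase("SharePoint Search"), phrase("Live Search")),
    any_of(prop("title", phrase("Keyword Syntax")),
           prop("title", phrase("Query Syntax"))),
)

# The prefix-matching example from the slide.
prefix_query = f"Micro* {prop('author', 'bill*')}"

# An illustrative property restriction ("size" is an assumed property name).
size_filter = restrict("size", ">=", "1024")

print(boolean_query)
print(prefix_query)
print(size_filter)
```

Keeping the operator whitelist in restrict mirrors the slide's list, so an unsupported operator fails early instead of producing a query the engine would reject.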
Great Search Experience OOB
Refinement
panel
Related
searches
Federated
results
Get more relevant results through a search center with hit highlighting, results summaries, related queries, and enhanced query syntax
Search from anywhere, including mobile and desktop integration; Office Web Apps speed access to results; enhancements for multi-lingual
Find information faster with metadata-driven refinement, query suggestions, search scopes, and federated results which help pinpoint information
Win7
Connector
Launch in Office
Web Apps
Search is Social
• People finding experience
– Front door to the office social network
– Better expertise & interest search
• Email mining to bootstrap profiles with interests and colleagues
– “Address book style” search
• Phonetic name matching
• Nickname matching
– Relevance models tuned specifically for people search
– Metadata refinement, better hit highlighting, recently authored content
Search is Social
• Social behavior drives search quality
– Search click through behavior drives relevance ranking
– Query suggestions mined from search logs
– Social tagging influences relevance ranking
– Self-search, to drive people to participate and contribute content
– Social definitions extracted from indexed content
Amplify the Impact of Knowledge & Expertise
Connect with expertise using improved matching from mined Outlook mailbox data and SharePoint My Site profiles
Improve relevance with use, based on how people tag content in SharePoint and on click-through of search results
Find people through nickname and phonetic matching, people-specific refinement, tuned relevance models
Phonetic and nickname matching
Expertise identification
Recently authored content
Refine by focus, expertise, and other attributes
Search Use in Social Data Delivery
• Search is used for data retrieval and trimming in other SharePoint social features
Feature | Action | Query
My Site Host home page | What’s New web part | Retrieves up to 40 recent activities from colleagues
Profile Page (person.aspx) | Recent Activities web part | Retrieves up to 10 recent activities for user
Tags and Notes page | Activities for Month web part | Retrieves up to 40 tags or notes based on activities for the specified month for user
Outlook Social Connector | OSC synchs every hour for every user. The response sends updates for colleagues since the last time OSC synched | Retrieves all recent (since the last synch) activities from colleagues
Search Depends on Social
• Some of the functionality in Search also depends on data from Social
• The only difference between SharePoint Search (SS) and FAST Search (FS) for social: FS doesn’t index social tags
Feature SS FS
Core Results Page showing social tags (up to 5) for search results
Core Results Page Refinement by social tags
Core Results Page Refinement by Taxonomy data / Authoritative tags
All features on the people search tab - searching for people, searching for expertise, refining by people properties etc.
FAST SEARCH: CAPABILITIES
Go Beyond the Search Box
Thumbnails
Sorting on any
property
Visual Best Bets
Scrolling PowerPoint
Previews
Refinement with
counts on any
property
Go Beyond the Search Box
• Site admin/Search admin control
– Visual Best Bets
– Promote/Demote documents and sites
– UI extensibility (web parts, ..)
– Relevancy profiles and parameters
– User Context parameter & admin
• End User Control
– Sorting, Ranking, and Navigation
– Admin-enabled controls
• Linguistics and term control
– Keywords, phrases, synonyms, spellcheck
– Multilingual searching control
– Lists for metadata extraction
– Search similar (based on document vectors)
– Index-based “did you mean” suggestions
User Context Matters
Alan Brewer, Sales
What should I know about selling ERP consulting?
Renee Lo, Engineer
What should I know about implementing ERP?
Go Beyond the Search Box
• Can search in any language
• 84 languages detected to allow language-specific handling
• Lemmatization improves recall
(‘better’ includes ’good’)
• Phrase search includes stopwords
(“a room with a view”)
– Only nouns and adjectives are expanded (higher precision)
(‘book’ -> ‘books’, not ‘booked’)
Afrikaans Hausa Pashto, Pushto
Albanian Hebrew Persian
Arabic Hindi Polish
Armenian Hungarian Portuguese
Azerbaijani Icelandic Punjabi
Basque Indonesian Rhaeto-Romance
Bengali, Bangla Irish Romanian
Bosnian Italian Russian
Breton Japanese Sami (Northern)
Bulgarian Kannada Serbian
Catalan Kazakh Slovak
Chinese-S Kirghiz Slovenian
Chinese-T Korean Sorbian
Croatian Kurdish Spanish
Czech Latin Swahili
Danish Latvian, Lettish Swedish
Dutch Letzeburgesch Tagalog
English Lithuanian Tamil
Estonian Macedonian Telugu
Faroese Malay Thai
Finnish Malayalam Turkish
French Maltese Ukrainian
Galician Maori Urdu
Georgian Marathi Uzbek
German Mongolian Vietnamese
Greek Norwegian Welsh
Greenlandic Norwegian-B Yiddish
Gujarati Norwegian-N Zulu
Advanced Content Processing
PRODUCT (Custom)
CONCEPT (Custom)
COMPANY (OOTB)
SHAREPOINT SEARCH: ARCHITECTURE
Architecture and Design
• Deployment and management
• Scale-Out architecture
– Introduction to concepts
– Scale-out features and options
• Other engine enhancements
Query Object Model
Content Sources - Host the content we want to return in main results
Crawling - Traverse URL space to record items in search catalog
Indexing - Extract information from items to enable efficient matching
Query Servers - Accept query requests from users and return results
Search Center - UI for users to issue queries and interact with results
Query Federation - Return results from non-SharePoint indexes
Connectors - Know how to process different content sources
Index Partition - Subset of the overall index
Diagram labels: Content (x3), OpenSearch Source, Crawler, Indexer, Query Servers, Index Partition
MOSS 2007 search scale-out
Diagram: two Query servers and one Indexer. Callouts: “Single point of failure”, “Bottleneck”, “The whole index”, “Bottleneck”.
SharePoint Search 2010 Scale-out
Diagram: Query, Query and Indexer with the callouts “Single point of failure”, “Bottleneck”, “The whole index”, “Bottleneck”, shown alongside four Crawlers, two Query components, an Admin Component and an Admin Database.
• Multiple Index Partitions
• Crawl Distribution
• Query Mirroring
• Query Components
• Stateless Crawlers
• Multiple Property DBs
• Admin Database + Admin Component
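To make "multiple index partitions" and "query mirroring" concrete, here is a toy model, not SharePoint code. It assumes documents are assigned to partitions by document ID modulo the partition count (an assumption made for the sketch), and that each partition is held by several mirrored query components. A query fans out to one live mirror per partition and merges the hits.

```python
from collections import defaultdict


class QueryComponent:
    """Holds one copy (mirror) of one index partition as a toy inverted index."""

    def __init__(self, partition_id: int):
        self.partition_id = partition_id
        self.online = True
        self.postings = defaultdict(set)  # term -> set of document IDs

    def add(self, doc_id: int, text: str) -> None:
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term: str) -> set:
        return set(self.postings.get(term.lower(), set()))


class ToyIndex:
    """An index split into partitions, each served by mirrored query components."""

    def __init__(self, partitions: int, mirrors: int):
        self.components = [
            [QueryComponent(p) for _ in range(mirrors)] for p in range(partitions)
        ]

    def add(self, doc_id: int, text: str) -> None:
        # Assumed placement rule: partition chosen by doc_id modulo partition count.
        partition = doc_id % len(self.components)
        for mirror in self.components[partition]:
            mirror.add(doc_id, text)

    def search(self, term: str) -> list:
        hits = set()
        for mirrors in self.components:
            live = next((c for c in mirrors if c.online), None)
            if live is None:
                raise RuntimeError("a partition has no live query component")
            hits |= live.search(term)
        return sorted(hits)


index = ToyIndex(partitions=2, mirrors=2)
index.add(1, "SharePoint search sizing")
index.add(2, "FAST search architecture")
index.add(3, "Upgrade planning")
index.add(4, "Search high availability")

before = index.search("search")
index.components[0][0].online = False  # simulate losing one mirror of partition 0
after = index.search("search")
print(before, after)
```

With two mirrors per partition, losing one query component leaves results unchanged; losing both mirrors of the same partition makes the query fail, which is why the query-side HA slides call for full redundancy of all query components.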
Search First Migration
• Begin migrating MOSS 2007 with SharePoint 2010 Search
– Good approach for most cases
• User’s content kept in MOSS but user search queries handled by SharePoint 2010
• Can be SharePoint Search or FAST Search Server 2010 for SharePoint
– Flexible approach
• Can add other services later or as needed
• Can migrate content later or in parallel
– Can be implemented easily
Search First
Indexing MOSS 2007 User Store
• Create a Content Source
– Content Source Type - SharePoint Sites
– Start Address: sps3://<MOSS 2007 Site>
– Search results from that source - not all options will be available
• No Add as a colleague
• No Browse in Organization Chart
User Profile Replication Engine
• UPRE ships in SPS2010 Admin Toolkit
– Sync between MOSS 2007 and SPS2010
• Co-existence
– Sync between SPS2010 and SPS2010
• User Profile SA can’t be used across the WAN
• Includes social data
From MOSS 2007
Local to SP 2010
High Availability / Fault Tolerance
A design that enables a system to continue operation, possibly at a reduced level (also known as graceful degradation), rather than failing completely, when some part of the system fails.
“Fault tolerant design”, Wikipedia
High Availability for Search
• Content side High Availability
– Full redundancy in the feeding chain
– Normally not critical for intranet applications
– Preferred by many clients
• Query side High Availability
– Full redundancy of all query components
– Critical for internet facing applications
– Preferred for intranet applications
• Backup/recovery alternatives not covered
SharePoint Search – Content Data Flow
Crawl DB
Request crawl
Poll request
Log request
Poll request
Distribute request
Doc. properties
Index fragments
Security descriptors (ACLs and ACEs)
Crawl DB
SharePoint Search – Content Side HA
Crawl DB
Property DB
Automatic re-election of Master
Redundant instances will automatically fail over
No redundancy support, but can be quickly relocated via PowerShell
Crawlers are stateless, automatic failover
SharePoint Search – Query Data Flow
SharePoint Search – Query HA
The cost of overinvestment in hardware is almost always far less than the cumulative expenses related to troubleshooting problems caused by undersizing.
TechNet, Capacity management and sizing for SharePoint 2010
Search Sizing
• Scale up (add more hardware: processors/memory)
• Scale out (add more servers to a farm)
• Search is by far the service application in SP 2010 with the largest hardware utilization
Sizing approach
Crawl DB instances
Crawler components / Indexers
Index partitions
Property DB instances
Sizing exercise
SP Search – Pilot/Dev Deployment
SP2010 Farm
All roles
SP Search – Extra Small Deployment
SP2010 Farm
SQL 2008 Cluster
All DBs
All roles
SP2010 Farm
Web Front End
Query
SP Crawl
People Crawl
SQL Server
Web Front End
Query
SP Crawl
People Crawl
SQL Server
SP Search – Small Deployment
SP2010 Farm
SQL 2008 Cluster
Web Front End
Query
Index partition 1
Web Front End
Query
Index partition 1
Search Admin DB
Crawl DB
Property DB
SharePoint DB
Central Admin
SP Crawl
People Crawl
SP Crawl
People Crawl
Note:
Servers marked with * are only
needed for high availability
*
*
SP Search – Medium Deployment
SP2010 Farm
SQL 2008 Cluster
Query
Index partition 1
Index partition 4
Query
Index partition 1
Index partition 2
Search Admin DB
Property DB
SharePoint DB
Central Admin
SP Crawl
People Crawl
SP Crawl
People Crawl
Crawl DB
Query
Index partition 2
Index partition 3
Query
Index partition 3
Index partition 4
Web Front End Web Front End
SP Search – Large Deployment
SP2010 Farm
SQL 2008 Cluster
Query
Index partition 1
Index partition 10
Query
Index partition 1
Index partition 2
Crawl DB
SharePoint
Central Admin
SP Crawl
People Crawl
SP Crawl
People Crawl
Crawl DB
Query
Index partition 2
Index partition 3
Query
Index partition 3
Index partition 4
Property DB
Search Admin DB
Query
Index partition 4
Index partition 5
Query
Index partition 5
Index partition 6
Query
Index partition 6
Index partition 7
Query
Index partition 7
Index partition 8
Query
Index partition 8
Index partition 9
Query
Index partition 9
Index partition 10
Property DB
Web Front End Web Front End
SP Crawl
People Crawl
Server Calculation Matrix
Disclaimer: The numbers might not be representative for the customer environment and data. Please use caution when using these numbers for sizing.
Name | Item count (millions) | WFEs | Query Comps | Crawl Comps | Prop DBs | Crawl DBs | Total | Content Side HA | Query Side HA
Single VM (Lab + min production) | 1 | (shared) | (shared) | 1 | (shared) | (shared) | 1 | (x) | (x)
Extra Small | 5 | (shared) | (shared) | 1 | 1 | (shared) | 2 | |
Small | 10 | 2 | (shared) | 1 | 1 | (shared) | 4 | | x
Medium | 40 | 2 | 4 | 2 | 1 | 1 | 10 | x | x
Large | 100 | 2 | 10 | 3 | 2 | 2 | 19 | x | x
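The matrix can be read as a lookup table. The sketch below encodes its rows and picks the smallest tier that covers a given item count. It rests on two assumptions not stated on the slide: that "Item count" is in millions of items, and that "(shared)" means the role needs no dedicated server (encoded as 0). The names Tier and pick_tier are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_items_millions: int  # assumption: the slide's item count is in millions
    wfes: int                # 0 = role is "(shared)", no dedicated server
    query_comps: int
    crawl_comps: int
    prop_dbs: int
    crawl_dbs: int
    total: int               # "Total" column as printed on the slide

    def dedicated_servers(self) -> int:
        return (self.wfes + self.query_comps + self.crawl_comps
                + self.prop_dbs + self.crawl_dbs)


TIERS = [
    Tier("Single VM", 1, 0, 0, 1, 0, 0, 1),
    Tier("Extra Small", 5, 0, 0, 1, 1, 0, 2),
    Tier("Small", 10, 2, 0, 1, 1, 0, 4),
    Tier("Medium", 40, 2, 4, 2, 1, 1, 10),
    Tier("Large", 100, 2, 10, 3, 2, 2, 19),
]


def pick_tier(items_millions: float) -> Optional[Tier]:
    """Return the smallest tier whose item count covers the corpus, else None."""
    for tier in TIERS:
        if items_millions <= tier.max_items_millions:
            return tier
    return None


for tier in TIERS:
    # Under the "(shared) = 0" reading, each row's columns add up to its Total.
    print(tier.name, tier.dedicated_servers(), tier.total)

chosen = pick_tier(25)
print(chosen.name if chosen else "beyond the matrix")
```

Under this reading the per-role counts sum to the printed Total for every row, which is a useful sanity check on the reconstructed table. As the disclaimer says, the numbers are a starting point rather than a guarantee.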
FAST Search for SharePoint 2010
Query completion
Document thumbnails
Scrolling previews
Read in Office Web Apps
Related searches & people
Federated results
Sorting on any property
FAST Search – Content Data Flow (1/2)
Admin component
Admin DB
Master Crawl comp.
Crawl comp.
Crawl DB
Property DB
Query component
Request crawl
Poll request
Log request
Poll request
Distribute request
Doc. properties Index fragments
Crawl data
Crawl history
Crawl queue additions
Security descriptors (ACLs and ACEs)
Crawl DB
Admin DB
FAST Search – Content Side HA (1/2)
Admin component
Admin DB
Master Crawl comp.
Crawl comp.
Crawl DB
Property DB
Query component
Property DB
Crawl comp.
Crawl comp.
Automatic re-election of Master
Query component
Query component
Redundant instances will automatically fail over
No redundancy support, but can be quickly relocated via PowerShell
Crawlers are stateless, automatic failover
Content Distributor
Indexing Dispatcher
FAST Search – Content Data Flow (2/2)
Item Processing
Indexing
Search
Crawled batch
Pass on batch
Ready to index
Pass on batch
Distribute index
Link Analysis (Web Analyzer)
Detected links
Search
Indexing
Indexing Dispatcher
Indexing Dispatcher
Item Processing
Item Processing
Content Distributor
Content Distributor
Content Distributor
Indexing Dispatcher
FAST Search – Content Side HA (2/2)
Item Processing
Indexing
Search
Link Analysis (Web Analyzer)
Does not hold state, automatic failover
Does not hold state, automatic failover
Does not hold state, automatic failover
Backup indexer, manual failover
Search rows have automatic failover
Must be set up for redundancy. Disk errors may require manual recovery.
Crawl DB and Crawl Component requirements are as for SharePoint Search
FAST Search – Query Side HA
FAST Search for SharePoint
Summary of architectural elements
Query and Result Processing
SharePoint Front-end
Content Processing and Linguistics
Microsoft System Center Operations Manager
Connectors: SharePoint, BDC, Exchange
Site Collection Level Admin UI: Keyword Management, User Context Management, Site Promotion/Demotion
Central Administration UI: Property mapping, Entity extraction, Spell-checking
PowerShell: Schema configuration, Admin configuration, Deployment configuration
Search
Indexing
Security Access Module
Connectors: Web Crawler, JDBC, Lotus Notes
Content
Content
Content
Query Object Model
Monitoring Services
Administration and Schema Object Model
Federation Object Model
Query Web Service
OpenSearch or other Sources
FAST Search for SharePoint
Search Service Applications
Custom front-end
Web Frontend
People Search
Content Processing Flow
• Data moves from content source to end user queries
– It gets crawled, processed and refined, and an index is created
– User executes queries and retrieves data, metadata, and federated search results
Diagram labels: End Users; Content; Federation; OpenSearch Source; Content Processor; Crawler; Indexer; Query Processor; Search Center; Profiles; Metadata Relevance Control; User Context; Indexing Connectivity; Index Partition
Content Pipeline Stages
• Format Conversion
• Language detection and encoding
• Lemmatizer
– Linguistics normalization
• Tokenizer
– Word breaking
• Entity Extraction
– Persons, companies, locations, email, date/time, URL, prices, file names
• DateTimeNormalizer
– Date normalization
• Vectorizer
– Create document vector for similarity searching
• WebAnalyzer
– Anchor text and link cardinality analysis
• PropertiesMapper
– Map to crawled properties
• PropertiesReporter
– Report detected properties
Legend: Default / Optional
Optional stages:
• XML Properties mapper
• Offensive Content Filter
• Verbatim extractor
– Loads dictionary for custom extraction, e.g. product names
• Field Collapsing
• …
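The idea of a content pipeline, where each stage enriches the item before it is indexed, can be sketched as a chain of functions. This is a toy illustration and not the FAST pipeline API: the stage functions, the dictionary-based extractor and the field names (body, tokens, emails, products) are all hypothetical, and only three of the listed stages are imitated.

```python
import re
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]
Stage = Callable[[Doc], Doc]


def tokenizer(doc: Doc) -> Doc:
    """Word breaking: split the body into lower-cased tokens."""
    doc["tokens"] = re.findall(r"\w+", str(doc["body"]).lower())
    return doc


def email_extractor(doc: Doc) -> Doc:
    """A very small stand-in for entity extraction (emails only)."""
    doc["emails"] = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", str(doc["body"]))
    return doc


def make_verbatim_extractor(dictionary: Iterable[str]) -> Stage:
    """Build a stage that tags tokens found in a custom dictionary, e.g. product names."""
    terms = {term.lower() for term in dictionary}

    def stage(doc: Doc) -> Doc:
        doc["products"] = sorted(terms & set(doc["tokens"]))
        return doc

    return stage


def run_pipeline(doc: Doc, stages: List[Stage]) -> Doc:
    """Apply the stages in order; later stages may rely on earlier output."""
    for stage in stages:
        doc = stage(doc)
    return doc


stages = [tokenizer, email_extractor, make_verbatim_extractor(["Widget", "Gadget"])]
result = run_pipeline(
    {"body": "Contact alice@example.com about the Contoso widget pricing"},
    stages,
)
print(result["tokens"])
print(result["emails"])
print(result["products"])
```

Stage order matters in this sketch: the dictionary extractor reads the tokens produced by the tokenizer, so an optional stage has to be inserted after the stages it depends on.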
FAST Search for SharePoint Scaleout
Scale-out in different “dimensions”
Query Volume
Content Volume
Processing power
Indexing freshness
Redundancy options
Search
Indexing
Performance targets*
30 M docs/node
50 QPS/node
35 docs/sec
* Dependent on document and HW characteristics
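A first-pass reading of these targets can be turned into arithmetic. The sketch below assumes that the document target bounds how much content one node holds, that the QPS target bounds how many queries one node serves, and that 35 docs/sec is the rate of a single feeding stream; the slide does not state the scope of that last figure, so the assumption is flagged in the code. The function name estimate is hypothetical, and the footnote's caveat about document and hardware characteristics applies to every number.

```python
import math

DOCS_PER_NODE = 30_000_000  # slide target: 30 M docs/node
QPS_PER_NODE = 50           # slide target: 50 QPS/node
DOCS_PER_SEC = 35           # slide target; assumed here to be one feeding stream


def estimate(total_docs: int, peak_qps: float) -> dict:
    """Rough node counts and full-crawl duration from the per-node targets."""
    return {
        # Nodes needed to hold the content volume.
        "content_nodes": math.ceil(total_docs / DOCS_PER_NODE),
        # Copies of the index needed to serve the query volume.
        "query_replicas": math.ceil(peak_qps / QPS_PER_NODE),
        # Days to feed every document once at the assumed single-stream rate.
        "full_crawl_days": total_docs / DOCS_PER_SEC / 86_400,
    }


plan = estimate(total_docs=100_000_000, peak_qps=120)
print(plan)
```

The two node counts correspond to different scale-out dimensions from the slide: content volume drives the first, query volume the second. The crawl duration shows why indexing freshness is listed as its own dimension.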
FAST Search – Disk Calculation
Max item count (in millions) | Adm | Web Analyzer | Crawl DB Server | Indexer | Indexer (HD)
1 | 1 x 72 GB | 1 x 5 GB | 1 x 10 GB | 1 x 120 GB | 1 x 120 GB
10 | 1 x 72 GB | 1 x 50 GB | 1 x 40 GB | 1 x 1.2 TB | 1 x 1.2 TB
40 | 1 x 72 GB | 1 x 60 GB | 1 x 150 GB | 3 x 2.0 TB | 1 x 4.8 TB
100 | 1 x 72 GB | 2 x 75 GB | 1 x 350 GB | 6 x 2.0 TB | 3 x 4.8 TB
150 | 1 x 72 GB | 4 x 75 GB | 1 x 500 GB | 10 x 2.0 TB | 4 x 4.8 TB
200 | 1 x 72 GB | 5 x 75 GB | 2 x 350 GB | 14 x 2.0 TB | 5 x 4.8 TB
500 | 1 x 72 GB | 9 x 75 GB | 2 x 500 GB | 34 x 2.0 TB | 13 x 4.8 TB
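As a worked reading of the disk table, the sketch below stores each row as (server count, GB per server) pairs and totals the storage for a given corpus size, with or without High Density (HD) indexers. It assumes 1 TB = 1000 GB for the conversion and that a corpus between two rows is sized with the next larger row; both are assumptions made for the sketch, and the names DISK_TABLE, row_for and total_gb are hypothetical.

```python
# max items (millions) -> role -> (server count, GB per server); 1 TB taken as 1000 GB
DISK_TABLE = {
    1:   {"adm": (1, 72), "web_analyzer": (1, 5),  "crawl_db": (1, 10),
          "indexer": (1, 120),   "indexer_hd": (1, 120)},
    10:  {"adm": (1, 72), "web_analyzer": (1, 50), "crawl_db": (1, 40),
          "indexer": (1, 1200),  "indexer_hd": (1, 1200)},
    40:  {"adm": (1, 72), "web_analyzer": (1, 60), "crawl_db": (1, 150),
          "indexer": (3, 2000),  "indexer_hd": (1, 4800)},
    100: {"adm": (1, 72), "web_analyzer": (2, 75), "crawl_db": (1, 350),
          "indexer": (6, 2000),  "indexer_hd": (3, 4800)},
    150: {"adm": (1, 72), "web_analyzer": (4, 75), "crawl_db": (1, 500),
          "indexer": (10, 2000), "indexer_hd": (4, 4800)},
    200: {"adm": (1, 72), "web_analyzer": (5, 75), "crawl_db": (2, 350),
          "indexer": (14, 2000), "indexer_hd": (5, 4800)},
    500: {"adm": (1, 72), "web_analyzer": (9, 75), "crawl_db": (2, 500),
          "indexer": (34, 2000), "indexer_hd": (13, 4800)},
}


def row_for(items_millions: float) -> int:
    """Smallest table row that covers the corpus (assumed rounding-up rule)."""
    for limit in sorted(DISK_TABLE):
        if items_millions <= limit:
            return limit
    raise ValueError("corpus exceeds the largest row in the table")


def total_gb(items_millions: float, high_density: bool = False) -> int:
    """Total disk in GB across admin, web analyzer, crawl DB and indexer servers."""
    row = DISK_TABLE[row_for(items_millions)]
    indexer_key = "indexer_hd" if high_density else "indexer"
    roles = ["adm", "web_analyzer", "crawl_db", indexer_key]
    return sum(count * size for count, size in (row[r] for r in roles))


print(total_gb(100), total_gb(100, high_density=True))
```

For the 100-million-item row, High Density mode halves the indexer server count (3 instead of 6) at the cost of more total disk, which is the trade-off behind the recap's remark that High Density mode may be an attractive alternative.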
SharePoint Search/FAST Search Recap
• Search is the most demanding service in SP 2010: plan accordingly
• All components involved in querying and steady-state crawling support HA
• High Density mode may be an attractive alternative
• Sizing models are based on thorough testing – find one that fits your scenario
Migration and upgrade paths from MOSS 2007
2010 Upgrade improvements
• Detect issues early
– Provide O12 tools to admins
– Report critical issues at start of upgrade
• Keep the administrator informed
• No data loss
– Keep content and settings
• Continue when possible
• Be reentrant
– Upgrade should not be a catch-22
2010 Upgrade Overview
New
• Upgrade Preparation Tools
• Windows PowerShell Upgrade Cmdlets
• Feature Upgrade
• Visual Upgrade
Changed
• Upgrade Methods
Improved
• Upgrade Status Reporting
• Upgrade Logging
Removed
• Gradual Upgrade
• Side By Side Installation
2010 Upgrade Scenarios and Methods
Supported Scenarios
• In-Place Upgrade
• Database Attach Upgrade:
– Content Database
– Profile Service Database
Unsupported Scenarios
• Upgrade from earlier than WSS v3 SP2/MOSS 2007 SP2
• Direct upgrade from WSS v2/SPS 2003 or earlier
• Side by side installation
• Gradual upgrade
In-Place
• Next, next, finished
• Advancements
– Restartable!
– Common blocking time outs removed
In-Place Pros/Cons
Farm wide settings are preserved and upgraded
Customizations are available in the environment after the upgrade if they are v4 compatible
Servers and farms are offline while the upgrade is in progress
The upgrade proceeds continuously
Existing v3 farm must support (64 bit and performance
Supported Paths In-Place
x86MSS 2010
x86 x86MSS 2010
x86WSS v3.0
SP22010
Database Attach
• Databases that can be attached
– Content database
– Profile service database
– Project service database
• V3 databases that cannot be attached
– Configuration
– Search
Database Attach Steps
• Back up the 2007 content DB
• Restore to the SharePoint 2010 SQL Server, using SQL tools
• Test-SPContentDatabase -Name wss_content_2007 -WebApplication http://2010webapp
• Mount-SPContentDatabase -Name wss_content_2007 -WebApplication http://2010webapp
DB Attach Pros/Cons
Pros
Upgrade multiple content databases at the same time
Combine multiple farms into one farm
Cons
The server and farm settings are not upgraded
Customizations must be transferred manually
Missing customizations
Hybrid Approach
• Detach DBs
• Upgrade to 2010 in-place
• DB Attach content DBs
Hybrid Pros/Cons
Farm wide settings preserved
Customizations already in place
Multiple content databases at the same time
Non-upgraded sites (in read-only mode) while you upgrade the content
Labor intensive
Direct access to the database servers
x86 is a lot of work
Existing hardware may need replacing
Upgrading FBA Web Apps
• Convert Web applications to claims-based authentication
• Update web.config with necessary connection information for your provider
• Use PowerShell to migrate users and permissions
SSP exploded to service applications – In-place
SSP
• O12 SSPs and service settings = Flexible shared services model
• Service Applications = part of Foundation
• Notification of new services after in-place upgrade
• Backup/restore of individual services + Provisioning offbox
What is “Visual Upgrade”
• A feature that separates data upgrade from UI upgrade
– Data and code upgrade happens all at once
– Site UI has two modes: this version and previous version
– Pages and components make the decision at runtime, and it’s safe by default
Summary
• SharePoint 2010 Search/FAST Search
– Capabilities
– Architecture
– Search First Migration
– High Availability and Sizing considerations
• Migration options for migrating MOSS 2007 to SPS 2010
THANK YOU