#spsnh #spsnj search topology & optimization

Mike Maadarani SharePoint Architect 9/20/2014 Search Topology & Optimization

Uploaded by mike-maadarani on 20-Dec-2014

DESCRIPTION

This presentation explains the details of all search components, how to properly configure your search topology, and your options to extend your search farm in a hybrid “cloud/on-prem” scenario. You will learn what you need to consider when designing your search to handle your organization's needs. We will dive into scripting a high-availability search topology, keeping it healthy, and managing your day-to-day search operations. Learn how to optimize your search for the best performance and search relevancy, to support reliable search applications. Together, we will review where Search lives in the farm and how to use the crawl components of search to implement a scalable farm.

TRANSCRIPT

Page 1: #SPSNH #SPSNJ search topology & optimization

Mike Maadarani

SharePoint Architect, 9/20/2014

Search Topology & Optimization

Page 2:

Thank You Event Sponsors

• Diamond & Platinum sponsors have tables here in the Fireside Lounge

• Please visit them and inquire about their products & services

• Also, to be eligible for prizes, make sure to get your bingo card stamped

Page 3:

New Jersey SharePoint user group

Different SharePoint discussions each month on various topics. Announced on meetup.com

Meets the 4th Tuesday of every month, 6pm – 8pm, at the Microsoft Office (MetroPark), 101 Wood Ave, Iselin, NJ 08830. http://www.njspug.com

Page 4:

Bio: Mike Maadarani, Ottawa

App Dev and Architecture for over 19 years (16 Years Microsoft, 3 Years with the “Other Guys”)

Business focus: Enterprise Content Management, Publishing Sites, & Search

Technology focus: SharePoint, SQL Server and SharePoint Integration

Architect, trainer, and presenter

Blog: www.maadarani.com

[email protected]; @mikemaadarani

Page 5:

Agenda

• SharePoint 2013 Search Overview

• Architecture and Resource Utilization

• Configuring SSA and PS

• Topology Scenarios

• Hybrid… Say What?

• Relevancy, Query Builder, & Optimization

• Closing and Q&A

Page 6:

Search in 2010

[Diagram: SharePoint 2010 Search Service Application with a Crawl Component (Crawl Indexing Engine), a Query Component (Query Engine), Search Admin, and the Property Store (SQL); Content is crawled, and the User queries through the WFE.]

Page 7:

FAST Search for SharePoint 2010

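[Diagram: FAST Search for SharePoint 2010, showing a FAST Content SSA, a FAST Query SSA, and FAST back-end components (managed separately): Crawl Indexing Engine, Content Pipeline, Analysis Engine, Query Engine, Query Pipeline, and Search Admin. Content is crawled, and the User queries through the WFE.]

Extensibility: Sandbox, Entity Extraction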

Page 8:

… In SharePoint 2013

[Diagram: SharePoint 2013 Search Service Application with the Crawl, Content Processing, Analytics Processing, Index, Query Processing, and Admin components, shown alongside the earlier Crawl Indexing Engine, Content Pipeline, Analysis Engine, Query Engine, Query Pipeline, Search Admin, and Property Store (SQL). Content is crawled, and the User queries through the WFE.]

• Separate crawl and indexing

• Entire index on local disk

• Link/query analysis & recommendations

• Extensibility: Web callout, Entity Extraction

Page 9:

SharePoint 2013 Search Architecture

[Diagram: SharePoint 2013 search architecture.
Search topology components: Crawl, Content Processing, Index (FAST Search Index), Analytics Processing, Query Processing, Search Admin.
Databases: Crawl, Search Admin, Link, Analytics Reporting.
Content sources: HTTP, file shares, SharePoint, user profiles, Lotus Notes, Documentum, Exchange folders, custom (BCS).
Query side: the User reaches Query Processing through the WFE or the public API, from SharePoint, SP Apps, devices, and non-SP UX.]

Page 10:

Why is Search so important?

I just uploaded a document. Make it searchable, quick!

FAST

Page 11:

Why is Search so important? EASY

Page 12:

Why is Search so important? EASY

Page 13:

Why is Search so important?

Search-Driven Applications

Page 14:

Where does Search live in the farm?

Windows services

• SharePoint Search Host Controller service (hostcontrollerservice.exe): runtime/lifecycle control of search components (except the crawler)

• SharePoint Server Search service (mssearch.exe, mssdmn.exe): still there, but only for the Crawl Component

Processes

• noderunner.exe: runtime environment for search components (except the crawler)

Admin entities

• Search Service Instance: provisioning of the search service on each box

• Search Service Application: SharePoint configuration entity

[Diagram: a SharePoint App Server whose Host Controller (hostcontrollerservice.exe) manages the Search Runtime Environment, where noderunner.exe processes host the Admin, Query Processing, Content Processing, Index, and Analytics Processing components; the Crawl Component runs in mssearch.exe/mssdmn.exe.]

Page 15:

Where do I host my components?

Page 16:

Query processing component (QPC)

CPU load driving factors: QPS; query transformations

Network load driving factors: number of index partitions; size of queries and results

Example: 20 index partitions @ 20 QPS => 200/100 Mbit/s inbound/outbound

[Chart: relative load impact on CPU, network, and disk, by item count, DPS (documents per second), and QPS (queries per second)]

http://social.technet.microsoft.com/wiki/contents/articles/16002.sharepoint-2013-capacity-planning-sizing-and-high-availability-for-search-in-spc172.aspx
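The network example above can be turned into a rough estimator. This is a minimal sketch that assumes traffic grows linearly with (index partitions × QPS), calibrated only on the slide's single data point; Get-QpcNetworkEstimate is a hypothetical name, and real traffic also depends on the size of queries and results.

```powershell
# Hypothetical linear model, calibrated on the slide's example:
# 20 index partitions @ 20 QPS => ~200 Mbit/s inbound, ~100 Mbit/s outbound.
function Get-QpcNetworkEstimate {
    param(
        [Parameter(Mandatory)] [int]    $IndexPartitions,
        [Parameter(Mandatory)] [double] $Qps
    )
    # Mbit/s per (partition x query per second), derived from the example.
    $inboundPerUnit  = 200 / (20 * 20)   # 0.5
    $outboundPerUnit = 100 / (20 * 20)   # 0.25
    $units = $IndexPartitions * $Qps
    [pscustomobject]@{
        InboundMbps  = $units * $inboundPerUnit
        OutboundMbps = $units * $outboundPerUnit
    }
}

# Halving the partition count halves the estimate under this model.
Get-QpcNetworkEstimate -IndexPartitions 10 -Qps 20
```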

Page 17:

Index component

CPU load driving factors: QPS and item count

Guidelines per index component @ 2 GHz CPU:

• 1M items: 5 QPS per CPU core

• 5M items: 2 QPS per CPU core

• 10M items: 1 QPS per CPU core

Disk load driving factors: QPS and item count; new content invalidates caches

Disk size: 500 GB @ 10M items per index component

[Chart: relative load impact on CPU, network, and disk, by item count, DPS, and QPS]
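The index guidelines above can be combined into a rough sizing helper. This is a sketch under assumptions that are not on the slide: use the fewest partitions that keep each index component at or below 10M items, every query fans out to every partition, a single replica per partition serves all queries, and disk grows linearly from 500 GB per 10M items. Get-IndexSizingEstimate is a hypothetical name.

```powershell
function Get-IndexSizingEstimate {
    param(
        [Parameter(Mandatory)] [ValidateScript({ $_ -gt 0 })] [long]   $TotalItems,
        [Parameter(Mandatory)] [ValidateScript({ $_ -gt 0 })] [double] $TargetQps
    )
    $maxItemsPerComponent = 10000000   # slide: 500 GB @ 10M items per index component

    # Assumption: fewest partitions that keep each component <= 10M items.
    $partitions = [int][math]::Ceiling($TotalItems / $maxItemsPerComponent)
    $itemsPerPartition = $TotalItems / $partitions

    # Slide guidelines @ 2 GHz: items per index component -> QPS per CPU core.
    # Use the first tier that covers the partition size (conservative).
    $tiers = @(
        @{ MaxItems = 1000000;  QpsPerCore = 5 }
        @{ MaxItems = 5000000;  QpsPerCore = 2 }
        @{ MaxItems = 10000000; QpsPerCore = 1 }
    )
    $tier = $tiers | Where-Object { $itemsPerPartition -le $_.MaxItems } | Select-Object -First 1

    [pscustomobject]@{
        Partitions         = $partitions
        # Assumption: each query hits every partition; one replica serves all queries.
        CoresPerComponent  = [int][math]::Ceiling($TargetQps / $tier.QpsPerCore)
        # Assumption: disk grows linearly from 500 GB per 10M items.
        DiskGBPerComponent = [int][math]::Ceiling($itemsPerPartition / $maxItemsPerComponent * 500)
    }
}

Get-IndexSizingEstimate -TotalItems 25000000 -TargetQps 10
```

For 25M items at 10 QPS this suggests 3 partitions of roughly 8.3M items each, about 10 cores per index component, and about 417 GB of disk per component. Adding replicas would spread the query load, so the core count here reflects the single-replica case.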

Page 18:

Crawl component

CPU load driving factors: documents per second; link discovery; crawl management

Network load driving factors: downloading items from content sources; passing items on to the CPC; crawl management

Disk load: all documents are temporarily stored in the data folder

[Chart: relative load impact on CPU, network, and disk, by item count, DPS, and QPS]

Page 19:

Content processing component (CPC)

CPU load driving factors: documents per second; document size and complexity; feature extraction

Estimate: 5-10 DPS per CPU core

Network load driving factors: documents per second; document size

[Chart: relative load impact on CPU, network, and disk, by item count, DPS, and QPS]
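With the 5-10 DPS per core estimate, a rough full-crawl duration is items ÷ (CPC cores × DPS per core). This is a minimal sketch that assumes the CPC is the bottleneck (content sources, crawler, and index keep up) and ignores incremental crawls; Get-CpcCrawlEstimate is a hypothetical name.

```powershell
function Get-CpcCrawlEstimate {
    param(
        [Parameter(Mandatory)] [ValidateScript({ $_ -gt 0 })] [long] $Items,
        [Parameter(Mandatory)] [ValidateScript({ $_ -gt 0 })] [int]  $CpcCores
    )
    # Slide estimate: 5-10 documents per second (DPS) per CPU core.
    $bestDps  = $CpcCores * 10
    $worstDps = $CpcCores * 5
    [pscustomobject]@{
        BestCaseHours  = [math]::Round([double]($Items / $bestDps / 3600), 1)
        WorstCaseHours = [math]::Round([double]($Items / $worstDps / 3600), 1)
    }
}

Get-CpcCrawlEstimate -Items 10000000 -CpcCores 8
```

With 8 CPC cores, 10M items works out to roughly 35 to 69 hours for a full crawl, which gives a first signal of whether more CPC cores are worth adding.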

Page 20:

Analytics processing component (APC)

CPU load driving factors: number of items; site activity

Disk load: local disk used for temporary storage; bulk load, so the primary concern is load isolation

Network load: same as for CPU load, PLUS network traffic increases when distributing the APC across multiple machines

[Chart: relative load impact on CPU, network, and disk, by item count, DPS, and QPS]

Page 21:

Search administration component

Low CPU and network load

Load increases with more components in the search topology

[Chart: relative load impact on CPU, network, and disk, by item count, DPS, and QPS]

Page 22:

Create your SSA

$SSADB = "SharePoint_SPSNJ_Search"
$SSAName = "Search Service Application SPS NJ"
$SVCAcct = "mcm\sp_search"
$SSI = Get-SPEnterpriseSearchServiceInstance -Local

#1. Start the search service instance (SSI)
Start-SPEnterpriseSearchServiceInstance -Identity $SSI

#2. Create the Application Pool
$AppPool = New-SPServiceApplicationPool -Name "$SSAName-AppPool" -Account $SVCAcct

#3. Create the search application and set it to a variable
$SearchApp = New-SPEnterpriseSearchServiceApplication -Name $SSAName -ApplicationPool $AppPool -DatabaseServer SQL2012 -DatabaseName $SSADB

#4. Create the search service application proxy
$SSAProxy = New-SPEnterpriseSearchServiceApplicationProxy -Name "$SSAName Application Proxy" -Uri $SearchApp.Uri.AbsoluteURI

#5. Provision the Search Admin component
Set-SPEnterpriseSearchAdministrationComponent -SearchApplication $SearchApp -SearchServiceInstance $SSI

#6. Create the topology
$Topology = New-SPEnterpriseSearchTopology -SearchApplication $SearchApp

#7. Assign server(s) to the topology
$hostApp1 = Get-SPEnterpriseSearchServiceInstance -Identity "SPWFE"
New-SPEnterpriseSearchAdminComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchCrawlComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchIndexComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1 -IndexPartition 0

#8. Activate the topology
$Topology | Set-SPEnterpriseSearchTopology

Page 23:

Small Search Topology

Page 24:

Fault tolerant small search topology

[Diagram: two physical hosts, each running two VMs: one VM with Index and QPC, and one VM with Admin, Crawl, CPC, and APC.]
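The point of the layout above is that every component type runs on VMs on both hosts, so losing one VM or host leaves a copy of each component. This is a minimal sketch that checks the first half of that for a planned layout, namely that each component type appears on at least two servers; the hashtable format, server names, and Test-TopologyRedundancy are hypothetical, and the check ignores physical host placement and index partition replicas.

```powershell
function Test-TopologyRedundancy {
    param([Parameter(Mandatory)] [hashtable] $Plan)   # server name -> component list
    # Component abbreviations as used on the slides.
    $required = 'Admin', 'Crawl', 'CPC', 'APC', 'Index', 'QPC'
    foreach ($component in $required) {
        # Servers (VMs) in the plan that host this component type.
        $servers = @($Plan.Keys | Where-Object { $Plan[$_] -contains $component })
        if ($servers.Count -lt 2) {
            Write-Verbose "$component is hosted on $($servers.Count) server(s)"
            return $false
        }
    }
    $true
}

$faultTolerant = @{
    'HostA-VM1' = @('Index', 'QPC')
    'HostA-VM2' = @('Admin', 'Crawl', 'CPC', 'APC')
    'HostB-VM1' = @('Index', 'QPC')
    'HostB-VM2' = @('Admin', 'Crawl', 'CPC', 'APC')
}
Test-TopologyRedundancy -Plan $faultTolerant   # True
```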

Page 25:

Small search farm (up to 10M items)

• Web front end and other SharePoint applications: sized independently

• Search server: Admin, Crawl, CPC, APC, Index, QPC

• Resources @ 10M items: 8 CPU cores, 24 GB RAM, 800 GB disk

• Separate disk for the index

Page 26:

Scaling from small to medium search topology

[Diagram: two servers, each hosting Admin, Crawl, four Index components, QPC, two CPCs, and APC.]

Page 27:

Extend your SSA

#2. Extend the Search Topology:
$hostApp1 = Get-SPEnterpriseSearchServiceInstance -Identity "SPWFE"
$hostApp2 = Get-SPEnterpriseSearchServiceInstance -Identity "SPSearch"
Start-SPEnterpriseSearchServiceInstance -Identity $hostApp1
Start-SPEnterpriseSearchServiceInstance -Identity $hostApp2

#3. Keep running these commands until the Status is Online:
Get-SPEnterpriseSearchServiceInstance -Identity $hostApp1
Get-SPEnterpriseSearchServiceInstance -Identity $hostApp2

#4. Once the status is Online, you can proceed with the following commands:
$ssa = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$newTopology = New-SPEnterpriseSearchTopology -SearchApplication $ssa
New-SPEnterpriseSearchAdminComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchCrawlComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1
New-SPEnterpriseSearchIndexComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1 -IndexPartition 0
New-SPEnterpriseSearchAdminComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchCrawlComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2
New-SPEnterpriseSearchIndexComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2 -IndexPartition 1

#5. Activate the topology:
Set-SPEnterpriseSearchTopology -Identity $newTopology
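Step #3 asks you to keep re-running Get-SPEnterpriseSearchServiceInstance until the status is Online. That loop can be scripted; below is a minimal sketch of a generic polling helper. Wait-Until is a hypothetical name (not a SharePoint cmdlet), and the 600-second timeout and 10-second interval are arbitrary assumptions.

```powershell
# Re-runs $Condition until it returns $true, or throws once the timeout expires.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 600,
        [int] $IntervalSeconds = 10
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        if ((Get-Date) -ge $deadline) {
            throw "Condition not met within $TimeoutSeconds seconds."
        }
        Start-Sleep -Seconds $IntervalSeconds
    }
}

# Usage on a SharePoint server, in place of step #3:
# Wait-Until { (Get-SPEnterpriseSearchServiceInstance -Identity $hostApp1).Status -eq 'Online' }
# Wait-Until { (Get-SPEnterpriseSearchServiceInstance -Identity $hostApp2).Status -eq 'Online' }
```

Taking a scriptblock keeps the helper independent of SharePoint, so the same loop can wait on any other status check.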

Page 28:

Medium Search Topology

Page 29:

Hybrid Search

Page 30:

Why Hybrid Search?

Hybrid SharePoint environment:

Pieces of content distributed across multiple environments

Complexity due to multiple locations

No single Enterprise Search Center for finding content

Lost user productivity and added frustration while trying to locate relevant content

Page 31:

Benefits

• Provide integrated search results, allowing for a single place to find content

• One Enterprise Search Center to reduce user interface complexity

• Query all of your SharePoint content at the same time

• Allow O365 and on-premises solutions to coexist

• Provide a solution allowing customers to move to the cloud on their own terms

• Reduce operational cost

• Take advantage of newer SharePoint feature updates in O365

• Hybrid search solves many problems as data moves from on-premises to O365

Page 32:

One-way outbound topology

[Diagram: an Office 365 tenant (SharePoint Online site collection, Microsoft data center) and a SharePoint Server 2013 farm (WFE, on-premises), connected over the Internet by an outbound connection only. The on-premises farm shows hybrid search results; SharePoint Online shows local search results only.]

• SharePoint Server can query SharePoint Online

• SharePoint Online can NOT query SharePoint On-prem

Page 33:

One-way inbound topology

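[Diagram: an Office 365 tenant (SharePoint Online site collection, Microsoft data center) and a SharePoint Server 2013 farm (WFE, on-premises), connected over the Internet by an inbound connection only, which reaches the farm through a reverse proxy in the DMZ. SharePoint Online shows hybrid search results; the on-premises farm shows local search results only.]

• SharePoint Online can query SharePoint On-prem

• SharePoint Server can NOT query SharePoint Online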

Page 34:

Inbound/Outbound Topology

[Diagram: an Office 365 tenant (SharePoint Online site collection, Microsoft data center) and a SharePoint Server 2013 farm (WFE, on-premises), connected over the Internet by both outbound and inbound connections; inbound traffic reaches the farm through a reverse proxy in the DMZ.]

• SharePoint Online can query SharePoint On-prem

• SharePoint Server can query SharePoint Online

Page 35:

Tweaking Your Results

Page 36:

Challenges: Intent

Where is my talk Project Plan?

Are documents held in the same place?

I wonder if there are references from previous projects?

Different people have different intents

Query Rules help you handle intents

There is rarely a single right answer

Infrastructure Project

Page 37:

Authorities: SSA-level configuration

Sites that are important

Sites with low intrinsic relevance

Takes ~24 hours to propagate

Page 38:

Authorities: Connected

Page 39:

Authorities: Connected

[Diagram: sites connected by hyperlinks, each labeled with its link distance (0 to 4) from the authoritative site.]

Setting an authority affects all sites connected through hyperlinks

Sites are weighted by distance to the authority
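Since sites are weighted by their link distance to the authority, the distance itself comes from a breadth-first search over the hyperlink graph. This is a minimal sketch assuming links are followed outward from the authoritative site; the link graph and Get-AuthorityDistance are hypothetical, and how SharePoint turns distance into a ranking weight is not modeled here.

```powershell
function Get-AuthorityDistance {
    param(
        [Parameter(Mandatory)] [hashtable] $Links,      # site -> sites it links to
        [Parameter(Mandatory)] [string]    $Authority   # the authoritative site
    )
    $distance = @{ $Authority = 0 }
    $queue = New-Object 'System.Collections.Generic.Queue[string]'
    $queue.Enqueue($Authority)
    while ($queue.Count -gt 0) {
        $site = $queue.Dequeue()
        foreach ($next in @($Links[$site])) {
            # @($null) yields one $null element, so skip empty entries.
            if ($next -and -not $distance.ContainsKey($next)) {
                $distance[$next] = $distance[$site] + 1   # one hop further out
                $queue.Enqueue($next)
            }
        }
    }
    $distance   # sites not reachable from the authority are absent
}

$links = @{
    'http://employment' = @('http://hr', 'http://policies')
    'http://hr'         = @('http://payroll')
    'http://policies'   = @('http://employment')
}
Get-AuthorityDistance -Links $links -Authority 'http://employment'
```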

Page 40:

Query Rules: Tune Search Results

Created at the SSA, tenant, site collection, or site level

SSA

Site Collection

Site

Page 41:

Query Rules

Condition: When do I apply the rule?

Action: What do I do when the rule is matched?

Publishing: When should the rule be active?

Page 42:

Query Rules

Conditions

• Exact match, beginning or end

• Ad-hoc or term store dictionary

• Match a regex (advanced)

• Is this query more likely aimed at the following source…?

• Do people mostly click on results of the following type…?

Actions

• Show a promoted result

• Show a block of results

• Replace the core results with a different query
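A query rule pairs a condition with an action. The sketch below mimics that split for the text-matching conditions listed above (exact, beginning, end, regex); it is a hypothetical, simplified model with made-up rules, not the SharePoint query rule object model, and actions are simply returned as text.

```powershell
function Invoke-QueryRules {
    param(
        [Parameter(Mandatory)] [string]   $Query,
        [Parameter(Mandatory)] [object[]] $Rules
    )
    foreach ($rule in $Rules) {
        # Condition: when do I apply the rule?
        $matched = switch ($rule.MatchType) {
            'Exact'      { $Query -eq $rule.Pattern }   # -eq is case-insensitive
            'StartsWith' { $Query.StartsWith($rule.Pattern, [System.StringComparison]::OrdinalIgnoreCase) }
            'EndsWith'   { $Query.EndsWith($rule.Pattern, [System.StringComparison]::OrdinalIgnoreCase) }
            'Regex'      { $Query -match $rule.Pattern }
            default      { $false }
        }
        # Action: what to do when the rule is matched (emitted as text here).
        if ($matched) { $rule.Action }
    }
}

$rules = @(
    [pscustomobject]@{ MatchType = 'Exact';    Pattern = 'HR Employment';    Action = 'Show promoted result /HR/employment' }
    [pscustomobject]@{ MatchType = 'EndsWith'; Pattern = 'quarterly report'; Action = 'Show result block ContentType=reports' }
    [pscustomobject]@{ MatchType = 'Regex';    Pattern = '^\d{4} budget$';   Action = 'Replace core results with a different query' }
)

Invoke-QueryRules -Query 'HR Employment quarterly report' -Rules $rules
# Show result block ContentType=reports
```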

Page 43:

Query Builder

Dynamically change results ranking as part of the query
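Ranking changes made in the Query Builder end up in the query text itself. To my understanding this uses the KQL XRANK operator, which boosts results that also match a rank expression without filtering anything out; the helper below only assembles such a query string. New-XRankQuery is a hypothetical name, and the default constant boost (cb) of 100 is an arbitrary assumption.

```powershell
# Builds: (<query>) XRANK(cb=<boost>) <rank expression>
function New-XRankQuery {
    param(
        [Parameter(Mandatory)] [string] $Query,
        [Parameter(Mandatory)] [string] $BoostExpression,
        [int] $ConstantBoost = 100
    )
    "($Query) XRANK(cb=$ConstantBoost) $BoostExpression"
}

New-XRankQuery -Query '{searchTerms}' -BoostExpression 'Path:"http://reports"'
# ({searchTerms}) XRANK(cb=100) Path:"http://reports"
```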

Page 44:

Query Builder

Page 45:

Conceptual Flow

For all queries:

• Authorities: Level 1: http://employment

• Ranking model: {incorporate user ratings}

Query: HR Employment quarterly report → Search Web Part → Query Processing Engine → Document Collection

• Thesaurus: HR = Human Resources

• Best bets: HR Employment → /HR/employment

• Query Rule: {Terms} Quarterly Report → {Terms} ContentType="reports"

• Dynamic Reordering Rules: Quarterly Report → {prefer docs from http://reports}

• Expanded query: (WORDS HR, Human Resources) AND (WORDS employees, employed) AND (WORDS quarterly, quarterlies) AND (WORDS report, reports, reported)

Mixed results for:

• HR Employment best bet

• HR Employment quarterly report

• HR Employment ContentType=reports
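The expanded query in this flow groups each term with its synonyms and inflections and ANDs the groups together. The sketch below reproduces that expansion in the slide's (WORDS ...) notation; the expansion map and Expand-QueryTerms are hypothetical, and SharePoint's real thesaurus and stemming are applied inside the query pipeline rather than by a script like this.

```powershell
function Expand-QueryTerms {
    param(
        [Parameter(Mandatory)] [string]    $Query,
        [Parameter(Mandatory)] [hashtable] $Expansions   # term -> equivalent forms
    )
    # One (WORDS ...) group per term; terms without an entry expand to themselves.
    $groups = foreach ($term in ($Query -split '\s+' | Where-Object { $_ })) {
        $forms = if ($Expansions.ContainsKey($term)) { $Expansions[$term] } else { @($term) }
        '(WORDS ' + ($forms -join ', ') + ')'
    }
    # Groups are combined with AND, as on the slide.
    $groups -join ' AND '
}

$map = @{
    'HR'        = @('HR', 'Human Resources')
    'quarterly' = @('quarterly', 'quarterlies')
    'report'    = @('report', 'reports', 'reported')
}
Expand-QueryTerms -Query 'HR quarterly report' -Expansions $map
# (WORDS HR, Human Resources) AND (WORDS quarterly, quarterlies) AND (WORDS report, reports, reported)
```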

Page 46:

Create a Query Rule – Hybrid

• From the Result Source drop-down list, select the specified result source

• Under “Query is performed on these sources”, if you select “One of these sources”, make sure to select the result source you created

Page 47:

Hybrid Results

Results from SharePoint Online

Results from SharePoint Server

Page 48:

Session Objective and Takeaways

• High Availability and Performance

• Better Search Quality

• Better Management

• Friendly Results and Tools

Page 49:

Thank You!

[email protected], @mikemaadarani, www.slideshare.net/maadarani