
FAST Search Server 2010 for SharePoint Capacity Planning

This document is provided "as-is". Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. Some examples depicted herein are provided for illustration only and are fictitious. No real association or connection is intended or should be inferred. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes. © 2010 Microsoft Corporation. All rights reserved.

Microsoft Corporation
November 2010

Applies to: FAST Search Server 2010 for SharePoint

Summary: This document describes specific deployments of FAST Search Server 2010 for SharePoint, including:

• Test environment specifications, such as hardware, farm topology and configuration
• The workload used for data generation, including the number and class of users, and farm usage characteristics
• Test farm dataset, including search indexes and external data sources
• Health and performance data specific to the tested environment
• Test data and recommendations for how to determine the hardware, topology and configuration you need to deploy a similar environment, and how to optimize your environment for appropriate capacity and performance characteristics

Contents

Introduction
Search Overview
  Sizing Approach
  Search and Indexing Lifecycle
    Content feeding
    Query load
    Network traffic
  Web Analyzer performance dimensioning
Scenarios
  Shared specifications across all scenarios
    Query workload
    Notes on measured disk usage
    Configuration for extended content capacity
  Extra small FAST Search Farm
    Deployment alternatives
    Specifications
    Test Results
  Small FAST Search Farm
  Medium FAST Search Farm
    Specifications
    Test Results
  Large FAST Search Farm
    Specifications
    Test Results
  Extra-large FAST Search Farm
  Overall Takeaways
    Query and feeding performance
    Redundancy
    Capacity per node
    Deployments on storage area networks (SAN)
    Deployments on solid state disks (SSD)
Troubleshooting performance and scalability
  Raw I/O performance
  Analyzing feeding and indexing performance
    Content SSA
    Content distributors and item processing
    Indexing dispatcher and indexers
  Analyzing query performance
    Query SSA
    QRproxy and QRserver
    Query dispatcher
    Query matching

Introduction

This document provides capacity planning information for collaboration environment deployments of FAST Search Server 2010 for SharePoint, in the following referred to as FAST Search Server. It includes the following information for sample search farm configurations:

• Test environment specifications, such as hardware, farm topology and configuration
• The workload used for data generation, including the number and class of users and farm usage characteristics
• Test farm dataset, including search indexes and external data sources
• Health and performance data specific to the tested environment

It also contains common test data and recommendations for how to determine the hardware, topology and configuration you need to deploy a similar environment, and how to optimize your environment for appropriate capacity and performance characteristics.

FAST Search Server contains a richer set of features and a more flexible topology model than the search solution in earlier versions of SharePoint. Before you employ this architecture to deliver more powerful features and functionality to your users, you must carefully consider the impact upon your farm’s capacity and performance. When you read this document, you will understand how to:

• Define performance and capacity targets for your environment
• Plan the hardware required to support the number and type of users, and the features you intend to deploy
• Design your physical and logical topology for optimum reliability and efficiency
• Test, validate and scale your environment to achieve performance and capacity targets
• Monitor your environment for key indicators

Before you read this document, you should read the following:

• Performance and capacity management (SharePoint Server 2010)
• Plan Farm Topology (FAST Search Server 2010 for SharePoint)


http://technet.microsoft.com/en-us/library/ff599528.aspx

http://technet.microsoft.com/en-us/library/cc262971.aspx

Search Overview

Sizing Approach

The scenarios in this document describe FAST Search Server test farms, with assumptions that allow you to start planning for the correct capacity for your farm. To choose the right scenario, you need to consider the following questions:

1. Corpus Size: How much content needs to be searchable? The total number of items should include all objects: documents, web pages, list items, etc.

2. Availability: What are the availability requirements? Do customers need a search solution which can survive the failure of a particular server?

3. Content Freshness: How fresh do the search results need to be? How long after a customer modifies the data should searches return the updated content? How often do you expect the content to change?

4. Throughput: How many people will be searching over the content simultaneously? This includes people typing in a query box, as well as other hidden queries like web-parts automatically searching for data, or Microsoft Outlook 2010 Social Connectors requesting activity feeds that contain URLs which need security trimming from the search system.

Search and Indexing Lifecycle

Content feeding

The scenarios allow you to estimate capacity at an early stage of the farm. Farms move through multiple stages as content is crawled:

• Index acquisition: This is the first stage of data population. It is characterized by:
  o Full crawls (possibly concurrent) of content.
  o Close monitoring of the crawl system, to ensure that hosts being crawled are not a bottleneck for the crawl.
• Index Maintenance: This is the most common stage of a farm. It is characterized by:
  o Incremental crawls of all content, detecting new and changed content.
  o For SharePoint content crawls, a majority of the changes encountered during the crawl are related to access right changes.
• Index Cleanup: This stage occurs when a content change moves the farm out of the index maintenance stage; for example, when a content database or site is moved from one search service application to another. This stage is not covered in the scenario testing behind this document, but is triggered when:
  o A content source and/or start address is deleted from a search service application.
  o A host supplying content is not found by the content connector for an extended period of time.


Index acquisition

When adding new content, feed performance is mainly determined by the configured number of item processing components. Both the number of CPU cores and the speed of each core affect the results. As a first-order approximation, a 1 GHz CPU core can process one average-size Office document (around 250 kB) per second. For example, the M4 scenario discussed later has 48 CPU cores for item processing, each running at 2.26 GHz, giving a total estimated throughput of 48 cores × 2.26 GHz ≈ 100 items per second on average. The crawl rate graph below is taken from the SharePoint administration reports. The crawl rate varies depending on the type of content. Most of the crawl consists of new additions (labeled as "modified" in the graph).

Note:

The indicated feed rates might saturate content sources and networks during peak feeding rate periods in the above crawl. See section Troubleshooting performance and scalability for further information on how to monitor feeding performance.
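The first-order feed rate approximation for index acquisition can be written as a small calculation. The following Python sketch is illustrative only: the function names are hypothetical, and the only rule it encodes is the one stated in the text (roughly one average-size item per second per 1 GHz of CPU core). The unrounded product for the M4 figures is about 108 items per second, which the text rounds to approximately 100.

```python
def estimate_feed_rate(cores: int, clock_ghz: float,
                       items_per_ghz_second: float = 1.0) -> float:
    """Estimate item processing throughput in items per second.

    Rule of thumb from the text: a 1 GHz core processes about one
    average-size (around 250 kB) Office document per second.
    """
    return cores * clock_ghz * items_per_ghz_second


def hours_to_feed(total_items: int, items_per_second: float) -> float:
    """Estimate the duration of an initial full crawl, in hours."""
    return total_items / items_per_second / 3600.0


# M4 scenario figures quoted in the text: 48 cores at 2.26 GHz.
rate = estimate_feed_rate(cores=48, clock_ghz=2.26)
print(f"Estimated feed rate: {rate:.1f} items/s")

# Assumed corpus size for illustration: 40 million items.
print(f"Estimated full crawl: {hours_to_feed(40_000_000, rate):.0f} hours")
```

Real crawls will deviate from this estimate, since the rate depends on content type and on whether content sources or the network become the bottleneck.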

Index maintenance

Incremental crawls can consist of various operations.


Access right (ACL) changes and deletes: These require near zero item processing, but high processing load in the indexer. Feed rates will be higher than for full crawls.

Content updates: These require full item processing as well as more processing by the indexer compared to adding new content. Internally, such an update corresponds to a delete of the old item, and an addition of the new content.

Additions: Incremental crawls will to some extent also contain newly discovered items. These have the same workload as index acquisition crawls.

Depending on the type of operation, an incremental crawl may be faster or slower than an initial full crawl. It will be faster in the case of mainly ACL updates and deletes, and slower in the case of mainly updated items. Using a backup indexer may slow down the incremental crawl of updated items further.

In addition to updates from the content sources, the index is also altered by internal operations:

The FAST Search Server link analysis and click-through log analysis generate additional internal updates to the index. Example: A hyperlink in one item will lead to an update of the anchor text info associated with the referenced item. Such updates have a similar load pattern as the ACL updates.

At regular intervals, the indexer performs internal reorganization of index partitions and data defragmentation. Defragmentation is started every night at 3am, while redistribution across partitions occurs whenever needed.

These internal operations imply that you may observe indexing activity also outside intervals with ongoing content crawls.

Query load

Index partitioning and query evaluation

The overall index is partitioned on two levels:

Index columns: When the complete index is too large to reside on one server, it can be split into multiple disjoint index columns. A query will then be evaluated against all index columns within the search cluster, and the results from each index column are merged into the final query hit list.

Index partitions: Within each index column, the indexer uses dynamic partitioning of the index in order to handle a large number of indexed items with low indexing and query latency. This partitioning is handled internally on each index server. When a query is evaluated, each partition is searched in a separate thread. The default number of partitions is 5. In order to handle more than 15 million items per server (column), you need to change the number of partitions (and associated query evaluation threads). This is discussed in section Configuration for extended content capacity.
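The merge step for index columns can be illustrated with a short Python sketch. This is not the FAST Search Server implementation: the hit format (a score and an item identifier), the function name and the sample data are all assumptions made for illustration. It only shows the general idea of combining per-column result lists, each already sorted by rank, into one final hit list.

```python
import heapq
from itertools import islice


def merge_column_hits(column_hits, top_n=10):
    """Merge per-column hit lists into one ranked hit list.

    Each input list holds (score, item_id) tuples sorted by
    descending score, as an index column might return them.
    """
    merged = heapq.merge(*column_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, top_n))


# Hypothetical results from two disjoint index columns.
column_a = [(0.92, "doc-17"), (0.40, "doc-3")]
column_b = [(0.88, "doc-52"), (0.75, "doc-9")]

print(merge_column_hits([column_a, column_b], top_n=3))
# [(0.92, 'doc-17'), (0.88, 'doc-52'), (0.75, 'doc-9')]
```

Because every column must answer before the merge can complete, the slowest column determines the latency of the whole query.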

Query latency

Evaluation of a single query is schematically illustrated in the following figure.

CPU processing (light blue) is followed by waiting for disk access cycles (white) and actual disk data read transfers (dark blue); this sequence is repeated on the order of 2-10 times per query. This implies that the query latency depends on the speed of the CPU, as well as the I/O latency of the storage subsystem.

A single query is evaluated separately, and in parallel, across multiple index partitions in all index columns. In the default five-partition configuration, each query is evaluated in five separate threads within every column.

Query throughput

When query load increases, multiple queries are evaluated in parallel as indicated in the figure below.

As different phases of the query evaluation occur at different times, simultaneous I/O accesses are not likely to become a bottleneck. CPU processing shows considerable overlap, which will be scheduled across the available CPU cores of the node.

In all scenarios tested, the query throughput reaches its maximum when all available CPU cores are 100% utilized. This happens before the storage subsystem becomes saturated. More and faster CPU cores will increase the query throughput, and eventually make disk accesses the bottleneck.

Note:

In larger deployments with many index columns the network traffic between query processing and query matching nodes may also become a bottleneck, and you may consider increasing the network bandwidth for this interface.


Index size impact on query performance

Query latency is to some extent independent of query load up to the CPU starvation point at maximum throughput. Query latency for each query is a function of the number of items in the largest index partition. The following diagram shows query latency on a system starting out with 5 million items in the index, with more content being added in batches up to 43 million items. The data is taken from the M6 scenario described later. Feeds are not running continuously, in order to see the feeding effects on query performance at different capacity points.

There are three periods in the graph where search has been stopped, rendered as zero latency. You can also observe that the query latency is slightly elevated when query load is applied after an idle period. This is due to caching effects.

The query rate is on average 10 queries per minute, apart from a test for query throughput within the first day of testing. This reached around 4000 queries per minute, making the 10 qpm query rate graph almost invisible. Thus the graph above shows the light load query latency, and not latency during maximum throughput.


The following diagram shows the feed rates during the same interval.

By comparing the two graphs, we see that an ongoing feed causes some degradation of query latency. As this scenario has a search row with a backup indexer, the effect is much smaller than in systems where search runs on the same nodes as the indexer and item processing.


Percentile based query performance

The graphs presented earlier in this document show the average query latency. The SharePoint administrative reports also provide percentile based reports. These can provide a more representative performance summary, especially under high load conditions.

The following graph shows the percentile based query performance for the same system as in the previous section. While the previous graphs showed average query latencies around 500-700 ms, the percentile graph shows that the median latency (50th percentile) levels out around 400 ms when content is added. The high percentiles show larger variations, due both to the increased number of items on the system and to the impact of ongoing crawls.

Note:

The percentile based query performance graph includes the crawl rate of the Query SSA. This will not show any crawling activity, as the Query SSA will only crawl user profile data for the people search index. People search is not included in the test scenarios in this document. Crawling of all other sources is performed by the FAST Search Server Content SSA.

The query throughput load test during the first day reveals that high query load will reduce the latency for the high percentiles. During low query load and ongoing feed, a large fraction of the queries will hit fresh index generations without caches. When query load increases (within the maximum throughput capacity), the fraction of cold cache queries goes down. This will reduce the high percentile latencies.

Deployments with indexing and queries on the same row

As crawls and queries both use CPU resources, deployments with indexing and queries on the same row will show some degradation in query performance during content crawls. Single row deployments are likely to have indexing, query and item processing all running on the same servers.

The following test results are gathered by applying an increasing query load to a single row system. The graphs are gathered from the SharePoint administrative reports. The query latency is plotted as an area vs. the left axis, while the query throughput is a light blue line vs. the right axis. In the following diagram there is no ongoing content feed. The colors of the graph are as follows:

• Red: Backend, that is, time consumed in the FAST Search Server nodes
• Yellow: Object model
• Blue: Server rendering


Query latency remains stable around 700 ms up to 8 queries per second (~500 queries per minute). At this point the server CPU capacity becomes saturated. When applying even higher load, query queues build up, and latency increases linearly with the queue length.

In the following diagram, the same query load is applied with ongoing content feeding. This implies that queries must take CPU capacity from item processing, which runs at lower priority. Consequently, query latency now starts to increase even at low load, and the maximum throughput is reduced from ~600 to ~500 queries per minute. Note the change in scale on the axis compared to the previous graph.

Query latency will have higher variation during feed. The spikes shown in the graph are due to the indexer completing larger work batches, leading to new index generations which invalidate the current query caches.


Using a dedicated search row

You can deploy a dedicated search row to isolate query traffic from indexing and item processing. This requires twice the number of servers in the search cluster, at the benefit of better and more consistent query performance. Such a configuration will also provide query matching redundancy.

A dedicated search row implies some additional traffic during crawls when the indexer creates a new index generation (a new version of the index for a given partition). The new index data is passed over the network from the indexer node to the query matching node. Given a proper storage subsystem, the main effect on query performance is a slight degradation when new generations arrive due to cache invalidation.

Search row combined with backup indexer

You can deploy a backup indexer in order to handle non-recoverable errors on the primary indexer. You will normally co-locate the backup indexer with a search row. For this scenario you should normally not deploy item processing to the combined backup indexer and search row. The backup indexer increases the I/O load on the search row, as there will be additional housekeeping communication between the primary and backup indexer to keep the index data on the two servers in sync. This also includes additional data storage on disk for both servers. Make sure that you dimension your storage subsystem to handle the additional load.

Network traffic

With increased CPU performance on the individual servers, the network connection between the servers can become a bottleneck. As an example, even a small 4-node FAST Search Server farm can process and index more than 100 items per second. If the average item is 250 Kbytes, this will represent around 250 Mbit/s average network traffic. Such a load may saturate even a 1Gbit/s network connection.

The network traffic generated by content feeding and indexing can be decomposed as follows:

• The indexing connector within the Content SSA retrieves the content from the source
• The Content SSA (within the SharePoint farm) passes the retrieved items in batches to the content distributor component in the FAST Search Server farm
• Each item batch is sent to an available item processing component, typically located on another server
• After processing, each batch is passed to the indexing dispatcher, which will split the batches according to the index column distribution
• The indexing dispatcher distributes the processed items to the indexers of each index column
• The binary index is copied to additional search rows (if deployed)

The accumulated network traffic across all nodes can be more than five times higher than the content stream itself in a distributed system. A high performance network switch is needed to interconnect the servers in such a deployment.

High query throughput also generates high network traffic, especially when using multiple index columns. Make sure you define the deployment configuration and network configuration to avoid too much overlap between network traffic from queries and network traffic from content feeding and indexing.
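The feeding traffic figures can be reproduced with a short calculation. The Python sketch below is illustrative: the function name is hypothetical, 1 Kbyte is taken as 1000 bytes, and the factor of five is the lower bound on accumulated traffic mentioned in the text. At exactly 100 items per second the raw content stream comes to 200 Mbit/s; the figure of around 250 Mbit/s corresponds to about 125 items per second, which is consistent with "more than 100 items per second".

```python
def content_stream_mbit_per_s(items_per_second: float,
                              avg_item_kbytes: float) -> float:
    """Raw content stream in Mbit/s, assuming 1 Kbyte = 1000 bytes."""
    return items_per_second * avg_item_kbytes * 8 / 1000.0


ACCUMULATION_FACTOR = 5  # "more than five times" the content stream

for rate in (100, 125):
    stream = content_stream_mbit_per_s(rate, avg_item_kbytes=250)
    accumulated = stream * ACCUMULATION_FACTOR
    print(f"{rate} items/s: stream {stream:.0f} Mbit/s, "
          f"accumulated across nodes at least {accumulated:.0f} Mbit/s")
```

The accumulated value is spread over several server interconnects rather than a single link, so it indicates the load on the switch rather than on any one network adapter.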


Web Analyzer performance dimensioning

Performance dimensioning of the Web Analyzer component depends on the number of indexed items and whether the items contain hyperlinks. Items that contain hyperlinks, or are linked to, represent the main load on the Web Analyzer.

Database-type content does not normally contain hyperlinks. SharePoint and other types of Intranet content will often contain HTML with hyperlinks. External Web content is almost exclusively HTML documents with many hyperlinks.

Although both the number of CPU cores and the amount of disk space are vital for performance dimensioning of the Web Analyzer, disk space is the most important. The following table specifies rule-of-thumb dimensioning recommendations for the Web Analyzer.

Content type             Number of items per CPU core    GB disk per million items
Database                 20 million                      2
SharePoint / Intranet    10 million                      6
Public Web content       5 million                       25

Note:

The table provides dimensioning rules for the whole farm. If the Web Analyzer components are distributed over two servers the requirement per server will be half of the given values.

The amount of memory needed is the same for all types of content, but depends on the number of cores used. We recommend planning for 30 MBytes per million items plus 300 MBytes per CPU core.

The link, anchor text or click-through log analysis will only be performed if sufficient disk space is available. The number of CPU cores only impacts the amount of time it takes to update the index with anchor text and rank data. If the installation contains different types of content, the safest capacity planning strategy is to use the most demanding content type as the basis for the dimensioning. For example, if the system contains a mix of database and SharePoint content, it is recommended to dimension the system as if it only contains SharePoint content.
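The Web Analyzer rules of thumb can be combined into a small sizing helper. The Python sketch below is an illustration under stated assumptions: the function and dictionary names are hypothetical, the numbers come from the dimensioning table and the memory recommendation above, and rounding the core count up to a whole core is an assumption not stated in the text.

```python
import math

# (million items per CPU core, GB disk per million items), from the table.
WEB_ANALYZER_RULES = {
    "database": (20, 2),
    "sharepoint": (10, 6),
    "public_web": (5, 25),
}


def size_web_analyzer(content_type: str, million_items: float) -> dict:
    """Return farm-wide Web Analyzer estimates for one content type."""
    items_per_core, gb_per_million = WEB_ANALYZER_RULES[content_type]
    cores = math.ceil(million_items / items_per_core)  # assumed: round up
    disk_gb = million_items * gb_per_million
    # 30 MBytes per million items plus 300 MBytes per CPU core.
    memory_mb = 30 * million_items + 300 * cores
    return {"cpu_cores": cores, "disk_gb": disk_gb, "memory_mb": memory_mb}


# Example: 40 million items of SharePoint / Intranet content.
print(size_web_analyzer("sharepoint", 40))
# {'cpu_cores': 4, 'disk_gb': 240, 'memory_mb': 2400}
```

For mixed content, calling the helper with the most demanding content type for the full item count follows the conservative strategy recommended above.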


Scenarios

This section describes typical deployments for variously sized search farms, with some relevant hardware variations for each scale point. The following scale points are included:

• XS: Extra-small FAST Search farm tested with 1, 5 and 8 million items
• S: Small FAST Search farm with 15 million items (planned for inclusion in future release)
• M: Medium FAST Search farm with 40 million items
• L: Large FAST Search farm with 100 million items
• XL: Extra-large FAST Search farm with 500 million items (planned for inclusion in future release)

For each of these scale points, several scenarios are defined. These are labeled as M1, M2 and so on for the medium scale point, and correspondingly for the others. Content is crawled from SharePoint, Web servers and file shares.

Note:

The scenarios below do not include storage sizing for storing a system backup, as backups would normally not be stored on the FAST Search Server nodes themselves.

The next subsection describes the specifications shared across all scenarios, while each of the following subsections describes a specific scenario. General guidelines follow in the "Recommendations" section.

Shared specifications across all scenarios

Note:

The FAST Search Server index does not use any SQL Server based property database. The people search index uses the property database, but the test scenarios in this document do not include people search.

Query workload

This section describes the workload used for query profiling. The number of queries per second (QPS) is varied from 1 QPS to about 40 QPS and the latency is recorded as a function of this. The query test set consists of 76501 queries. These queries have the following characteristics:

Query terms    Number of queries    Percentage of test set
1              49195                64.53
2              24520                32.16
3              2411                 2.81
4              325                  0.43
5              43                   0.06
7              7                    0.01

There are two types of multi-term queries used:


1. ALL queries (70%), meaning all terms must appear in matching items. This includes queries containing an explicit AND, as well as lists of terms that are implicitly parsed as an AND statement.

2. ANY queries (30%), meaning at least one of the terms must appear in matching items (OR).

The queries are chosen by random selection. The number of user agents (simulated users) defines the query load. One agent repeats the following two steps during the test:

1. Submit a query
2. Wait for the response

There is no pause between the repetitions of these steps; the agent submits a new query immediately after receiving a query response. The number of agents increases in steps during the test. For example, the figure below shows a typical test where the number of agents increases periodically, adding two agents every 15 minutes.
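The agent behavior described here is a closed-loop load pattern: each agent has at most one query outstanding. The Python sketch below illustrates that pattern only; it is not the test harness used for this document. The function names are hypothetical, `fake_search` is a stand-in that sleeps instead of calling a search service, and each step runs a fixed number of queries per agent rather than a fixed 15 minute period.

```python
import random
import threading
import time


def run_agent(submit_query, queries, iterations, latencies, lock, seed):
    """One simulated user: submit a query, wait for the response, repeat."""
    rng = random.Random(seed)
    for _ in range(iterations):
        query = rng.choice(queries)        # queries are chosen at random
        started = time.perf_counter()
        submit_query(query)                # blocks until the response arrives
        elapsed = time.perf_counter() - started
        with lock:
            latencies.append(elapsed)


def run_load_step(submit_query, queries, agent_count, iterations_per_agent):
    """Run one load step with a fixed number of concurrent agents."""
    latencies, lock = [], threading.Lock()
    agents = [
        threading.Thread(
            target=run_agent,
            args=(submit_query, queries, iterations_per_agent,
                  latencies, lock, seed),
        )
        for seed in range(agent_count)
    ]
    for agent in agents:
        agent.start()
    for agent in agents:
        agent.join()
    return latencies


def fake_search(query):
    """Stand-in for a real search call (assumption for illustration)."""
    time.sleep(0.001)


for agent_count in (2, 4, 6):              # add two agents per step
    samples = run_load_step(fake_search, ["sharepoint", "fast search"],
                            agent_count, iterations_per_agent=5)
    mean_ms = sum(samples) / len(samples) * 1000
    print(f"{agent_count} agents: {len(samples)} queries, "
          f"mean latency {mean_ms:.1f} ms")
```

In a closed loop like this, the offered query rate is roughly the number of agents divided by the average latency, so throughput stops growing once latency starts to rise with load.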

Notes on measured disk usage

Actual disk usage is listed for all scenarios. Please note that:

Raw source data size is only included for illustration. These data do not occupy any disk space on the FAST Search Server system.

The indexer stores the processed items on disk in an XML based format called FiXML. FiXML data serves as input to the indexing process which builds the indices. Every submitted item is stored in FiXML format. Old versions are removed once a day. The data size given contains only a single version of every item.

FAST Search Server keeps a read-only binary index file set to serve queries while building the next index file set. The worst-case disk space usage for index data is approximately 2.5 times the size of a single index file set. The additional 0.5 accounts for various temporary files.


When running with primary and backup indexers the indexers may consume an additional 50 GB each for synchronization data for other data including Web Analyzer data, log files, etc.
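The disk usage notes above can be turned into a rough worst-case estimate per indexer node. The Python sketch below is a simplification: the function name and the sample sizes are hypothetical, adding FiXML and index data together is an assumption about how the components combine, and the 50 GB allowance is passed in explicitly because it only applies in some deployments.

```python
def worst_case_disk_gb(fixml_gb: float, index_file_set_gb: float,
                       extra_gb: float = 0.0) -> float:
    """Rough worst-case disk usage for one indexer node, in GB.

    fixml_gb:          size of a single version of all items in FiXML
    index_file_set_gb: size of a single binary index file set
    extra_gb:          additional allowance, for example the 50 GB
                       mentioned for primary and backup indexers
    """
    # Index data can reach about 2.5 times one index file set.
    return fixml_gb + 2.5 * index_file_set_gb + extra_gb


# Hypothetical sizes for illustration; they are not measured values.
print(worst_case_disk_gb(fixml_gb=100, index_file_set_gb=200))
print(worst_case_disk_gb(fixml_gb=100, index_file_set_gb=200, extra_gb=50))
```

The measured values listed for each scenario should take precedence over an estimate like this when they are available.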

Note:

The ratio between source data and index data depends strongly on the content type. This is related to the different amount of searchable data in the various data formats.

Configuration for extended content capacity

FAST Search Server has a default configuration that is optimized for handling up to 15 million items per index column, with a hard limit of 30 million items per index column. Some of the scenarios described in this document use a modified configuration to allow for up to 40 million items per column. This is referred to as an extended capacity configuration. The extended content capacity configuration has more index partitions within each server node. In this way low query latency can be maintained at the expense of reduced maximum QPS.

Extending the capacity involves some tradeoffs:

Query throughput (QPS) is reduced. Query latency (as long as the throughput limit is not exceeded) is less affected. Reduced query throughput can be compensated for with multiple search rows, but this diminishes the savings in server count.

Indexing requires more resources and more disk accesses.

More items per column require more storage space per server. The total storage space across the entire farm is largely unchanged.

There are fewer nodes across which to distribute item processing components. The initial feed rate is reduced, as the feed rate mainly depends on the number of available CPU cores. Incremental feeds also have lower throughput, as each index column has more work to do. Initial pre-production bulk feeds can be accelerated by temporarily adding item processing components to eventual search rows, or to additional servers temporarily assigned to the cluster.

More hardware resources per server are required. The extended settings are not recommended on a server with fewer than 16 CPU cores/threads (24 or more is recommended). 48 GB RAM and a high-performance storage subsystem are recommended. See the individual scenarios for tested configurations.

In summary, the extended content capacity configuration is only recommended for deployments with:

High content volumes, but where the number of changes over time is low, typically less than 1 million changes per column over 24 hours.

Low query throughput requirements (not more than 5-10 queries per second, depending on CPU performance of the servers).

Search running on a different search row than the primary indexer, as the indexer is expected to be busy most of the time.
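The recommendations above can be turned into a simple pre-deployment checklist. The Python sketch below is a hypothetical helper, not a supported tool. The thresholds are taken from the text, and `max_qps` defaults to the upper end of the 5-10 QPS range as an assumption.

```python
def extended_capacity_concerns(changes_per_column_per_day, peak_qps,
                               cpu_threads, ram_gb, dedicated_search_row,
                               max_qps=10):
    """Return a list of reasons why extended capacity may be a poor fit.

    An empty list means none of the documented limits are exceeded.
    Lower `max_qps` toward 5 for servers with slower CPUs.
    """
    concerns = []
    if changes_per_column_per_day >= 1_000_000:
        concerns.append("change rate is 1 million or more per column per day")
    if peak_qps > max_qps:
        concerns.append("query throughput requirement exceeds %d QPS" % max_qps)
    if cpu_threads < 16:
        concerns.append("fewer than 16 CPU cores/threads")
    if ram_gb < 48:
        concerns.append("less than the recommended 48 GB RAM")
    if not dedicated_search_row:
        concerns.append("search shares a row with the primary indexer")
    return concerns


print(extended_capacity_concerns(200_000, 5, 24, 48, True))
```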

Note:

When estimating the change rate that a farm must be able to consume, keep in mind that any content change implies a load on the system, including ACL changes. ACL changes may appear for many items at a time when access rights to document libraries or sites change, resulting in high peak update rates.

Note:

Modifying the indexer configuration has implications on how to perform patch and service pack upgrades. See procedure below.

Enable extended content capacity

In order to reconfigure the indexers to handle up to 40 million items per column, you must modify the indexer template configuration file and run the deployment script to generate and distribute the new configuration.

Note:

Only apply the following procedure to indexers which do not contain any data.

1. Verify that no crawling is ongoing.

2. Verify that no items are indexed on any of the indexers. Run the following command:

   %FASTSEARCH%\bin\indexerinfo -a doccount

   All the indexers should report 0 items.

3. On all FAST Search Server nodes, run the following commands:

   net stop fastsearchservice
   net stop fastsearchmonitoring

4. On the administration server node:

   a. Save a backup of the original configuration file,

      %FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template

      You may need this backup at a later stage if this configuration file is modified in any patch or service pack upgrade.

   b. Modify the following values within the original configuration file:

      i. Set numberPartitions to 10 (the default is 5).

      ii. Set docsDistributionMax to

          6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000

          The default value is 6000000,6000000,6000000,6000000,6000000

   c. Modify the deployment file to enable re-deployment. %FASTSEARCH%\etc\config_data\deployment\deployment.xml must be modified in order for the Set-FASTSearchConfiguration PowerShell cmdlet to run the re-deployment. You can do that by opening the file in Notepad, adding a space and saving the file.

   d. Run the following commands:

      Set-FASTSearchConfiguration
      net start fastsearchservice

5. On all non-administration server nodes, run the following commands:

   Set-FASTSearchConfiguration
   net start fastsearchservice
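The docsDistributionMax value is a comma-separated list with one entry per partition, so it is easy to mistype when edited by hand. The following Python sketch generates the string. The function name is hypothetical, and the sketch only builds the value; it does not edit the template file.

```python
def docs_distribution_max(partitions, docs_per_partition):
    """Build a docsDistributionMax value: one limit per index partition."""
    if partitions < 1 or docs_per_partition < 1:
        raise ValueError("partitions and docs_per_partition must be positive")
    return ",".join([str(docs_per_partition)] * partitions)


# Default configuration: 5 partitions of 6,000,000 items each.
print(docs_distribution_max(5, 6000000))
# Extended content capacity: 10 partitions of 6,000,000 items each.
print(docs_distribution_max(10, 6000000))
```

The number of entries should match the numberPartitions setting in the same file.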

Handling patches and service pack upgrades

For all future patch or service pack updates, you need to verify whether this configuration file is updated as part of the patch or service pack update. Review the readme file thoroughly for any mention of this configuration file. If a patch or service pack involves an update of this configuration file, the following steps must be followed.

1. Replace the configuration file

   %FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template

   with the backup of the original file that you have saved.

2. Perform the patch or service pack upgrade according to the appropriate procedure.

3. Perform the change to the configuration file template as specified above. Do not forget to back up the modified configuration file template!
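In addition to reviewing the readme file, one way to confirm whether an upgrade touched the template is to compare a checksum of the restored original with a checksum of the file after the upgrade. The Python sketch below illustrates the idea. The function names are hypothetical, and the file paths are supplied by the caller.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def template_changed(original_backup, patched_file):
    """True if the patched template differs from the original backup."""
    return file_sha256(original_backup) != file_sha256(patched_file)
```

If the two digests differ, the upgrade replaced the template, and the new file is the one to back up before the extended capacity changes are reapplied.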

Extra small FAST Search Farm

The extra small FAST Search farm targets a smaller test corpus with high query rates. The amount of content is up to 8 million items. There are no off-business hours with reduced query load. Crawls are likely to occur at any point in time. Query performance is measured at 1M, 5M and 8M content volume.

The configuration for the parent SharePoint farm uses four front-end Web servers, one application server and one database server arranged as follows:

One crawl component of the Content SSA is running on a single server.

One of the application servers also hosts Central Administration for the farm.

One database server hosts the crawl databases, the FAST Search Server administration databases, as well as the other SharePoint databases.

Application server and Web front end servers will only have disk space for operating system and programs. No separate data storage is required.


Deployment alternatives

For the extra small farm scenario, the following alternatives have been tested for the FAST Search Server farm back-end:

XS1. Single server install hosting all FAST Search Server components using regular disk drives

XS2. Same as XS1, deployed on a single virtual machine

XS3. Four-node install running on four virtual machines, all running on the same physical server

XS4. Same as XS1, with the addition of a dedicated search row (2 servers)

XS5. Same as XS1, but with storage on SAS SSD drives

Specifications

This section provides detailed information about the hardware, software, topology, and configuration of the test environment.

Hardware

FAST Search Server farm servers

All the extra small deployment alternatives run on similar hardware, although some of the setups use virtualization and others use solid state disk (SSD) storage.

Shared specifications:
Windows Server 2008 R2 x64 Enterprise Edition
2x Intel L5520 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
24 GB memory
1 Gbit/s network card
Storage subsystem
  o OS: 2x 146GB 10k RPM SAS disks in RAID1
  o Application: 7x 146 GB 10k RPM SAS disks in RAID5. Total formatted capacity of 880 GB.
  o Disk controller: HP Smart Array P410, firmware 3.30
  o Disks: HP DG0146FARVU, firmware HPD6

Changes for XS2 and XS3:
Virtualized servers running under Hyper-V. Host server has same specification as XS1
  o 4 CPU cores
  o 8 GB memory
  o 800 GB disk on servers with index component

Changes for XS5:
Storage subsystem
  o Application: 2x 400 GB SSD disks in RAID0. Total formatted capacity of 800 GB.
  o SSD disks: Stec ZeusIOPS MLC Gen3, part Z16IZF2D-400UCM-MSF
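The capacities quoted above follow from standard RAID arithmetic, which is useful when sizing a comparable storage subsystem. The following Python sketch is a hypothetical helper that computes nominal usable capacity. Formatted capacity as reported by the operating system can differ slightly, for example because of file system overhead and differing definitions of a gigabyte.

```python
def raid_usable_gb(level, disks, disk_gb, parity_groups=2):
    """Nominal usable capacity in GB for common RAID levels.

    `parity_groups` only applies to RAID50, where one disk's worth of
    capacity per parity group is used for parity.
    """
    level = level.upper()
    if level == "RAID0":
        return disks * disk_gb
    if level in ("RAID1", "RAID10"):
        return disks * disk_gb / 2
    if level == "RAID5":
        return (disks - 1) * disk_gb
    if level == "RAID50":
        return (disks - parity_groups) * disk_gb
    raise ValueError("unsupported RAID level: %s" % level)


# XS1 application volume: 7x 146 GB in RAID5 (quoted as 880 GB formatted).
print(raid_usable_gb("RAID5", 7, 146))
# XS5 application volume: 2x 400 GB SSD in RAID0.
print(raid_usable_gb("RAID0", 2, 400))
```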

SharePoint Server 2010 servers

Application and Web front end servers do not need storage apart from operating system, application binaries and log files.

Windows Server 2008 R2 x64 Enterprise Edition
2x Intel L5420 CPUs
16 GB memory
1 Gbit/s network card
Storage subsystem for OS/Programs: 2x 146GB 10k RPM SAS disks in RAID1

SQL servers

Same specification as for SharePoint 2010 servers above, with an additional disk RAID for SQL data with 6x 146GB 10k RPM SAS disks in RAID5.


Topology

This section describes the topology of the test environment for all deployment alternatives.

XS1

XS1 is a generic single-node install using the following deployment.xml file:

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="XS1"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>XS1</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS1.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <content-distributor />
    <indexing-dispatcher />
    <searchengine row="0" column="0" />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
    <document-processor processes="12" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>

XS2

XS2 is a generic single-node install without a deployment file, running on a single virtual machine. In practice, the following deployment file would have given the same setup.

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="XS2"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>XS2</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS2.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <content-distributor />
    <indexing-dispatcher />
    <searchengine row="0" column="0" />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="1"/>

  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>


XS3

XS3 is distributed across four virtual machines, giving a hardware footprint comparable to XS1.

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="XS3"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>XS3</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS3.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <document-processor processes="4" />
  </host>
  <host name="fs4sp2.contoso.com">
    <indexing-dispatcher />
    <searchengine row="0" column="0" />
  </host>
  <host name="fs4sp3.contoso.com">
    <content-distributor />
    <document-processor processes="4" />
  </host>
  <host name="fs4sp4.contoso.com">
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
    <document-processor processes="4" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>

XS4

XS4 is the same as the XS1 deployment, but extended with an additional search row to get search redundancy.

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="XS4"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>XS4</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS4.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <content-distributor />
    <indexing-dispatcher />
    <searchengine row="0" column="0" />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>

  </host>
  <host name="fs4sp2.contoso.com">
    <query />
    <searchengine row="1" column="0" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>

XS5

XS5 uses the same deployment file as XS1, but with storage on SAS SSD drives.
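When comparing deployment alternatives, it can help to summarize a deployment.xml file programmatically, for example to count item processing components per farm. The following Python sketch uses only the standard library. The embedded sample is a trimmed copy of the XS3 file (XML declaration, schema attributes and connection string omitted), and the `summarize` function is a hypothetical helper, not part of FAST Search Server.

```python
import xml.etree.ElementTree as ET

# Namespace used by the deployment files shown in this document.
NS = {"d": "http://www.microsoft.com/enterprisesearch"}

# Trimmed copy of the XS3 deployment file.
XS3 = """<deployment version="14" comment="XS3"
    xmlns="http://www.microsoft.com/enterprisesearch">
  <instanceid>XS3</instanceid>
  <host name="fs4sp1.contoso.com">
    <admin /> <query /> <document-processor processes="4" />
  </host>
  <host name="fs4sp2.contoso.com">
    <indexing-dispatcher /> <searchengine row="0" column="0" />
  </host>
  <host name="fs4sp3.contoso.com">
    <content-distributor /> <document-processor processes="4" />
  </host>
  <host name="fs4sp4.contoso.com">
    <webanalyzer server="true" link-processing="true" lookup-db="true"
        max-targets="4" />
    <document-processor processes="4" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>"""


def summarize(xml_text):
    """Return (components per host, total item processors, search rows)."""
    root = ET.fromstring(xml_text)
    hosts = {}
    for host in root.findall("d:host", NS):
        # Strip the "{namespace}" prefix that ElementTree adds to tag names.
        hosts[host.get("name")] = [c.tag.split("}", 1)[1] for c in host]
    processes = sum(int(dp.get("processes", "0"))
                    for dp in root.findall("d:host/d:document-processor", NS))
    rows = [(r.get("id"), r.get("index"), r.get("search"))
            for r in root.findall("d:searchcluster/d:row", NS)]
    return hosts, processes, rows


hosts, processes, rows = summarize(XS3)
print(len(hosts), "hosts,", processes, "item processors, rows:", rows)
```

For XS3 this reports 12 item processors in total, the same number that XS1 runs on a single server.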

Dataset

This section describes the test farm dataset, including database content and sizes, search indexes, and external data sources. The overall metrics are shown in the table below.

Object                                  Value
Search index size (# of items)          5.4 M
Size of crawl database                  16.7 GB
Size of crawl database log file         1.0 GB
Size of property database               < 0.1 GB
Size of property database log file      < 0.1 GB
Size of SSA administration database     < 0.1 GB

The table below shows the content source types used to build the index. The numbers in the table reflect the total number of items per source. The difference between the total number of items and the index size above is due to two factors:

Items may be disabled from indexing in the content source

The document format type cannot be indexed

For SharePoint sources, the size of the respective content database in SQL is used as the raw data size.

Content source    Items    Raw data size    Average size per item
HTML 1            1.1 M    8.8 GB           8.1 kB
SharePoint 1      4.5 M    2.0 TB           443 kB
HTML 2            3.2 M    137 GB           43 kB
Total             8.8 M    2.2 TB           246 kB

The test scenarios do not include people search data. People Search is crawled and indexed in a separate index within the Query SSA.


Test Results

This section provides data that shows how the farm performed under load.

Feeding and indexing performance

All configurations apart from XS2 have the same CPU resources available. XS2 is running on a single virtual machine, and is thus limited to four CPU cores, as opposed to 16 for the others. The following graph shows the average number of items per second for the different content sources during a full crawl:

Overall, XS2 shows a 65-70% performance degradation compared to running on physical hardware. This is expected, as the single VM is restricted by available CPU resources. XS3, running four VMs and thus having the same hardware footprint as XS1, shows a 35-40% degradation compared to running directly on the host computer. The major degradation stems from the lower IO performance when running in a virtual machine using a fixed size VHD file. The split of XS3 resources across four virtual machines also implies more server-to-server communication.

Query performance

The following subsections describe the query performance impact of both the different farm deployments and varying content volume. There is also a separate test section for the effects of tuning for the low document volume in the XS scenarios, combined with solid state disk (SSD) storage.


Impact of deployment configuration

The above graph shows the query performance of the different scenarios when there is no ongoing feed. XS1 and XS5 show only minor differences, with slightly better performance for the SSD-based XS5 (running with two SSDs versus seven regular SAS spindles for XS1). As expected, the additional search row in XS4 does not improve query performance under idle crawl conditions. XS4 has the same throughput as XS1/XS5 under high load, but with slightly increased latency. This is due to queries being directed to both search rows, implying a lower cache hit ratio, as well as intra-node communication.

The virtualized scenarios (XS2 and XS3) have significantly lower query performance, with higher variation than the non-virtualized options. As observed for feed performance, this reduction is related to storage performance, in addition to the search components having a maximum of four CPU cores at their disposal.


The situation is somewhat different in the above graph, which shows query performance under full crawl. The single server XS1 scenario sees a reduction in query performance under concurrent crawl load. XS5 is less affected due to the improved storage performance, but still sees CPU congestion between item processing and query components. XS4 is least affected, as this scenario has a dedicated search row. XS4 results vary more at concurrent high query and feed load due to competition for network resources.

The virtualized scenarios are both below 10 QPS maximum throughput under these load conditions. XS1 (native hardware) and XS3 (virtualized) have the same hardware footprint, with the non-virtualized configuration having more than five times the throughput. Some of this difference is due to virtualization overhead, especially storage performance, and some is due to the limit on how many CPU cores a virtual machine has available. Under high search load, the query components can use all 16 CPU cores in XS1, while they are restricted to a maximum of four CPU cores in XS3.

Impact of varying content volume

Even though the XS-scale scenarios are sized for 8M documents, query performance testing was also run at 1M and 5M items indexed. The following graph shows how the content capacity affects query performance:


The solid lines show that maximum query capacity improves with less content, with maximum 90 QPS at 1M items, 80 QPS at 5M items, and 64 QPS at 8M items. During feed, the 1M index can still sustain > 40 QPS, although with a lot of variance. This is due to the total index size being relatively small, and most of it being able to fit inside application and OS level caches. Both 5M and 8M indices have a lower maximum query performance during feed, in the 25-30 QPS range.
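For a rough estimate at content volumes between the measured points, the three idle-crawl maxima quoted above can be interpolated. The Python sketch below assumes a piecewise linear relationship between the measurements, which is an assumption rather than a tested result, and it only applies to hardware comparable to the tested configuration.

```python
# Measured maximum QPS with no ongoing feed: (items in millions, QPS).
MEASURED_IDLE_QPS = [(1.0, 90.0), (5.0, 80.0), (8.0, 64.0)]


def estimate_max_qps(items_millions, points=MEASURED_IDLE_QPS):
    """Linearly interpolate maximum QPS between the measured volumes."""
    if not points[0][0] <= items_millions <= points[-1][0]:
        raise ValueError("outside the measured range; no extrapolation")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= items_millions <= x1:
            return y0 + (y1 - y0) * (items_millions - x0) / (x1 - x0)


print(estimate_max_qps(3.0))
print(estimate_max_qps(6.5))
```

The function deliberately refuses to extrapolate beyond 8M items, since the XS scenarios were not tested above that volume.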

Impact of tuning for high performance storage

Even though the XS5 scenario demonstrated improved performance over XS1 with default settings, configuration tuning allows better utilization of the higher IOPS potential of SSDs. This tuning is done in the same way as enabling extended content capacity, discussed earlier, but by changing only the docsDistributionMax setting and not the number of partitions:

docsDistributionMax="2500000,2500000,2500000,2500000,2500000"

This will reduce the maximum practical capacity per column to 8–9 million items, but also spread the workload across multiple smaller partitions than the default setting. This allows for more parallel query execution, at the expense of more disk operations.

The following graph shows the result of this tuning at full capacity (8M items), which allows the SSD-based XS5 scenario to serve up to 75 QPS and also reduces the response time under light query load. For example, the response time at 40 QPS at idle crawl is reduced from 0.4 to 0.2 seconds. Further, the response time during crawls is better with this tuning, as well as more consistent. The tuned XS5 scenario is able to deliver around 40 QPS with sub-second latency during crawls, while XS1 only delivered 15 QPS with the same load and latency requirements.


In total, using high performance storage provides improved query performance, especially during concurrent content crawls, and thus reduces or even eliminates the performance-driven need to run search on dedicated rows. SSDs also provide sufficient performance with a smaller number of disks. In this case, two SSDs outperform seven SAS spindles. This is attractive where power or space restrictions do not allow for a larger number of disks, for example in blade servers.

Disk usage

The table below shows the combined increase in disk usage on all nodes after the various content sources have been indexed. Note that scenarios using replication of FiXML and/or index data need additional space.

Content source    Raw source data size    FiXML data size    Index data size    Other data size
HTML 1            1.1 M                   6 GB               20 GB              4 GB
SharePoint 1      4.5 M                   41 GB              108 GB             15 GB
HTML 2            3.2 M                   27 GB              123 GB             22 GB
Total             8.8 M                   74 GB              251 GB             41 GB
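To illustrate how strongly the ratio between source data and index data depends on the content type, the Python sketch below combines the index data sizes above with the raw data sizes from the dataset table for this scenario. Treating 2.0 TB as 2000 GB is an assumption made for the arithmetic, and the function name is hypothetical.

```python
# (raw data size in GB, index data size in GB) per content source.
SOURCES = {
    "HTML 1": (8.8, 20.0),
    "SharePoint 1": (2000.0, 108.0),  # 2.0 TB taken as 2000 GB (assumption)
    "HTML 2": (137.0, 123.0),
}


def index_to_raw_ratio(raw_gb, index_gb):
    """Index data size as a fraction of raw source data size."""
    if raw_gb <= 0:
        raise ValueError("raw_gb must be positive")
    return index_gb / raw_gb


for name, (raw_gb, index_gb) in SOURCES.items():
    print("%-13s %.3f" % (name, index_to_raw_ratio(raw_gb, index_gb)))
```

The small HTML pages produce an index larger than the source data, while the SharePoint content, which has a much larger average item size, produces an index that is a small fraction of the content database size.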

Small FAST Search Farm

This scenario has not yet been tested. Planned capacity is 15 million items per farm.


Medium FAST Search Farm

The medium FAST Search farm targets a moderate test corpus. The amount of content is up to 40 million items, and to meet freshness goals, incremental crawls are likely to occur during business hours.

The configuration for the parent SharePoint farm uses two front-end Web servers, two application servers and one database server arranged as follows:

Two crawl components for the Content SSA are distributed across the two application servers. This is mainly due to I/O limitations in the test setup (1 Gbit/s network), where a single network adapter would have been a bottleneck.

One of the application servers also hosts Central Administration for the farm.

One database server hosts the crawl databases, the FAST Search Server administration databases, as well as the other SharePoint databases.

Application servers and Web front end servers will only have disk space for operating system and programs. No separate data storage is required.

Deployment alternatives

For the medium farm scenario, the following alternatives have been tested for the FAST Search Server farm back-end:

M1. One combined administration and Web Analyzer server, and three index column servers with default configuration (4 servers)

M2. Same as M1, but using SAN storage (4 servers)

M3. A single high capacity server hosting all FAST Search Server components

M4. Same as M1, with the addition of a dedicated search row (7 servers)

M5. Same as M3, with the addition of a dedicated search row (2 servers)

M6. Same as M4, but where the search row includes a backup indexer row (7 servers)

M7. Same as M5, but where the search row includes a backup indexer row (2 servers)

M8. Same as M3, but using solid state drives (1 server)

M9. Same as M3, but on more powerful hardware (1 server)

M10. Same as M1, but using solid state drives for indexer/search nodes (4 servers)


Hardware

FAST Search Server farm servers

The following hardware specifications have been used for the medium size deployment alternatives.

Shared specifications:
Windows Server 2008 R2 x64 Enterprise Edition
2x Intel L5520 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on

  o OS: 2x 146GB 10k RPM SAS disks in RAID1
  o Application: 18x 146 GB 10k RPM SAS disks in RAID50 (two parity groups of 9 drives each). Total formatted capacity of 2 TB.
  o Disk controller: HP Smart Array P410, firmware 3.00
  o Disks: HP DG0146FARVU, firmware HPD5

Changes for M2:
Application is hosted on 2TB partitions on a SAN
SAN used for test
  o 3Par T-400
  o 240 15k RPM spindles (450GB each)
  o Dual ported FC connection to each application server using MPIO without any FC switch. MPIO enabled in the operating system.

Changes for M3/M5:
48 GB memory
Application is hosted on 22x300GB 10k RPM SAS drives in RAID50 (two parity groups of 11 spindles each). Total formatted capacity of 6TB.

Changes for M7:
2x Intel L5640 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
48 GB memory
Dual 1 Gbit/s network card
Storage subsystem:
  o Application hosted on 12x 1TB 7200 RPM SAS drives in RAID10. Total formatted capacity of 6TB.
  o Disk controller: Dell PERC H700, firmware 12.0.1-0091
  o Disks: Seagate Constellation ES ST31000424SS, firmware KS65

Changes for M8:
2x Intel L5640 CPUs

  o Application: 3x 1280 GB SSD cards in RAID0. Total formatted capacity of 3.6 TB.
  o SSD cards: Fusion-IO ioDrive Duo 1.28 TB MLC, firmware revision 43284, driver 2.2 build 21459

Changes for M9:
2x Intel X5670 CPUs

  o Application hosted on 12x 600GB 15k RPM SAS drives in RAID50. Total formatted capacity of 6TB.
  o Disk controller: LSI MegaRAID SAS 9260-8i, firmware 2.90-03-0933
  o Disks: Seagate Cheetah 15K.7 ST3600057SS, firmware ES62

Changes for M10:
2x Intel L5640 CPUs

48 GB memory
Dual 1 Gbit/s network card
Storage subsystem (search cluster nodes only):
  o Application: 1x Fusion-IO ioDrive Duo 1.28 TB MLC SSD card, firmware revision 43284, driver 2.2 build 21459





Topology

This section describes the topology of the test environment for all deployment alternatives.

Note:

All the tested deployment alternatives use the same SharePoint Server and Database Server configuration as shown for M1/M2/M10. For the other deployments only the FAST Search Server farm topology is shown.

M1/M2/M10

M1, M2 and M10 are similar except for the storage subsystem. M1 is running on local disk, while M2 uses SAN storage and M10 uses solid state storage for the search cluster. All three deployment alternatives have a search cluster with 3 index columns and one search row. There is one separate administration node that also includes the Web Analyzer components. Item processing is spread out across all nodes.

None of these three alternatives have a dedicated search row. This implies that there will be a noticeable degradation in query performance during content feeds. The impact can be reduced by feeding in off-peak hours, or by reducing the number of item processing components to reduce the maximum feed rate.


The following figure shows the M1 deployment alternative. M2 and M10 have the same configuration.


The following deployment.xml file is used for M1, M2 and M10.

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="M1"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>M1</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M1.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>

  </host>
  <host name="fs4sp2.contoso.com">
    <content-distributor />
    <searchengine row="0" column="0" />

    <document-processor processes="12" />
  </host>
  <host name="fs4sp4.contoso.com">
    <indexing-dispatcher />
    <searchengine row="0" column="2" />
    <document-processor processes="12" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>


M3

The M3 scenario combines all components on one server. Running concurrent feeding and query load has the same impact as for M1/M2/M10, but in addition, the reduced number of servers implies fewer item processing components, and thus a lower feed rate.

The following figure shows the M3 deployment alternative.

The following deployment.xml file is used.

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="M3"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>M3</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M3.jdbc]]>
  </connector-databaseconnectionstring>

  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>


M4

M4 corresponds to M1/M2/M10 with the addition of a dedicated search row. The search row adds query throughput capacity, introduces query redundancy, and provides better separation of query and feeding load. Each of the three servers running the dedicated search row also includes a query processing component (query). In addition, the deployment includes a query processing component on the administration node (fs4sp1.contoso.com). The Query SSA does not use this query processing component during normal operation, but it may be used as a fallback to serve queries if the entire search row is taken down for maintenance.

The following figure shows the M4 deployment alternative.







  <host name="fs4sp3.contoso.com">
    <content-distributor />
    <indexing-dispatcher />
    <searchengine row="0" column="1" />

  <host name="fs4sp6.contoso.com">
    <query />
    <searchengine row="1" column="1" />
  </host>
  <host name="fs4sp7.contoso.com">
    <query />
    <searchengine row="1" column="2" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>


M5

M5 corresponds to M3 with the addition of a dedicated search row, giving the same benefits as M4 compared to M1/M2/M10.

The following figure shows the M5 deployment alternative.






  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>

M6

M6 has the same setup as M4, with an additional backup indexer enabled on the search row. The backup indexer is deployed by modifying the M4 deployment.xml file as shown below.

…
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="secondary" search="true" />
  </searchcluster>
…


M7

M7 has the same setup as M5, with an additional backup indexer enabled on the search row. M7 is also running on nodes with more CPU cores (see hardware specifications), which allows for more item processing components in the farm, including on the search row.

The following deployment.xml is used.

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="M7"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>M7</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M5.jdbc]]>
  </connector-databaseconnectionstring>

  <host name="fs4sp2.contoso.com">
    <query />
    <searchengine row="1" column="0" />
    <document-processor processes="8" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="secondary" search="true" />
  </searchcluster>
</deployment>


M8/M9

The M8/M9 deployment alternatives combine all components on one server, just like M3. The differences are that M8/M9 are running on hardware with better performance, especially for the disk subsystem, and that they have an increased number of CPU cores that allows for more item processing components.

M8 uses solid state storage. M9 has more CPU power (X5670 vs. L5520/L5640 used on most other M-scale tests) and the fastest disk spindles readily available (12x 15k RPM SAS disks).

The following deployment.xml file is used for both M8 and M9.

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="M8"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>M8</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M8.jdbc]]>
  </connector-databaseconnectionstring>

  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>

Dataset

This section describes the test farm dataset, including database content and sizes, search indexes, and external data sources. The table below shows the key metrics:

Object                                  Value
Search index size (# of items)          42.7 million
Size of crawl database                  138 GB
Size of crawl database log file         11 GB
Size of property database               <0.1 GB
Size of property database log file      0.3 GB
Size of SSA administration database     <0.1 GB

The table below shows the content source types used to build the index. The numbers in the table reflect the total number of items per source, including replicated copies. The difference between the total number of items below (43.8 million) and the index size above (42.7 million) is due to two factors:




Content source             Items     Raw data size    Average size per item
File share 1 (2 copies)    1.2 M     154 GB           128 kB
File share 2 (2 copies)    29.3 M    6.7 TB           229 kB
SharePoint 1               4.5 M     2.0 TB           443 kB
SharePoint 2               4.5 M     2.0 TB           443 kB
HTML 1                     1.1 M     8.8 GB           8.1 kB
HTML 2                     3.2 M     137 GB           43 kB
Total                      43.8 M    11 TB            251 kB

Note:

To reach sufficient content volume in the testing of the medium scenario, two replicas of the file shares were added. Each copy of each document would then appear as a unique item in the index, but would be treated as a duplicate by the duplicate trimming feature. From a query matching perspective the load would be similar to having all unique documents indexed, but any results from these sources would trigger duplicate detection and collapsing in the search results.



Feeding and indexing performance

Test results from feeding M1 through M6 are included below. The others are not included, as they do not show significantly different feed performance than their respective deployment alternatives.


Full crawl

The following diagram shows the average number of items processed per second for the various deployment alternatives.

For full crawls, the item processors represent the bottleneck, and the limiting factor is CPU processing capacity. M1, M2 and M4 have similar performance characteristics because they have the same number of item processors available. The same applies to M3 and M5. Note, however, that M1, M2 and M4 have four times higher crawl performance during a full crawl than M3 and M5, because they have four times the item processor capacity. Comparing M6 to M4, it also becomes apparent that running with backup indexers incurs a performance overhead due to the extra synchronization work required. Typically, an installation without backup indexers, like M4, will outperform one with backup indexers, like M6.

Incremental crawl

The following diagram shows the average number of items processed per second for the various deployment alternatives.


Incremental crawls are faster than full crawls, from slightly faster up to a factor of 2 or 3. This is mainly because incremental crawls mostly consist of partial updates, which only update metadata. This also implies that the feeding performance is largely the same for all content types.

For incremental crawls it is the indexers that are the bottleneck since the item processing load is limited. Typically disk I/O capacity is the limiting factor. During an incremental update the old version of the item is fetched from disk, modified, persisted to disk and then indexed. This is more expensive than a full crawl operation where the item is only persisted and indexed.

Note:

M1 through M5 were tested with content sources that had lower performance than those used for the other scenarios in this document. The performance numbers can thus not be directly compared, as the M1 through M5 tests were to some extent limited by the bandwidth of the content sources.


Query performance

M1

The following diagram shows the query latency as a function of QPS for the M1 deployment alternative.

An idle indexer gives the best query performance, with an average latency of less than 0.7 seconds until approximately 21 QPS. The corresponding number is 10 QPS during a full crawl and 15 QPS during an incremental crawl.

Note that the latency is not impacted by higher QPS until you reach max system capacity. The figure shows that QPS will decrease and latency will increase if you apply more query load after the maximum capacity of the system has been reached. This occurs at the point where the curve starts bending "backwards". On the M1 system the peak QPS is about 28, with idle indexers. CPU resources are the bottleneck in this scenario. The behavior is also illustrated in the next diagram, where you can observe that performance decreases when there are more than 40 simultaneous user agents on an idle system.


Hyper-Threading

CPU resources are the bottleneck in the M1 deployment alternative. Enabling hyper-threading allows more threads to execute in (near) parallel, at the expense of slightly reduced average performance for single-threaded tasks. Note that the query matching components will run in a single thread when QPS is low and no other tasks are running on the server.

Hyper-threading performs better for all three feeding cases in the M1 deployment alternative. In the deployment alternative with dedicated search rows, a small reduction (around 150 ms) in query latency is observed when running at very light query load.

The following diagram shows the impact of using hyper-threading in the CPU.

In general, hyper-threading reduces query latency and allows for higher QPS, especially when having multiple components on the same server. Disabling hyper-threading only provides a small improvement under conditions where the performance already is good. Hence, having hyper-threading enabled is recommended.


M2

The following diagram shows the query latency as a function of QPS for the M2 deployment alternative.

Note that the latency is not impacted by higher QPS until you reach max system capacity. When the indexers are idle, there is a slow latency increase until the deployment reaches the saturation point at approximately 20 QPS. For full and incremental crawls the latency increases as indicated in the graph. This test does not include test data to indicate exactly when the query latency saturation takes place during the crawl. The following diagram shows the same test data presented as user agents versus latency:


Comparing M1 and M2

The next diagram compares the performance of M1 and M2. The main conclusion is that M1 performs somewhat better than M2. M1 is able to handle about 3 QPS more than M2 before reaching the saturation point. The SAN disks used on M2 should be able to match M1's locally attached disks in terms of I/O operations per second, but the bandwidth towards the disks is somewhat lower with the SAN configuration.

For full crawl and incremental crawl the performance was comparable during the light load tests. During heavy load, ongoing indexing had less impact on M2 as the SAN provided more disk spindles to distribute the load.


M3

The following diagram shows that the M3 deployment alternative (40 million items on a single server) is able to handle about 10 QPS with idle feeding, plotted as QPS versus latency. For comparison, the M1 data is also included.

One characteristic of the single node installation is that the query latency fluctuates more when getting close to the saturation point.

Under low query load, M3 is almost able to match the performance of M1, but during higher load the limitations become apparent. M1 has three times the number of query matching nodes and the peak QPS capacity is close to three times as high, 28 versus 10 QPS.


M4

M4 is an M1 deployment with an added dedicated search row. The main benefit is that the index and the search processes are not directly competing for the same resources, primarily disk and CPU. The diagrams below show that the added search row in M4 gives a 5 QPS gain versus M1. In addition, the query latency is improved by about 0.2-0.4 seconds.

Adding search rows will in most cases improve query performance, but at the same time it introduces additional network traffic that may impact the performance.

The query performance may degrade when adding search rows if you do not have sufficient network capacity. This is the case when the indexers copy large index files to the query matching nodes. The index file copying may also impact index latency because of the additional copying of indices.


The following diagram shows the result of running the query test on M4 having 18, 28 and 43 million documents indexed.

The document volume impacts the maximum QPS the system is able to deliver. Adding ~10 million documents gives a ~5 max QPS reduction. Below 23 QPS the document volume has low impact on the query latency.
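
As a rough rule of thumb, the observation above can be turned into a linear estimate. The sketch below is only an illustration: the function name and the linear form are assumptions, and the baseline of roughly 23 QPS at 43 million items is the idle M4 figure quoted in the M10 comparison later in this document. The rule has only been observed for M4 between 18 and 43 million items, so results outside that range are pure extrapolation.

```python
def estimate_max_qps(items_millions,
                     base_items_millions=43.0,
                     base_qps=23.0,
                     qps_drop_per_10m=5.0):
    """Linear estimate of peak QPS for an M4-like farm (hypothetical helper).

    Applies the observed rule "about 5 QPS less per 10 million added
    documents" around an assumed baseline of 23 QPS at 43 million items.
    """
    extra_tens_of_millions = (items_millions - base_items_millions) / 10.0
    estimate = base_qps - qps_drop_per_10m * extra_tens_of_millions
    return max(0.0, estimate)  # a negative QPS estimate is meaningless


if __name__ == "__main__":
    for volume in (18, 28, 43):
        print(f"{volume} M items -> about {estimate_max_qps(volume):.1f} QPS")
```

The estimate is meant for quick what-if comparisons only; a real deployment should be measured under its own query and feed load.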


M5

The diagram below shows the query performance of the M5 versus the M3 topology.

As illustrated when comparing M1 and M4, adding a dedicated search row improves query performance. The same is the case when adding a query matching node to the single node M3 setup in order to get an M5 deployment.

M6

The diagram below shows the query performance of the M6 versus the M4 topology.

The difference between M6 and M4 is the addition of a backup indexer row. The backup indexers will compete with query matching for available resources, and may degrade query performance. However, in this specific test, that was not the case. The hardware used had enough resources to handle the extra load during normal operations.


The backup indexers use significantly fewer resources than the primary indexers. This is because the primary indexers perform the actual indexing and distribute the indices to the search rows and the backup indexer row.

Note:

All indexers perform regular optimization tasks of internal data structures between 03.00 AM and 05.59 AM every night. These tasks may, depending on the feed pattern, be quite I/O intensive. Testing on M6 has shown that you may see a significant reduction in query performance during indexer optimization processes. The more update and delete operations the indexer handles, the more optimization is required.

M7

The following diagram shows the query latency as a function of QPS for the M7 deployment alternative compared to M5.

M7 is very similar to the M5 scenario, but it is running on servers with more powerful CPUs and more memory. On the other hand, it has a disk subsystem not capable of the same amount of I/O operations per second (IOPS). The M7 storage subsystem has more bulk capacity but less performance compared to M5.


The main difference in results compared to M5 is a slightly increased QPS rate before the system becomes saturated. M5 is saturated around 10 QPS, while M7 provides roughly 12 QPS. This is due to the increased CPU performance and added memory, although partly counterbalanced by the weaker disk configuration.

M8

M8 has the same extended capacity application configuration as M3 and the following M9, but it uses solid state storage with much higher IOPS and throughput capabilities. This system is limited only by the available CPU processing power. Thus a more powerful CPU configuration, e.g. with quad CPU sockets, should achieve linear performance improvements with the added CPU resources.

Query performance results for M8 are discussed together with M9 below.

M9

The M9 deployment alternative has the same extended capacity application configuration as M3 and M8, but has improved CPU performance (X5670) and high end disk spindles (15k RPM SAS). M9 is thus an example of the performance gains achievable with high end components while keeping regular disk spindles for storage.

The improved CPU performance implies 20-30% increased crawl speeds for M9 over M8 (both with 20 item processor components), and even more compared to M3 (which had 12 item processing components). Note that M3 ran with a less powerful server for the content sources, and was more often limited by the sources than M8 and M9. M9 achieved >50 documents per second for all content sources.

The following graph shows the query performance for M8 and M9 under varying load patterns, compared to M3 and M5:


The following observations can be made:

Both M8 and M9 perform better during idle feed than M5. M5 performance is only shown under feed, but as M5 has a dedicated search row, the query performance is relatively constant irrespective of ongoing feed or not. The main contribution to the peak query rate improvements is the additional CPU resources on M8 and M9 compared to M5.

During idle feed, M9 will get slightly better QPS than M8 due to the more powerful CPU. Under overload conditions (>1 second latency), however, M9 degrades in performance due to an overloaded storage subsystem (just like M5), while M8 can sustain the peak rate with its solid state storage.

M3 (M5 without the search row) saturates already at 5 QPS. M9 does provide higher QPS rates. M9 has higher latency than M3 at low QPS during feed, as M9 has >50% higher feed rates than M3 (due to more item processors and faster CPU). M9 with feed rates reduced to M3 levels would have given better query performance than M3 also at low QPS.

During feeds, M8 query performance is degraded <20% compared to idle, dominantly due to CPU congestion. Thus the storage subsystem on M8 makes it possible to maintain good query performance during feed without doubling the hardware footprint with a search row. Adding more CPU resources would allow for further increase in query performance, as the storage subsystem still has spare resources in the current setup.

On M9, query latency roughly doubles during feed. M9 can still deliver acceptable performance under low QPS loads with concurrent feed, but is much more affected than M8. This is due to the storage subsystem on M9 having slower read accesses when combined with write traffic from feeding and indexing.


M10

The M10 deployment alternative has the same configuration as M1 and M2 but improves performance by using solid state storage. M10 uses the same amount of storage as M8, but spreads it across three search cluster servers to get more CPU power.

It is most interesting to compare M10 to M4, as both setups try to achieve a combination of high crawl rate and query performance at the same time. In M4, this is done by splitting the application storage across two search rows, each with 3 columns with 18 SAS disks per server. M10 only has a single row and replaces the application disk spindles with solid state storage. The search cluster totals are thus (both deployment alternatives have an additional administration server):

M4: 6 servers, 108 disk spindles
M10: 3 servers, 3 solid state storage cards

With idle content crawls (solid lines), M4 achieves around 23 QPS before degrading to around 20 QPS under overload conditions, with IO becoming the bottleneck. M10 is able to deliver 30 QPS, at which point it becomes limited by CPU throughput. Using faster or more CPUs would have increased this benefit even more than the measured 30% gain.

During content crawling, M4 has no significant change in query performance compared to idle. It achieves crawl and query load separation by using an additional set of servers in a dedicated search row. M10 gets some degradation, as content processing and queries compete for the same CPU resources. Still, M10 achieves the same 20 QPS as M4 under the highest load conditions. Also note that the content crawling rate on M10 is 20% higher than on M4 during this test, as the increased IO performance allows for better handling of the concurrent operations.


Disk usage

Index disk usage

The table below shows the combined increase in disk usage on all nodes after the various content sources have been indexed.


FiXML data size


File share 1 (2 copies)   154 GB    18 GB     36 GB     5 GB
File share 2 (2 copies)   6.7 TB    360 GB    944 GB    10 GB
SharePoint 1              2.0 TB    70 GB     220 GB    13 GB
SharePoint 2              2.0 TB    66 GB     220 GB    17 GB
HTML 1                    8.8 GB    8 GB      20 GB     8 GB
HTML 2                    137 GB    31 GB     112 GB    6 GB
Total                     11 TB     553 GB    1.6 TB    56 GB

Web Analyzer disk usage

The following table shows disk usage for the Web Analyzer in a mixed content scenario, where the data is both file share content and SharePoint items.

Number of items in index                          40,667,601
Number of analyzed hyperlinks                     119,672,298
Average number of hyperlinks per item             2.52
Peak disk usage during analysis (GB)              77.51
Disk usage between analysis (GB)                  23.13
Disk usage per 1 million items during peak (GB)   1.63

Note:

The values in the table above are somewhat lower than the values specified for Web Analyzer performance dimensioning in the search overview earlier in this document. The values above derive from one specific installation where URLs are fairly short. The performance dimensioning recommendations are based on experience from several installations.

The average number of links per item is quite low compared to pure Web content installations, or pure SharePoint installations. For example, in a pure Web content installation the average number of links can be as high as 50. Since the Web Analyzer only stores document IDs, hyperlinks and anchor texts, the number of links is the dominant factor determining the disk usage.


Large FAST Search Farm

The large FAST Search Farm is targeting a moderate test corpus. The amount of content is up to 100 million items, and to meet freshness goals, incremental crawls are likely to occur during business hours.

The configuration for the parent SharePoint farm uses two front-end Web servers, two application servers and one database server arranged as follows:

Two crawl components for the Content SSA are distributed across the two application servers. This is mainly due to I/O limitations in the test setup (1 Gbit/s network), where a single network adapter would have been a bottleneck.

One of the application servers also hosts Central Administration for the farm.

One database server hosts the crawl databases, the FAST Search Server administration databases, as well as the other SharePoint databases.

Application servers and Web front end servers will only have disk space for operating system and programs. No separate data storage is required.

Deployment alternatives

For the large farm scenario, the following alternatives have been tested for the FAST Search Server farm back-end:

L1. Single row, six column setup, with an additional administration node (7 servers)
L2. Same as L1, with the addition of a dedicated search row (13 servers)
L3. Same as L2, but where the search row includes a backup indexer row (13 servers)


Hardware

FAST Search Server farm servers

All the large size deployment alternatives are running on similar hardware. The following specifications have been used.

Windows Server 2008 R2 x64 Enterprise Edition
2x Intel L5520 CPUs



o OS: 2x 146GB 10k RPM SAS disks in RAID1
o Application: 12x 146 GB 10k RPM SAS disks in RAID50 (two parity groups of 6 drives each). Total formatted capacity of 2 TB.
o Disk controller: HP Smart Array P410, firmware 3.30
o Disks: HP DG0146FARVU, firmware HPD6





Topology

This section describes the topology of the test environment.

L1

L1 is a single row, six column setup with an additional administration node. The following deployment.xml file is used:

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user" modifiedTime="2009-03-14T14:39:17+01:00" comment="L1" xmlns="http://www.microsoft.com/enterprisesearch" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>L1</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L1.jdbc]]>
</connector-databaseconnectionstring>



<host name="fs4sp2.contoso.com"> <query /> <searchengine row="0" column="0" />



<document-processor processes="12" /> </host> <host name="fs4sp4.contoso.com"> <content-distributor /> <searchengine row="0" column="2" />




<document-processor processes="12" /> </host> <searchcluster> <row id="0" index="primary" search="true" /> </searchcluster>

</deployment>


L2

L2 corresponds to L1 with the addition of a dedicated search row. The search row adds query throughput capacity, introduces query redundancy, and provides better separation of query and feeding load. Three of the servers running in the dedicated search row also include a query processing component (query). The deployment also includes a query processing component on the administration node (fs4sp1.contoso.com). The Query SSA does not use this query processing component during normal operation, but it may be used as a fallback to serve queries if the entire search row is taken down for maintenance.

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user" modifiedTime="2009-03-14T14:39:17+01:00" comment="L2" xmlns="http://www.microsoft.com/enterprisesearch" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>L2</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L2.jdbc]]>
</connector-databaseconnectionstring>



<host name="fs4sp2.contoso.com"> <searchengine row="0" column="0" />







<document-processor processes="12" /> </host>
<host name="fs4sp8.contoso.com"> <query /> <searchengine row="1" column="0" />


</host>
<host name="fs4sp9.contoso.com"> <query /> <searchengine row="1" column="1" /> </host>
<host name="fs4sp10.contoso.com"> <query /> <searchengine row="1" column="2" /> </host>
<host name="fs4sp11.contoso.com"> <searchengine row="1" column="3" /> </host>
<host name="fs4sp12.contoso.com"> <searchengine row="1" column="4" /> </host>
<host name="fs4sp13.contoso.com"> <searchengine row="1" column="5" /> </host>
<searchcluster>
<row id="0" index="primary" search="true" />

<row id="1" index="none" search="true" /> </searchcluster>

</deployment>

L3

L3 has the same setup as L2 with an additional backup indexer enabled on the search row. The backup indexer is deployed by modifying the L2 deployment.xml file as shown below.

…
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="secondary" search="true" />
</searchcluster>
…
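
To see at a glance what a deployment file declares, the search rows and the columns assigned to them can be listed with a few lines of script. The following is a minimal sketch, not a tool shipped with FAST Search Server: it uses only the element and attribute names visible in the deployment files above, the embedded sample file is a shortened illustration with hypothetical host names, and the function name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the deployment files shown in this document.
NS = {"d": "http://www.microsoft.com/enterprisesearch"}

# Shortened, hypothetical sample: one column, a primary row and a backup indexer row.
SAMPLE = """<deployment version="14" xmlns="http://www.microsoft.com/enterprisesearch">
  <instanceid>example</instanceid>
  <host name="node1.example.com">
    <searchengine row="0" column="0" />
  </host>
  <host name="node2.example.com">
    <query />
    <searchengine row="1" column="0" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="secondary" search="true" />
  </searchcluster>
</deployment>"""


def summarize_rows(xml_text):
    """Return {row id: {"index", "search", "columns"}} for a deployment file."""
    root = ET.fromstring(xml_text)
    columns_by_row = {}
    # Collect the columns that hosts declare for each row.
    for engine in root.findall("d:host/d:searchengine", NS):
        columns_by_row.setdefault(engine.get("row"), set()).add(int(engine.get("column")))
    summary = {}
    # Combine them with the role of each row in the search cluster.
    for row in root.findall("d:searchcluster/d:row", NS):
        row_id = row.get("id")
        summary[row_id] = {
            "index": row.get("index"),
            "search": row.get("search") == "true",
            "columns": sorted(columns_by_row.get(row_id, set())),
        }
    return summary


if __name__ == "__main__":
    for row_id, info in summarize_rows(SAMPLE).items():
        print(row_id, info)
```

A row whose index attribute is "secondary" corresponds to the backup indexer row of L3, while "none" corresponds to the pure search row of L2.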

Dataset

This section describes the test farm dataset, including database content and sizes, search indexes, and external data sources.

Object                                  Value
Search index size (# of items)          103 million
Size of crawl database                  358 GB
Size of crawl database log file         65 GB
Size of property database               <0.1 GB
Size of property database log file      0.6 GB
Size of SSA administration database     <0.1 GB


The table below specifies the content source types used to build the index. The numbers in the table reflect the total number of items per source, including replicated copies. The difference between the total number of items below and the index size above is due to two factors:



Content source            Items     Raw data size   Average size per item
File share 1 (4 copies)   2.4 M     308 GB          128 kB
File share 2 (4 copies)   58.6 M    13.4 TB         229 kB
SharePoint 1 (4 copies)   18.1 M    8.0 TB          443 kB
SharePoint 2 (3 copies)   13.6 M    6.0 TB          443 kB
HTML 1 (3 copies)         3.2 M     26 GB           8.1 kB
HTML 2 (3 copies)         9.5 M     411 GB          43 kB
Total                     105.5 M   28 TB           268 kB

Note:

To reach sufficient content volume in these tests, replicas of the data sources were added. Each copy of each document would appear as a unique item in the index, but would be treated as a duplicate by the duplicate trimming feature. From a query matching perspective the load would be similar to having all unique documents indexed, but any results from these sources would trigger duplicate detection and collapsing in the search results.




Feed and indexing performance

All the large scenario deployment alternatives were limited by the bandwidth of the content sources; L1 through L3 all achieved around 200 items per second feed rates.
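
As a rough illustration of what this feed rate means in practice, the sketch below estimates how long a full crawl of the large dataset would take at a sustained 200 items per second. The function name is hypothetical, and the calculation assumes a constant feed rate with no pauses, which real crawls will not achieve.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def full_crawl_days(total_items, items_per_second):
    """Days needed to feed total_items at a constant rate (idealized estimate)."""
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")
    return total_items / items_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # 103 million items (the index size above) at about 200 items per second.
    print(f"{full_crawl_days(103_000_000, 200):.1f} days")
```

At this rate, 103 million items correspond to about six days of continuous feeding, which is in line with full re-crawls of a farm of this size taking several days.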

Query performance

L1, L2 and L3 are scaled up versions of M1, M4 and M6 respectively. More columns are added to the farm to be able to index more content while maintaining the query performance. The following graph shows the query performance for L1 through L3, with and without ongoing crawls. L2 and L3 give roughly the same performance as each other, and also the same performance as the corresponding smaller scale M4 and M6.

L1 shows a slightly different performance pattern compared to M1. Given a 1 second maximum allowable latency, M1 achieved 27 and 16 QPS with idle and running feed, respectively. The corresponding numbers for L1 are 18 and 9 QPS. The additional columns in L1 compared to M1 imply that more servers are involved, and the slowest one for any given query at any given time will be the determining factor for the query latency. This effect is much less visible for the L2 and L3 deployments with dedicated search rows, as these machines do not have other components competing for resources.


Disk usage

The table below shows the combined disk usage on all nodes in the L1 deployment alternative. L2 and L3 use further disk space for replication of FiXML and index files on the second row.


FiXML data size


Total 28 TB 1.1 TB 3.8 TB 104 GB

Extra-large FAST Search Farm

This scenario has not yet been tested. Planned capacity is 500 million items per farm.

Overall Takeaways

Query and feeding performance

Performance for feeding of new content is mainly determined by the item processing capacity. It is therefore important that you deploy the item processing component in a way that utilizes spare CPU capacity across all servers.

Running indexer, item processing and query matching on the same server will give high resource utilization, but also higher variations in query performance during crawling. For such a deployment it is recommended to schedule all crawling outside periods with high query load. A separate search row is recommended for deployments where low query latency is required at any time.

You can also combine a separate search row with a backup indexer. This will provide short recovery time in case of a non-recoverable disk error, with some loss of query performance and incremental update rates. For the highest query performance requirements, a pure search row is recommended.

Redundancy

The storage subsystem for a farm must have some level of redundancy, as loss of storage even in a redundant setup will lead to reduced performance during a recovery period that can last for days. Using a RAID disk set, preferably also with hot spares, is essential to any installation.

A separate search row will also provide query redundancy.

Full redundancy for the feeding and indexing chain requires a backup indexer on a separate row, with increased server count and storage volume. While this provides the quickest recovery path from hardware failures, other options might be more attractive when hardware outages are infrequent:

Run a full re-crawl of all the content sources after recovery. Depending on deployment alternative this may take several days. If you have a separate search row you can perform the re-crawl while keeping the old index searchable.

Run regular backups of the index data.

Capacity per node

For deployments with up to 15 million items per node you should use the default configuration.


Configuration for extended content capacity can be used for up to 40 million items per node if you have moderate query performance requirements. Given sufficient storage capacity on the servers, this will enable a substantial cut in number of servers deployed.

Deployments on storage area networks (SAN)

FAST Search Server can use SAN storage instead of local disks if this is required for operational reasons. The requirement for high performance storage still applies. Testing of the M2 deployment alternative shows that a sufficiently powerful SAN will not be a bottleneck. Although the actual workload is scenario dependent, the following parameters can be used as an estimate of the required SAN resources for each node in the FAST Search Server farm:

2000 – 3000 I/O operations per second (IOPS)
50 – 100 kB average block size
Less than 10 ms average read latency

For example, for a farm setup like M4 (7 servers), the SAN must be capable of serving 15,000 – 20,000 IOPS to the FAST Search Server farm, regardless of any other traffic served by the same storage system.
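
The per-node figures above can be multiplied out for a whole farm. The sketch below is a minimal illustration, not an official sizing tool: the function name is hypothetical, and the throughput estimate (IOPS multiplied by average block size) is an assumption added here rather than a figure from the tests.

```python
def san_requirements(servers, iops_per_node=(2000, 3000), block_kb=(50, 100)):
    """Scale the per-node SAN estimates to a farm of `servers` nodes.

    Returns ((low_iops, high_iops), (low_mb_per_s, high_mb_per_s)).
    The throughput range is derived as IOPS x block size (1 MB = 1000 kB),
    which is an assumption of this sketch.
    """
    low_iops = servers * iops_per_node[0]
    high_iops = servers * iops_per_node[1]
    low_mb_per_s = low_iops * block_kb[0] / 1000.0
    high_mb_per_s = high_iops * block_kb[1] / 1000.0
    return (low_iops, high_iops), (low_mb_per_s, high_mb_per_s)


if __name__ == "__main__":
    iops_range, throughput_range = san_requirements(7)  # M4: 7 servers
    print("IOPS:", iops_range)
    print("MB/s (derived):", throughput_range)
```

For seven servers the IOPS range works out to 14,000 to 21,000, which corresponds to the rounded 15,000 to 20,000 quoted for M4.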

Deployments on solid state disks (SSD)

FAST Search Server can take advantage of the increased IO performance of SSDs. The XS5 deployment alternative had low content volume and high QPS. With regular spindles, a high number of disks would be needed, while only two SSDs were sufficient. This allows for using blade servers with local disks on the blade itself, and still having sufficient IO performance.

The M8 and M10 deployment alternatives used SSDs with even higher storage capacity and IO performance. Both of these configurations were entirely limited by the CPU performance. The SSDs made it feasible to have high query performance without a dedicated search row, thus roughly halving the server count needed for a certain performance target¹. The reduced disk count per server also yields lower power consumption.

1 Removing the search row does, however, remove redundancy in case of server failures.

Troubleshooting performance and scalability

This section provides recommendations for how to optimize the capacity and performance of your system environment. It also covers troubleshooting tips for the FAST Search Server farm servers, and the FAST Search Server specific configuration settings found in the Query and Content SSAs.

Raw I/O performance

FAST Search Server makes extensive use of the storage subsystem. Testing the raw I/O performance can serve as an early verification that the storage performance is sufficient.

One such test tool is SQLIO (http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65-cb53442d9e19). After installing SQLIO, the first step is to get or generate a suitable test file. The tests below include write operations, so the content of this file will be partially overwritten. The file should also be much larger than the available system memory (by a factor of 10) to avoid most caching effects.

The test file can also be generated by SQLIO itself, although not directly for huge file sizes. It is recommended to generate a 1 GB file with the command "sqlio.exe -t32 -s1 -b256 1g", which creates a file named "1g" in the current directory. This file can then be concatenated to a sufficiently large file, such as 256 GB, with the command "copy 1g+1g+1g+…..+1g testfile". To ensure that caching during the test file preparation does not skew the results, a server reboot is recommended before continuing with the specified tests.

The following set of commands is representative of the most performance critical disk operations in FAST Search Server. All assume that a file "testfile" exists in the current directory, which should be located on the disk planned to host FAST Search Server. Each test runs for 300 seconds:

sqlio.exe -kR -t4 -o25 -b1 -frandom -s300 testfile
sqlio.exe -kR -t4 -o25 -b32 -frandom -s300 testfile
sqlio.exe -kW -t4 -o25 -b32 -frandom -s300 testfile
sqlio.exe -kR -t1 -o1 -b100000 -frandom -s300 testfile
sqlio.exe -kW -t1 -o1 -b100000 -frandom -s300 testfile
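
When the same tests are repeated on several servers, it can be convenient to generate the five command lines from one small table of parameters. The sketch below does only that, plus the "10 times system memory" sizing rule for the test file. It does not run SQLIO; the helper names are hypothetical, and the parameter values are copied from the commands above.

```python
# (operation, threads, outstanding I/Os per thread, block size in kB)
SQLIO_TESTS = [
    ("R", 4, 25, 1),       # small random reads
    ("R", 4, 25, 32),      # medium random reads
    ("W", 4, 25, 32),      # medium random writes
    ("R", 1, 1, 100000),   # large read transfers
    ("W", 1, 1, 100000),   # large write transfers
]


def sqlio_commands(testfile="testfile", seconds=300):
    """Build the five SQLIO command lines used in this section."""
    return [
        f"sqlio.exe -k{op} -t{threads} -o{outstanding} -b{block_kb} "
        f"-frandom -s{seconds} {testfile}"
        for op, threads, outstanding, block_kb in SQLIO_TESTS
    ]


def required_test_file_gb(system_memory_gb, factor=10):
    """Test file size that exceeds system memory by the recommended factor."""
    return system_memory_gb * factor


if __name__ == "__main__":
    print("Test file size for 24 GB RAM:", required_test_file_gb(24), "GB")
    for command in sqlio_commands():
        print(command)
```

The generated lines can be pasted into a command prompt on the server under test, or written to a batch file.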

The first test measures the maximum number of I/O operations per second for small read transfers. The second and third tests measure the performance for medium sized random accesses. The last two tests measure read and write throughput for large transfers. Some example results are given in the following table, with minimum recommendations during normal operation in the topmost row.



Disk layout                                                           1kB read    32kB read   32kB write  100MB read  100MB write
                                                                      [IOPS]      [IOPS]      [IOPS]      [MB/s]      [MB/s]
Recommended minimum                                                   2000        1800        900         500         250
16x SAS 10k RPM 2.5" drives, RAID50 in two parity groups              2952        2342        959         568         277
22x SAS 10k RPM 2.5" drives, RAID50 in two parity groups              4326        3587        1638        1359        266
  With drive failure                                                  3144        2588        1155        770         257
12x SAS 7200 RPM 3.5" drives, RAID50 in two parity groups             1844        1315        518         677         780
  With drive failure                                                  1424        982         531         220         477
12x SAS 7200 RPM 3.5" drives, RAID10                                  1682        1134        1169        762         692
  With drive failure                                                  1431        925         1154        213         220
12x SAS 15k RPM 3.5" drives, RAID50 in two parity groups              4533        3665        848         501         235
2x ZeusIOPS 400GB MLC 2.5" drives, RAID0                              52709       14253       2717²       360         122
1x ioDrive 640GB MLC                                                  83545       21875       17687       676         533
1x ioDrive Duo 1280GB MLC                                             160663      42647       32574       1309        664
3x ioDrive Duo 1280GB MLC, RAID0                                      162317      83661       44420       2382        1412
3x ioDrive Duo 1280GB MLC, RAID0, non-default option: 4kB block size  181593³     86396       47423       2340        1631
3x ioDrive Duo 1280GB MLC, RAID5                                      188284      87270       11800       2459        545
  With card failure                                                   126469      48564       10961       716         202

Note:

The numbers in the table reflect a deployment where the disk subsystem is at least 50% utilized in capacity before adding the test file. Testing on empty disks tends to give elevated results, as the test file is then placed in the most optimal tracks across all spindles (short-stroking), which can give 2-3x higher performance.

Rows labeled "With drive failure" or "With card failure" (highlighted in red in the original table) are measured with a forced drive failure.

RAID50 provides better performance during normal operation than RAID10 for most tests apart from small writes. RAID10 has less performance degradation if a drive should fail. We recommend using RAID50 for most deployments, as 32kB writes are the least critical of the five tests indicated in the table above. RAID50 also provides nearly twice the storage capacity of RAID10 on the same number of disks.
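The capacity claim can be checked with simple arithmetic. The sketch below assumes equally sized disks and no hot spares, with each RAID5 parity group giving up one disk's worth of capacity and RAID10 mirroring every disk; the function names are illustrative.

```python
def usable_disks_raid50(disks, parity_groups=2):
    """Each RAID5 parity group gives up one disk's worth of capacity."""
    return disks - parity_groups


def usable_disks_raid10(disks):
    """Every disk is mirrored, so half of the raw capacity is usable."""
    return disks // 2


# Disk counts used for the SAS arrays in the table above.
for disks in (12, 16, 22):
    r50 = usable_disks_raid50(disks)
    r10 = usable_disks_raid10(disks)
    print(f"{disks} disks: RAID50 {r50}, RAID10 {r10}, ratio {r50 / r10:.2f}")
```

For 12, 16 and 22 disks the ratio comes out at 1.67, 1.75 and 1.82, so the advantage approaches a factor of two as the number of disks grows.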

[2] This is the average IOPS over the standard 300 second test period. Performance starts out at ~3500 IOPS, however, and degrades to a sustained ~1700 IOPS after 3-4 minutes.

[3] Tested with 4kB block reads due to the different block size formatting.


If you deploy a backup indexer, 32kB writes are more frequent, because a large number of pre-index storage files (FiXML) are passed from the primary to the backup indexer node. In certain cases this may make RAID10 the better-performing choice.

Note:

These results are to a large degree dependent on the disk controller and spindles used. All scenarios in this document specify in detail the actual hardware that has been tested.

Analyzing feeding and indexing performance

The content processing chain in FAST Search Server consists of the following components, all potentially running on separate nodes:

Crawler(s): Any node pushing content into FAST Search Server, in most cases a Content SSA hosted in a SharePoint 2010 farm.

Content distributor(s): Receives content in batches and redistributes the batches to item processing in the document processors.

Item processing: Converts documents to a unified internal format.

Indexing dispatcher(s): Schedules an indexer node for each content batch.

Primary indexer: Generates the index.

Backup indexer: Persists a backup of the information in the primary indexer.

Content flows as indicated by arrows 1–5 in the figure above; the last flow, from primary to backup indexer, is an optional deployment choice. Asynchronous callbacks for completed processing propagate in the other direction, as indicated by arrows 6 through 9. Crawlers throttle the feed rate based on the callbacks (9) received for document batches (1). The overall feed performance is determined by the slowest component in this chain. The following sections describe how to monitor this.

Monitoring can be done through several tools, for example Performance Monitor in Windows Server 2008 [R2] or System Center Operations Manager (SCOM).

Content SSA

The most frequently used crawler is the set of indexing connectors supported by the Content SSA. The following statistics are important:

Batches ready: The number of batches that have been retrieved from the content sources and are ready to be passed on to the content distributor.

Batches submitted: The number of batches that have been sent to FAST Search Server and for which a callback is still pending.

Batches open: The total number of batches in some stage of processing.

The figure below shows these performance counters for a crawl session. Note that "batches submitted" is plotted on a different scale than the other two. The feed starts with "batches submitted" ramping up until the item processing components are all busy (36 in this case), and stays at this level as long as there is work available ("batches ready"). In the period from around 6:45 to 8:45 the content source is only able to provide very limited volumes of data, bringing "batches ready" to near zero.
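The behaviour described for the crawl session can be turned into a rough classification of a single counter sample. This is a minimal sketch, not a documented algorithm: the function name, the state labels and the `ready_floor` threshold are assumptions made for illustration, and the sample values other than the 36 item processing components are invented.

```python
def feed_state(batches_ready, batches_submitted, item_processors, ready_floor=1):
    """Classify one sample of the Content SSA counters.

    ready_floor is a hypothetical threshold for "near zero" batches ready.
    """
    if batches_ready <= ready_floor:
        # The content source cannot deliver data fast enough.
        return "content source limited"
    if batches_submitted >= item_processors:
        # Every item processing component has a batch to work on.
        return "item processing fully utilized"
    return "ramping up"


# 36 item processing components, as in the crawl session described above.
print(feed_state(batches_ready=400, batches_submitted=36, item_processors=36))
print(feed_state(batches_ready=0, batches_submitted=5, item_processors=36))
```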

For deployments with backup indexer rows, "batches submitted" tends to exceed the number of item processing components. These "additional" batches are content that has been processed but has not yet been persisted in both indexer rows. By default, the Content SSA throttles feeds to avoid more than 100 "batches submitted".

For large installations, the throttling parameters should be adjusted to allow more batches to be in some stage of processing. Tuning is only needed for deployments with at least one of the following characteristics:

More than 100 item processing component instances deployed per crawl component in the Content SSA

More than 50 item processing component instances deployed per crawl component in the Content SSA, in conjunction with a backup indexer row

More than 3 index columns per crawl component in the Content SSA

The number of crawl components within the Content SSA must be dimensioned properly for large deployments to avoid network bottlenecks. This scaling will often eliminate the need for further configuration tuning. When one or more of the conditions above apply, feeding performance can be improved by increasing the throttling limits in the Content SSA. These properties are "MaxSubmittedBatches" (default 100) and "MaxSubmittedPUDocs" (default 1000), and increased limits can be calculated as given below.

Note:

These limits apply for each crawl component within the Content SSA. If you use two crawl components (as in some of the scenario tests), the maximum total number of batches submitted will be two times the configured value.

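$$a = \begin{cases} 1 & \text{for deployments without backup indexer} \\ 2 & \text{for deployments with backup indexer} \end{cases}$$

$$b = a \cdot (\text{number of item processor instances})$$

$$c = \text{number of index columns}$$

$$s = \text{number of crawl components in the Content SSA}$$

$$\text{MaxSubmittedBatches} = \frac{20 \cdot c + b}{s}$$

$$\text{MaxSubmittedPUDocs} = 100 \cdot \text{MaxSubmittedBatches}$$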
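The calculation, together with the three tuning conditions listed earlier, can be sketched in Python as follows. The function names are illustrative, and rounding a fractional result up is an assumption, since the document does not say how to round. The sample input uses the values the document gives for its M4 scenario.

```python
import math


def throttling_limits(item_processors, index_columns, crawl_components,
                      backup_indexer):
    """Return (MaxSubmittedBatches, MaxSubmittedPUDocs) per crawl component."""
    a = 2 if backup_indexer else 1
    b = a * item_processors
    # Assumption: round a fractional result up to the next whole batch.
    max_batches = math.ceil((20 * index_columns + b) / crawl_components)
    return max_batches, 100 * max_batches


def tuning_needed(item_processors, index_columns, crawl_components,
                  backup_indexer):
    """True if any of the three conditions for tuning applies."""
    processors_per_crawl = item_processors / crawl_components
    columns_per_crawl = index_columns / crawl_components
    return (processors_per_crawl > 100
            or (backup_indexer and processors_per_crawl > 50)
            or columns_per_crawl > 3)


# M4 scenario: 48 item processors, 3 index columns, 2 crawl components,
# no backup indexer.
print(throttling_limits(48, 3, 2, backup_indexer=False))  # (54, 5400)
print(tuning_needed(48, 3, 2, backup_indexer=False))      # False
```

A calculated MaxSubmittedBatches below the default of 100 simply means the default is already sufficient.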

For example, the M4 scenario will have a=1, b=48, c=3, s=2, resulting in MaxSubmittedBatches = 54 and MaxSubmittedPUDocs = 5400. The default value (100) for MaxSubmittedBatches does not need tuning in this case. MaxSubmittedPUDocs (the maximum number of documents with ACL changes submitted) may be increased if the feed performance is limited by a high rate of ACL changes. These configuration parameters have not been changed in any of the scenarios covered in this document.

These throttling limits are configurable through the SharePoint 2010 Management Shell on the SharePoint farm hosting the Content SSA. The following commands set the default values:

$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "My Content SSA"
$ssa.ExtendedConnectorProperties["MaxSubmittedBatches"] = 100
$ssa.ExtendedConnectorProperties["MaxSubmittedPUDocs"] = 1000
$ssa.Update()

You need to replace the identity string "My Content SSA" with the name of your Content SSA.

Increasing these limits will increase the load on the item processing and indexing components. When these consume more of the farm resources, query performance will be impacted. This is less of an issue when running with a dedicated search row. Increasing "MaxSubmittedPUDocs" will increase the I/O load on primary and backup indexers.

The table below shows the most important performance counters for the Content SSA. Note that these are found on the node(s) hosting the Content SSA crawl components, under "OSS Search FAST Content Plugin", and not in the FAST Search Server farm.

Performance counter | Apply to object | Notes

Content distributors and item processing

Each FAST Search Server farm has one or more content distributors. These components receive all content in batches, which are passed on to the item processing components. You can ensure good performance by verifying that the following conditions are met:

Item processing components are effectively utilized

Incoming content batches are rapidly distributed for processing

Maximum throughput can only be achieved when the Content SSA described in the previous section has a constant queue of "batches ready" that can be submitted. Each item processing component will use 100% of a CPU core when busy. Item processing components can be scaled up to one per CPU core.

When there are multiple content distributors, the performance counters below should be summed across all of them for a total overview of the system.

Performance counter | Apply to object | Notes
Document processors | FAST Search Content Distributor | The number of item processing components registered with each content distributor. When there are multiple content distributors, the item processing components will be evenly distributed across them.
Document processors busy | FAST Search Content Distributor | The number of item processing components that are currently working on a content batch. Under maximum load, this should be close to the total number of item processing components.
Average dispatch time | FAST Search Content Distributor | The time needed for the content distributor to send a batch to an item processing component. This should be less than 10 ms. Higher values indicate a congested network.
Average processing time | FAST Search Content Distributor | The time needed for a batch to go through an item processing component. This time can vary depending on content types and batch sizes, but would normally be less than 60 seconds.
Available MBytes | Memory | The total amount of available memory on the computer. Each item processing component might need up to 2GB of memory. Processing throughput will be impacted under memory starvation.
Processor time | Processor | Overall CPU usage on the computer. Item processing components are very CPU intensive. High CPU utilization is expected during crawls, but item processing is scheduled with reduced priority and will yield CPU resources to other components when needed.
Bytes Total/sec | Network Interface | Overall network usage on the computer. High network load might become a bottleneck for the rate of data that can be processed by the FAST Search Server nodes.
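The thresholds for the content distributor counters can be combined into a simple check, sketched below. The dictionary keys, the sample values and the 90% "close to the total" ratio are assumptions made for this illustration; only the 10 ms and 60 second limits come from the table. How the counter values are collected is left out.

```python
def content_distributor_findings(distributors, busy_ratio=0.9):
    """Evaluate counters from all content distributors.

    distributors: one dict per content distributor (hypothetical keys).
    busy_ratio: assumed meaning of "close to the total" under maximum load.
    """
    # Counts are summed across all content distributors.
    total = sum(d["document_processors"] for d in distributors)
    busy = sum(d["document_processors_busy"] for d in distributors)

    findings = []
    if total and busy < busy_ratio * total:
        findings.append(f"only {busy} of {total} item processing components are busy")
    if any(d["avg_dispatch_ms"] > 10 for d in distributors):
        findings.append("average dispatch time above 10 ms: possible network congestion")
    if any(d["avg_processing_s"] > 60 for d in distributors):
        findings.append("average processing time above 60 seconds")
    return findings


# Invented sample values for a farm with two content distributors.
sample = [
    {"document_processors": 24, "document_processors_busy": 24,
     "avg_dispatch_ms": 4, "avg_processing_s": 12},
    {"document_processors": 24, "document_processors_busy": 12,
     "avg_dispatch_ms": 15, "avg_processing_s": 9},
]
for finding in content_distributor_findings(sample):
    print(finding)
```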


Indexing dispatcher and indexers

Indexers are the most write-intensive component in a FAST Search Server installation, and you need to ensure that you have high disk performance. High indexing activity can also affect query matching operations when running on the same row.

Indexers distribute the items across several partitions. Partition 0 and up to three of the other partitions can have ongoing activity at the same time. During redistribution of items among partitions, one or more partitions might be waiting for other partitions to reach a specific checkpoint.

In addition to the performance counters below, indexer status is provided by the "indexerinfo" command, for example "indexerinfo -a status".

Performance counter | Apply to object | Notes
Current queue size | FAST Search Indexer Status | Indexers queue incoming work under high load. This is normal, especially for partial updates. If the API queues never reach zero, not even intermittently, the indexer is the bottleneck. Feeds will be paused when the queue reaches 256MB in one of the indexers. This can happen if the storage subsystem is not sufficiently powerful. It will also happen during large redistributions of content between partitions, which temporarily block more content from being indexed.
FiXML fill rate | FAST Search Indexer | FiXML files are compacted at regular intervals, by default between 3am and 5am every night. A low FiXML fill rate (<70%) will lead to inefficient operation.
Active documents | FAST Search Indexer Partition | Partitions 0 and 1 should have fewer than 1 million items each, preferably even fewer in order to keep indexing latency low. In periods with high item throughput, low indexing latency is given less priority and these partitions will be larger, as this is better for overall throughput. Items will, however, automatically be rearranged into the higher-numbered partitions during periods with lighter load.
% Idle Time | Logical disk | Low disk idle time suggests a saturated storage subsystem.
% Free space | Logical disk | Indexers need space for both the index generation currently used for search and new index generations that are under processing. On a fully loaded system, disk usage will vary between 40% and near 100% for the same number of items, depending on the state of the indexer.
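A minimal sketch of how the indexer guidance could be applied to collected counter samples is shown below. The function and parameter names are hypothetical, the queue size is assumed to be reported in bytes with 1 MB taken as 2^20 bytes, and the sample values are invented.

```python
QUEUE_PAUSE_BYTES = 256 * 1024 * 1024  # assumes 1 MB = 2**20 bytes


def indexer_findings(queue_samples, fixml_fill_rate_pct, active_docs_by_partition):
    """Apply the guidance from the indexer counter table.

    queue_samples: "Current queue size" readings over a period, in bytes.
    fixml_fill_rate_pct: "FiXML fill rate" in percent.
    active_docs_by_partition: {partition number: "Active documents"}.
    """
    findings = []
    if min(queue_samples) > 0:
        # The queue never drained to zero during the sampled period.
        findings.append("queue never reaches zero: indexer is likely the bottleneck")
    if max(queue_samples) >= QUEUE_PAUSE_BYTES:
        findings.append("queue reached 256MB: feeds will have been paused")
    if fixml_fill_rate_pct < 70:
        findings.append("FiXML fill rate below 70%: inefficient operation")
    for partition in (0, 1):
        if active_docs_by_partition.get(partition, 0) >= 1_000_000:
            findings.append(f"partition {partition} holds 1 million items or more")
    return findings


# Invented sample: the queue drains at one point, but partition 0 is large.
print(indexer_findings(
    queue_samples=[12_000_000, 0, 48_000_000],
    fixml_fill_rate_pct=82,
    active_docs_by_partition={0: 1_400_000, 1: 300_000},
))
```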


Analyzing query performance

Query SSA

SharePoint administrative reports provide useful statistics for query performance from an end-to-end perspective. These reports are effective for tracing trends over time, as well as identifying where to investigate when performance is not optimal.

The diagram below shows two such events. Around 2:20am, server rendering (blue graph) has a short spike due to recycling of the application pool. Later, at 3:00am, FiXML compaction starts, impacting the backend latency.

In general, server rendering and object model latencies occur on the nodes running SharePoint. These latencies are also dependent on the performance of the SQL server(s) backing the SharePoint installation. The backend latency is within the FAST Search Server nodes, and will be discussed in the following sections.

QRproxy and QRserver

Queries are sent from the Query SSA to the FAST Search Server farm via the QRproxy component, which resides on the server running the query processing component ("query" in the deployment file). The performance counters in the table below can be helpful for correlating the backend latency reported by the Query SSA with the latency reported by the query matching component (named "QRServer" in the reports). Neither of these components is likely to represent a bottleneck. Any difference between the two is due to communication delays or processing in the QRproxy.

Performance counter | Apply to object | Notes
# Queries/sec | FAST Search QRServer | Current number of queries per second
# Requests/sec | FAST Search QRServer | Current number of requests per second. In addition to the query load, one internal request is received every second to check that QRserver is alive.
Average queries per minute | FAST Search QRServer | Average query load
Average latency last - ms | FAST Search QRServer | Average query latency
Peak queries per sec | FAST Search QRServer | Peak query load seen by the QRserver since last restart

Query dispatcher

The query dispatcher (named "Fdispatch" in the reports) distributes queries across index columns. There is also a query dispatcher located on each query matching node, distributing queries across index partitions. Both query dispatchers may become a bottleneck when there are huge amounts of data in the query results, leading to network saturation. It is recommended to keep traffic in and out of Fdispatch on network connections that are not carrying heavy load from, for example, content crawls.


Query matching

The query matching component (named "Fsearch" in the reports) is responsible for performing the actual matching of queries against the index, computing query relevancy and performing deep refinement. For each query, it reads the required information from the indices generated by the indexer. Information that is likely to be reused will be kept in a memory cache. Good query matching performance relies on a powerful CPU as well as low latency for small random disk reads (typically 16-64 kB). The performance counters below are useful for analyzing a node running query matching:

Performance counter | Apply to object | Notes
% Idle Time | Logical disk | Low disk idle time suggests a saturated storage subsystem.
Avg. Disk sec/Read | Physical disk | Each query will need a series of disk reads. An average read latency of less than 10 ms is desirable.
Avg. Disk Read Queue Length | Physical disk | On a saturated disk subsystem, read queues will build up. Queues will affect query latency. An average queue length smaller than 1 is desirable for any node running query components. This will typically be exceeded in single row deployments during indexing, negatively impacting search performance.
Processor time | Processor | CPU utilization is likely to become the bottleneck for high query throughput. When query matching has high processor time (near 100%), query throughput will not be able to increase further.
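The guidance for a query matching node can be expressed as a short check, sketched below. The function name, parameter names and the 95% CPU threshold (standing in for "near 100%") are assumptions; "Avg. Disk sec/Read" is reported in seconds, so the 10 ms limit is written as 0.010.

```python
def query_matching_findings(avg_disk_sec_per_read, avg_read_queue_length,
                            processor_time_pct, cpu_limit_pct=95):
    """Apply the guidance from the query matching counter table.

    cpu_limit_pct is a hypothetical stand-in for "near 100%".
    """
    findings = []
    if avg_disk_sec_per_read >= 0.010:
        findings.append("average read latency is 10 ms or more")
    if avg_read_queue_length >= 1:
        findings.append("disk read queue is building up")
    if processor_time_pct >= cpu_limit_pct:
        findings.append("CPU bound: query throughput cannot increase further")
    return findings


# Invented sample: a node that is disk bound while an index is being built.
print(query_matching_findings(avg_disk_sec_per_read=0.024,
                              avg_read_queue_length=3.2,
                              processor_time_pct=41))
```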
