  • 8/8/2019 Working with large lists

    White Paper: Working with large lists in Office SharePoint Server 2007

    Author:

    Steve Peschka

    Date published:

    August 2007

    Summary:

    Microsoft performed performance testing against Microsoft Office SharePoint Server 2007 to

    determine the performance characteristics of large SharePoint lists under different loads and

    modes of operation. This white paper presents their findings.

    The information contained in this document represents the current view of Microsoft Corporation

    on the issues discussed as of the date of publication. Because Microsoft must respond to

    changing market conditions, it should not be interpreted to be a commitment on the part of

    Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the

    date of publication.

    This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,

    EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

    Complying with all applicable copyright laws is the responsibility of the user. Without limiting the

    rights under copyright, no part of this document may be reproduced, stored in or introduced into a

    retrieval system, or transmitted in any form or by any means (electronic, mechanical,

    photocopying, recording, or otherwise), or for any purpose, without the express written permission

    of Microsoft Corporation.

    Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual

    property rights covering subject matter in this document. Except as expressly provided in any

    written license agreement from Microsoft, the furnishing of this document does not give you any

    license to these patents, trademarks, copyrights, or other intellectual property.

    © 2007 Microsoft Corporation. All rights reserved.

    Microsoft, SQL Server, Windows, SharePoint, and Active Directory are either registered

    trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

    The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

    Goals

    The test results in this white paper are intended to demonstrate the difference in the performance

    characteristics of SharePoint lists containing large numbers of items when different data access

    types are used to present list contents. Test results in this white paper show how to optimize list

    performance through limits on the number of items that appear in a list, and by choosing the most

    appropriate method of retrieving list contents.

    The tests upon which the results in this white paper are based were conducted by using artificially

    created test data and simulated users. Real-world results may vary depending on hardware,

    number of concurrent users, farm configuration, and user operations being performed.

    Test results and findings

    There is documented guidance for Microsoft Office SharePoint Server 2007 regarding the

    maximum size of lists and list containers. For typical customer scenarios in which the standard

    Office SharePoint Server 2007 browser-based user interface is used, the recommendation is that

    a single list should not have more than 2,000 items per list container. A container in this case

    means the root of the list, as well as any folders in the list; a folder is a container because other

    list items are stored within it. A folder can contain items from the list as well as other folders, and

    each subfolder can contain more of each, and so on. For example, that means that you could

    have a list with 1,990 items in the root of the list, 10 folders that each contain 2,000 items, and so

    on. The maximum number of items supported in a list with recursive folders is 5 million items.
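
The arithmetic behind that example can be checked with a minimal sketch. It assumes, as the example implies, that subfolders count toward the 2,000 entries of the container that holds them; the function name `total_items` and the constant name are hypothetical and are not part of SharePoint.

```python
# Recommended maximum number of entries (items plus subfolders) per container,
# taken from the guidance above.
MAX_PER_CONTAINER = 2000


def total_items(root_items, folder_item_counts):
    """Return the total item count for a list with one level of folders.

    Raises ValueError if the root or any folder exceeds the recommended limit.
    Assumption: each folder also occupies one entry in the root container.
    """
    root_entries = root_items + len(folder_item_counts)
    if root_entries > MAX_PER_CONTAINER:
        raise ValueError("root container holds %d entries" % root_entries)
    for count in folder_item_counts:
        if count > MAX_PER_CONTAINER:
            raise ValueError("folder holds %d items" % count)
    return root_items + sum(folder_item_counts)


# 1,990 root items plus 10 folders fill the root container exactly (2,000 entries).
example_total = total_items(1990, [2000] * 10)
print(example_total)  # 21990
```

Nesting further folders inside those ten folders extends the same calculation, which is how a list can grow toward the 5 million item maximum while every container stays within the guideline.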

    In Office SharePoint Server 2007, virtually all end-user data is stored in a list. A document library,

    for example, is just a specialized list. The same is true for calendars, contacts, and other

    interfaces; they are all just customized versions of the basic SharePoint list, also referred to as an

    SPList. The individual items in the list are referred to as list items generally, or an SPListItem in

    an SPListItemCollection in the Office SharePoint Server 2007 object model. The findings in this

    article are equally important across all of the ways in which you store and work with data in an

    Office SharePoint Server 2007 site.

    There are some scenarios in which you want to take advantage of the features of Office

    SharePoint Server 2007, but need to exceed the limit of 2,000 items per container. If you write

    your own interface for managing and retrieving the data, it's quite possible that you can go past

    this limit without an adverse impact on farm performance. You may be able to manage larger lists

    to some extent by using views within Office SharePoint Server 2007 that are filtered such that

    there are never more than 2,000 items returned. Filtered views provide better performance than

    just trying to view one large flat list, but are not as efficient as breaking down the list into different

    containers if you are using the predefined browser-based Office SharePoint Server 2007

    interface.

    If you develop your own interface, there are several different ways to retrieve list data, each with

    different performance characteristics. Some data access methods perform very well, but are only

    useful in a limited number of scenarios. Finally, there are also performance tradeoffs that need to

    be made with other data maintenance tasks in addition to data retrieval.

    Test characteristics

    The tests in this white paper were conducted on a relatively underpowered Microsoft Virtual

    Server 2005 R2 image to show a comparison of farm performance characteristics when different

    data access types are used to manipulate list data. The goal of these tests was not to establish a

    new arbitrary limit, or to deliver a "requests per second" type of number that is typically used in a

    load-style test to show raw throughput capacity. The virtual server image was running Office

    SharePoint Server 2007 Enterprise Edition and had 1 gigabyte (GB) of allocated RAM. Virtual

    Server was running on a host machine with a 2 gigahertz (GHz) dual-core processor and 2 GB of

    RAM.

    Baseline tests were done first with a list containing 1,500 items. The list schema looked like this:

    Title: Single line of text

    Expense Category: Choice (Meals, Travel, Hotel, Supplies)

    Amount: Currency

    Deductible: Yes/No

    Created By: Person or Group

    Modified By: Person or Group

    In the baseline tests, no columns were indexed; measurements were taken just to provide a

    relative value that could be used after the number of items in the list exceeded recommended

    boundaries. In the tests against a very large list, one set was done with no columns being

    indexed and a second round was done after configuring the Expense Category column to be

    indexed. The query that was executed in each one of the tests used a WHERE clause against the

    Expense Category field looking for the first 100 items that contained "Supplies".

    To provide another point of comparison, the data being selected was based on ID value in the

    tests against the very large list. The ID is a built-in numeric indexed field in all SharePoint lists

    that is well suited to queries. The query in this case was constructed with a WHERE clause that

    retrieved items where the ID ranged from 44,500 through 44,599.
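
The white paper does not show the ID-based query itself. As a rough sketch only, a CAML `Where` clause for that ID range could be built as follows; the helper name `id_range_query` is hypothetical, and the exact query text used in the tests is an assumption.

```python
import xml.etree.ElementTree as ET


def id_range_query(first_id, last_id):
    """Build a CAML Where clause selecting items with first_id <= ID <= last_id."""
    where = ET.Element("Where")
    both = ET.SubElement(where, "And")
    # Geq and Leq are the CAML "greater than or equal" / "less than or equal" operators.
    for operator, bound in (("Geq", first_id), ("Leq", last_id)):
        condition = ET.SubElement(both, operator)
        ET.SubElement(condition, "FieldRef", Name="ID")
        value = ET.SubElement(condition, "Value", Type="Counter")
        value.text = str(bound)
    return ET.tostring(where, encoding="unicode")


caml = id_range_query(44500, 44599)
print(caml)
```

The range 44,500 through 44,599 covers exactly 100 IDs, which keeps the result size comparable to the 100-item row limit used in the Expense Category queries.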

    Some tests were also run with the site under load. To create the load during the testing process,

    a LoadTest was created in the Microsoft Visual Studio .NET 2005 development system to stress

    test the site. Instead of targeting a specific number of users in the test, it was configured as a

    goal-based test, or a test in which a target value is defined for a particular measurement, and the

    test determines the number of requests required to achieve the target. In this case, the goal that

    was configured for the test was to achieve a consistent target CPU utilization on the Office

    SharePoint Server 2007 computer of 60 through 80 percent.

    Data access methods

    Each test consisted of retrieving a subset of data from the list using one of a number of different

    data access methods. This section shows the different methods that were tested.

    Note: The code samples included in the following sections are intended to show the process

    used to conduct tests. The code may not comply with coding best practices, and should not be

    used in a production environment without careful review and testing.

    Browser

    The list was viewed using a browser and the predefined Office SharePoint Server 2007 interface.

    A special tool, which is described in the Test Harness section later in this white paper, was

    developed to accurately capture how long it takes to view that information and browse through

    pages of data.

    SPList with For/Each

    The Office SharePoint Server 2007 object model (OM) was used to retrieve the list into an SPList

    object. Each item in the list was then enumerated with a For/Each loop until items were found

    that matched the search criteria.

    The following sample code was used for this method.

    'get the site

    Dim curSite As SPSite = New SPSite("http://myPortal")

    'get the web

    Dim curWeb As SPWeb = curSite.OpenWeb()

    'get our list

    Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

    'get the collection of items in the list

    Dim curItems As SPListItemCollection = curList.Items

    'enumerate the items in the list

    For Each curItem As SPListItem In curItems

        'do some comparison in here to see if it's an item we need

    Next

    SPList with SPQuery

    The OM was used to create an SPQuery object that contained the query criteria. That object was

    then run against an instance of the list in an SPList object. The results of the query were

    returned by calling the GetItems method on the SPList object.

    The following sample code was used for this method.

    'get the site

    Dim curSite As SPSite = New SPSite("http://myPortal")

    'get the web

    Dim curWeb As SPWeb = curSite.OpenWeb()

    'create our query

    Dim curQry As SPQuery = New SPQuery()

    'configure the query

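    'query markup reconstructed; only the value "Hotel" survived in this copy

    curQry.Query = "<Where><Eq><FieldRef Name='Expense_x0020_Category'/><Value Type='Text'>Hotel</Value></Eq></Where>"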

    curQry.RowLimit = 100

    'get our list

    Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

    'get the collection of items in the list

    Dim curItems As SPListItemCollection = curList.GetItems(curQry)

    'enumerate the items in the list

    For Each curItem As SPListItem In curItems

    'do something with each match

    Next

    SPList with DataTable

    This is one of two methods that test using a Microsoft ADO.NET DataTable to work with the data.

    In this case an instance of the list is obtained with an SPList object. The data from it is then

    retrieved into a DataTable by calling the GetDataTable() method on the Items property (for

    example, SPList.Items.GetDataTable()). The DataTable's DefaultView has a property called

    RowFilter that was then set to find the items. To keep the methodology between data access

    methods consistent, the DataTable was not cached between tests; it was filled each time by

    calling the GetDataTable() method. In a real-world scenario this test would have performed better

    had the DataTable been cached after the data was first retrieved, but it serves as a valuable point

    in comparison testing about the cost of this approach versus retrieving a DataTable from a

    selection of data that's already filtered.

    The following sample code was used for this method.

    'get the site

    Dim curSite As SPSite = New SPSite("http://myPortal")

    'get the web

    Dim curWeb As SPWeb = curSite.OpenWeb()

    'get our list

    Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

    'get the item in a datatable

    Dim dt As DataTable = curList.Items.GetDataTable()

    'get a dataview for filtering

    Dim dv As DataView = dt.DefaultView

    dv.RowFilter = "Expense_x0020_Category='Hotel'"

    'enumerate matches

    For rowNum As Integer = 0 To dv.Count - 1

    'do something with each match

    Next

    SPListItems with DataTable

    This method is similar to the SPList with DataTable method, but with a twist. An instance of the

    list is retrieved through an SPList object. An SPQuery object is created to build a query, and that

    query is executed against the SPList object, which returns an SPListItemCollection. The data

    from that collection is then retrieved into a DataTable by using the GetDataTable() method on the

    SPListItemCollection.

    The following sample code was used for this method.

    'get the site

    Dim curSite As SPSite = New SPSite("http://myPortal")

    'get the web

    Dim curWeb As SPWeb = curSite.OpenWeb()

    'create our query

    Dim curQry As SPQuery = New SPQuery()

    'configure the query

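    'query markup reconstructed; only the value "Hotel" survived in this copy

    curQry.Query = "<Where><Eq><FieldRef Name='Expense_x0020_Category'/><Value Type='Text'>Hotel</Value></Eq></Where>"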

    curQry.RowLimit = 100

    'get our list

    Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

    'get the collection of items in the list

    Dim curItems As SPListItemCollection = curList.GetItems(curQry)

    'get the item in a datatable

    Dim dt As DataTable = curItems.GetDataTable()

    'enumerate matches

    For Each dr As DataRow In dt.Rows

        'do something with each match

    Next

    Lists Web service

    The Lists Web service, which comes with Windows SharePoint Services 3.0 and Office

    SharePoint Server 2007, was used to retrieve the data. A Collaborative Application Markup

    Language (CAML) query was created and submitted along with the list identifier, and an XML

    result set was returned from the Lists Web service.

    The following sample code was used for this method.

    'create a new xml doc we can use to create query nodes

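    Dim xDoc As New XmlDocument()

    'create our query node

    Dim xQry As XmlNode = xDoc.CreateNode(XmlNodeType.Element, "Query", "")

    'set the query constraints (markup reconstructed; only "Hotel" survived in this copy)

    xQry.InnerXml = "<Where><Eq><FieldRef Name='Expense_x0020_Category'/><Value Type='Text'>Hotel</Value></Eq></Where>"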

    'create the Web service proxy that is mapped to Lists.asmx

    Using ws As New wsLists.Lists()

        'configure it

        ws.Credentials = System.Net.CredentialCache.DefaultCredentials

        ws.Url = "http://myPortal/_vti_bin/lists.asmx"

        'create the optional elements

        Dim xView As XmlNode = xDoc.CreateNode(XmlNodeType.Element, "ViewFields", "")

        Dim xQryOpt As XmlNode = xDoc.CreateNode(XmlNodeType.Element, "QueryOptions", "")

        'query the server

        Dim xNode As XmlNode = ws.GetListItems("myListID", "", xQry, xView, "", xQryOpt, "")

        'enumerate returned items

        For nodeCount As Integer = 0 To xNode.ChildNodes.Count - 1

            'do something with each match

        Next

    End Using

    Search

    The OM was used to execute a query against the Office SharePoint Server 2007 search engine

    and return the results as a ResultTableCollection. That was then further distilled down into an

    ADO.NET DataTable via the ResultTable of ResultType.RelevantResults from the

    ResultTableCollection.

    The following sample code was used for this method.

    'get the site

    Dim curSite As SPSite = New SPSite("http://myPortal")

    'get the web

    Dim curWeb As SPWeb = curSite.OpenWeb()

    'get our list

    Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))

    Dim qry As New FullTextSqlQuery(curSite)

    Dim SQL As String = "SELECT Title, Rank, Size, Description, Write, Path, " & _
        "Deductible, ExpenseCategory, ID, Vendor, Amount FROM portal..scope() " & _
        "WHERE CONTAINS (""URL"",'""#SITEURL#Lists/#LISTURL#*""') #DEFAULT# " & _
        "ORDER BY ""Rank"""

    'do token replacement

    SQL = SQL.Replace("#SITEURL#", "http://myPortal/")

    SQL = SQL.Replace("#LISTURL#", curList.Title)

    SQL = SQL.Replace("#DEFAULT#", "AND FREETEXT (""ExpenseCategory"",'""Hotel""')")

    qry.QueryText = SQL

    qry.RowLimit = 100

    qry.ResultTypes = ResultType.RelevantResults

    'execute the query

    Dim rtc As ResultTableCollection = qry.Execute()

    Dim rt As ResultTable = rtc(ResultType.RelevantResults)

    Dim dt As New DataTable()

    dt.Load(rt, LoadOption.OverwriteChanges)

    'enumerate matches

    For Each dr As DataRow In dt.Rows

    'do something with each match

    Next

    PortalSiteMapProvider

    One approach to retrieving list data in Office SharePoint Server 2007 that's not very well known is

    the use of the PortalSiteMapProvider class. It was originally created to help cache content for

    navigation. However, it also provides a nice automatic caching infrastructure for retrieving list

    data. The class includes a method called GetCachedListItemsByQuery that was used in this

    test. This method takes an SPQuery object as a parameter. It first looks in its cache to see if the

    items that match the query already exist. If they do, the method returns the cached results, and if

    not, it queries the list, stores the results in cache and returns them from the method call.
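
The check-the-cache-first behavior described above can be illustrated with a small, language-neutral sketch. This is not the PortalSiteMapProvider implementation; the class `QueryCache`, the function `fake_list_query`, and the list name "Expenses" are hypothetical stand-ins.

```python
class QueryCache:
    """Return cached results for a query, loading from the list only on a miss."""

    def __init__(self, load_from_list):
        self._load = load_from_list  # hypothetical function that queries the list
        self._cache = {}
        self.loads = 0               # counts how often the list was actually queried

    def get_items(self, list_name, query_text):
        key = (list_name, query_text)
        if key not in self._cache:
            # Cache miss: query the list and remember the results.
            self.loads += 1
            self._cache[key] = self._load(list_name, query_text)
        return self._cache[key]


def fake_list_query(list_name, query_text):
    # Stand-in for an expensive database round trip.
    return ["%s item matching %s" % (list_name, query_text)]


cache = QueryCache(fake_list_query)
first = cache.get_items("Expenses", "Hotel")
second = cache.get_items("Expenses", "Hotel")  # served from the cache
print(cache.loads)  # 1
```

Repeating the same query costs one load in total, while each distinct query adds another load; this is the trade-off to keep in mind when the data being retrieved changes frequently.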

    The following sample code was used for this method. Note that it is different from all of the

    previous examples in that you cannot use the PortalSiteMapProvider class in Windows Forms

    applications.

    'get the current web

    Dim curWeb As SPWeb = SPControl.GetContextWeb(HttpContext.Current)

    'create the query

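    Dim curQry As New SPQuery()

    'query markup reconstructed; only the value "Hotel" survived in this copy

    curQry.Query = "<Where><Eq><FieldRef Name='Expense_x0020_Category'/><Value Type='Text'>Hotel</Value></Eq></Where>"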

    'get the portal map provider stuff

    Dim ps As PortalSiteMapProvider = PortalSiteMapProvider.WebSiteMapProvider

    Dim pNode As PortalWebSiteMapNode = _
        TryCast(ps.FindSiteMapNode(curWeb.ServerRelativeUrl), PortalWebSiteMapNode)

    'get the items

    Dim pItems As SiteMapNodeCollection = _
        ps.GetCachedListItemsByQuery(pNode, "myListName_NotID", curQry, curWeb)

    'enumerate all matches

    For Each pItem As PortalListItemSiteMapNode In pItems

        'do something with each match

    Next

    Test harness

    All of the tests were executed through one of three different test harnesses. Each one is

    described in more detail below.

    WinForm test application

    The WinForm test application was used for the majority of the tests. It was written in the Microsoft

    Visual Basic .NET development system, and runs on the Office SharePoint Server 2007 computer

    itself so that it can use the OM to retrieve data from Office SharePoint Server 2007. It used the

    new Stopwatch class of the Microsoft .NET Framework version 2.0 to capture the elapsed

    milliseconds that each test took to complete both retrieving the data and enumerating the results.

    The test results were enumerated and the values of two fields of data were retrieved from each

    item so that if any data access method caused some additional processing time in the retrieval of

    those items, it would get recorded along with the results. This was done to give a more realistic

    representation of how the data would be used in a real-world scenario.

    WebPart and JavaScript

    Monitoring the time it takes for the predefined Office SharePoint Server 2007 browser interface to

    render a page was more difficult. In order to capture that information a custom ASP.NET server

    control was developed. In the OnInit event for the Web Part, the current time down to the

    millisecond is recorded. When Render is called, that time is output along with some JavaScript

    onto the page. The JavaScript forces a call, when the browser document's ReadyStateChange

    event fires, to a function that the Web Part creates. That function checks the document's

    readyState property and, if it is "complete", the function gets the current time, subtracts the time

    that was captured during the Web Part's OnInit event, and displays the difference. The value that

    is displayed represents how long it took from when the Web Part was first initialized until the page

    was completely finished loading.

    Web Part

    A second Web Part was written to use the PortalSiteMapProvider application programming

    interface (API). This Web Part requires a valid HTTP context and so it would not work in the

    WinForms test harness. The process it used was very similar to the WinForms application;

    however, in the Render method it calls the GetCachedListItemsByQuery method on the

    PortalSiteMapProvider class instance and uses the Stopwatch class to track the elapsed

    milliseconds, which it outputs to the page.

    Test results

    Before reviewing each of the data points in the testing process, it's also important to understand

    what each data point represents. Each point on the graph represents the average of a number

    of tests. For example, most of the test results consist of five data points. Each data point

    represents the average time for five tests, so all five data points are the result of 25 tests. The

    only exception is the tests for the browser-based rendering times; they used a smaller dataset

    than the other tests. The following sections describe the individual test results. All timed results

    are measured in milliseconds, so smaller numbers are better.
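
As a small illustration of how each plotted point is derived, the sketch below averages five runs per data point. The timings are invented purely for illustration and are not results from these tests; the names `data_points` and `hypothetical_runs` are likewise assumptions.

```python
from statistics import mean


def data_points(runs_per_point):
    """Average the elapsed milliseconds of the runs behind each data point."""
    return [mean(runs) for runs in runs_per_point]


# Hypothetical elapsed times in milliseconds: five data points, five runs each.
hypothetical_runs = [
    [120, 130, 125, 135, 140],
    [110, 115, 120, 125, 130],
    [100, 105, 110, 115, 120],
    [140, 145, 150, 155, 160],
    [130, 135, 140, 145, 150],
]

points = data_points(hypothetical_runs)
total_tests = sum(len(runs) for runs in hypothetical_runs)
print(points, total_tests)
```

Five averaged points from five runs each account for the 25 tests mentioned above.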

    Browser-based viewing and page size

    One test that was done was to determine how the number of records displayed for a list on the

    page impacts the performance of rendering that page. The goal was to understand if showing

    more items on a page caused linear growth, or response times that got exponentially worse. The

    testing was done against a list with 1,500 items and varied the number of items displayed on a

    page to be 100, 300 and 500. As shown in the following graph, increasing the number of items

    displayed per page results in a fairly linear increase in display time.

    The baseline test

    The goal for the next set of tests was to establish our baseline numbers. Here are the results of

    the different data access methods against a list with 1,500 items. Only the most common data

    access methods were included in the baseline testing, so test results for the

    PortalSiteMapProvider class were not included.

    What stands out clearly in this set of results is that viewing the data using the predefined Office

    SharePoint Server 2007 browser interface is the slowest data access method by far. This is one

    of the reasons why guidance has been delivered to restrict list sizes to no more than 2,000 items

    per container. It's also why we recommend that you don't consider going above the 2,000 items

    per container unless you are developing an alternative interface to work with the data.

    Testing with a very large list

    The next test shows clearly what happens when you dramatically increase the number of

    items in the list over the recommended guideline. In this case, the list contained 100,000 items.

    The list did not have the index on the Expense Category column, and the site was under load.

    The following version of the previous chart omits the two slowest data retrieval methods for ease

    of comparison between the other methods.

    Using the For/Each enumeration to find items within the list is clearly not a good choice for

    working with large amounts of data. In addition, there was tremendous overhead in loading all of

    the list data into an ADO.NET DataTable and then using its filtering capabilities to find the desired

    data. However, as stated earlier, if you cached the DataTable instead of loading the list data into

    it on each request, the results would probably have been significantly different. There still would

    be a very significant hit the first time the list data is loaded into the DataTable, however.

    Another point to note here is just how well the PortalSiteMapProvider class performed. It was

    lightning fast in these tests, and significantly outperformed the other data access methods.

    Because the PortalSiteMapProvider and other tested methods performed substantially better

    than the For/Each, SPList with DataTable and Page Load in Browser methods, the latter methods

    were not included in any subsequent test results.

    Also, for the Page Load in Browser test, the page was configured to display 100 items per page.

    Comparing results with an indexed column

    The goal of this test was to determine how much of a performance gain is realized when

    configuring the column used in the WHERE clause for the test query to be indexed.

    These results demonstrate that if you are using the SPList class as part of your data access

    strategy, you will benefit greatly from indexing the columns used in WHERE clauses. For other

    data access methods, indexing will likely give you only nominal benefit, if at all. Adding a column

    index actually reduced performance when using the PortalSiteMapProvider class.

    Comparing an indexed column to an ID column

    This test was conducted to compare the performance differences when using a WHERE clause in

    the query that relied on an item's ID rather than the value of an indexed field.

    What's interesting about these results is that they are essentially the inverse of the previous test.

    That is, when using ID as the filter field criteria, data access methods that do not use the SPList

    class perform much better. However, data access methods that rely on the SPList class still work

    much more quickly when they are using an indexed column rather than item IDs.

    Analyzing the results

    The test results in this white paper validate the fact that with proper testing in your own

    environment, it is quite possible that you can use more than 2,000 items in a container without an

    adverse impact on performance. The best results will be obtained if you write your own user

    interface to work with the data in the list, and make some carefully considered choices about what

    data access method works best for your requirements. The data access method you choose may

    very well impact other aspects of your site or list implementation.

    For example, using data access methods that require the SPList class will greatly benefit from

    indexing columns used in a WHERE clause. However, the benefit of indexing these columns is

    marginal if the data is retrieved using the Search service, the Lists Web service or the

    PortalSiteMapProvider class. Conversely, if you are not using the SPList class for data

    retrieval, data access will likely be much faster if you are able to retrieve data based on the ID of

    items, rather than the value of a specific column in a list.

    Search

    Search performed well across all of the scenarios. One drawback to using Search is that it cannot

    retrieve data until indexing has completed, so if immediate data retrieval is a requirement, Search

    may not be the best choice. You will probably also need to configure Search further to support

    your query requirements. For example, these tests required the ability to use a structured query

    language (SQL) statement that retrieved a very specific set of fields from a list, as well as use the

    ID and Expense Category field in the WHERE clause. For this solution to work, Managed

    Properties must be configured in Search to retrieve the custom properties from the list and to use

    criteria against them. Implementing Search as it was used in this testing requires Office

    SharePoint Server 2007.

    PortalSiteMapProvider

    The PortalSiteMapProvider class was one of the best performing data access methods in every

    scenario. However, there are a couple of limitations in using it. First, because of the way in which

    the data is cached, use of the PortalSiteMapProvider class is going to be most useful if the data

    you are retrieving is not significantly different over time. If you are trying to frequently retrieve

    different data sets, the PortalSiteMapProvider class will incur the overhead of constantly reading

    from the database, inserting data into the cache and then returning it from the method call.

    Clearly, the advantage of the PortalSiteMapProvider class is when it can read data directly from

    the cache.

    Also, the amount of memory the PortalSiteMapProvider class has available to use may be

    somewhat constrained. It uses the site collection object cache to store data; by default, the object

    cache is only 100 megabytes (MB). You can increase the size of the site collection object cache

    on the Object cache settings page for the site collection. You can change the Max. Cache Size

    (MB) value on that page. However, remember that whatever amount of memory you assign to the

    object cache comes out of the same shared memory available to the application pool. If you are

    running the 32-bit version of Office SharePoint Server 2007, the most memory you can assign to

    a single application pool is 2 GB, and you immediately lose roughly 500 MB when the .NET

    Framework and base Office SharePoint Server 2007 DLLs and assemblies are loaded.

    Therefore, you need to balance the object cache size with how much memory you have available

    on your Web servers in addition to the processor architecture, other loaded programs used by

    Office SharePoint Server 2007, etc. The PortalSiteMapProvider class is only available on Office

    SharePoint Server 2007.
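
The memory trade-off for a 32-bit application pool can be sketched as simple arithmetic, using the 2 GB ceiling and the roughly 500 MB base overhead quoted above. The function name `remaining_memory_mb` is hypothetical, and the calculation ignores everything else that competes for the same address space.

```python
APP_POOL_LIMIT_MB = 2 * 1024  # 32-bit application pool ceiling quoted in the text
BASE_OVERHEAD_MB = 500        # approximate cost of the .NET Framework and base assemblies


def remaining_memory_mb(object_cache_mb):
    """Memory left for everything else after the object cache is sized."""
    return APP_POOL_LIMIT_MB - BASE_OVERHEAD_MB - object_cache_mb


print(remaining_memory_mb(100))  # default 100 MB object cache -> 1448
print(remaining_memory_mb(500))  # enlarged 500 MB object cache -> 1048
```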

    SPList

    Using the SPList class gives you several options to retrieve data: a For/Each enumeration, the

    Items collection, the GetDataTable method of an SPListItemCollection, and using an SPQuery

    object to filter data. Some of those methods, specifically the GetItems method and calling GetDataTable

    on the results of GetItems, routinely performed well in most scenarios. However, there are

    some limitations. For example, the GetItems method won't work across folders in a single list

    unless the ViewAttributes property of your SPQuery object includes Scope="Recursive".

    For that matter, it won't work across lists if you want to query data from multiple lists or subsites.

    It also requires that all code runs directly on the Office SharePoint Server 2007 computer. Other

    options, like the Lists Web service and the Search Web service (not the Search methodology that

    was used in these tests), can retrieve the data while running on remote servers.

    Data maintenance considerations

    There are a few other issues to consider when creating lists with more than 2,000 items per

    container. One is the cost of other common operations such as adding or deleting items from the

    list. We did some additional tests to measure the impact of those kinds of operations against our

    very large list. The results show that as the list gets quite large, those operations begin to slow

    down considerably.


    The results show that when the site is not under load, adding a single new item does not have a

    significant impact on performance. However, although indexing a column improves query

    performance, it also may negatively impact the performance of adding new records. Also,

    performance would obviously degrade when multiple items are being added and the site is under

    load.

    Performance for deleting items degrades significantly when a list becomes very large. Deleting a

    single item from a very large list takes much more time than deleting an item from a smaller list.

    In the test case, a single item was deleted from a site that was not under load. As the data shows,

    whether there was an indexed column or not, performance when changing list items degrades as

    the size of the list grows. It is more likely that you would need to build a batch process to delete

    items during off-peak periods. If that is not an option, the performance of delete functionality

    alone could conceivably force you to abandon plans to use very large lists in Office SharePoint

    Server 2007.
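
    The batch-delete idea can be sketched as follows. This is a minimal sketch, not the approach used in the tests: delete_item is a hypothetical callback that removes one list item, the batch size of 100 is an arbitrary assumption, and scheduling the run for an off-peak period is left to whatever job scheduler you use.

```python
def chunked(item_ids, batch_size):
    """Yield successive batches of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(item_ids), batch_size):
        yield item_ids[start:start + batch_size]


def delete_in_batches(item_ids, delete_item, batch_size=100):
    """Delete items one batch at a time and return the number of batches processed.

    delete_item is a hypothetical callback that removes a single list item.
    """
    batches = 0
    for batch in chunked(list(item_ids), batch_size):
        for item_id in batch:
            delete_item(item_id)
        batches += 1
    return batches


# Usage: record the IDs instead of deleting anything real.
deleted = []
print(delete_in_batches(range(250), deleted.append, batch_size=100))  # 3
```

    Working in small batches lets the job stop between batches when the off-peak window closes and resume in the next one.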

    Data locking

    Another important consideration when using large lists is the concept of the locks that Microsoft

    SQL Server places on data tables that contain list information. Virtually all data for all Office

    SharePoint Server 2007 lists is contained within a single table in SQL Server. This table contains

    data for all the lists in all the site collections whose data is stored in that content database. When

    you attempt to update data on a list item, whether that is adding, editing or deleting a list item,

    SQL Server will attempt to lock other items (rows to SQL Server) for that particular list.

    However, there is a limit to the number of individual rows that SQL Server will try to lock down. If

    you try to select approximately 5,000 items or more simultaneously for reading or update, SQL

    Server will typically lock the entire table for the duration of that change. In this event, all other

    reads and writes for all lists in all site collections are queued until the previous transaction is

    complete and the lock is released. If your query retrieves data across multiple folders within the

    list, the locking behavior occurs whether or not list items are recursively nested so that there are

    not more than 2,000 items in an individual container. To ensure that you don't encounter this

    locking behavior, make sure the number of items you retrieve in a single request is well below this

    threshold. For example, you can control the number of records returned by setting the RowLimit

    property on the SPQuery object.
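
    The effect of a row limit can be modeled with a short paging sketch. This is not SharePoint code: fetch_page is a hypothetical stand-in for one query against the list, the in-memory rows list stands in for the list contents, and the page size of 1,000 is an assumption chosen only because it is well below the approximate 5,000-item figure given above.

```python
LOCK_THRESHOLD = 5000   # approximate figure quoted in the text


def fetch_page(rows, position, row_limit):
    """Hypothetical stand-in for one query: up to row_limit rows plus the next position."""
    page = rows[position:position + row_limit]
    next_position = position + len(page)
    if next_position >= len(rows):
        next_position = None   # nothing left to read
    return page, next_position


def read_all(rows, row_limit=1000):
    """Read every row in pages, never requesting more than row_limit rows at once."""
    if not 0 < row_limit < LOCK_THRESHOLD:
        raise ValueError("row_limit should stay well below the locking threshold")
    position, pages = 0, []
    while position is not None:
        page, position = fetch_page(rows, position, row_limit)
        pages.append(page)
    return pages


pages = read_all(list(range(12000)), row_limit=1000)
print(len(pages))   # 12
```

    Each request touches at most row_limit rows, so no single read approaches the threshold even though the list as a whole is far larger.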

    Crawl times

    Another consideration with very large lists is crawl time and crawl time-outs. As a list gets

    larger, the chance of the indexer timing out when crawling the contents of that list increases.

    This is an issue that should be carefully monitored and tested in a lab environment before

    rolling out any large list in production. If the indexer is timing out when crawling large lists,

    you can increase the time-out value with the following steps:

    1. In Central Administration, on the Application Management tab, in the Search section,

    click Manage search service.

    2. On the Manage Search Service page, in the Farm-Level Search Settings section,

    click Farm-level search settings.

    3. In the Timeout Settings section, in the Connection time and Request

    acknowledgement time boxes, enter the desired number of seconds.

    Related content

    For more detailed information about the factors involved in performance and capacity planning for

    Office SharePoint Server 2007 lists, see the following resource:


    Plan for software boundaries (Office SharePoint Server)

    (http://go.microsoft.com/fwlink/?LinkID=95115&clcid=0x409). This article provides a starting point for planning the

    performance and capacity of your system, including performance and capacity testing results

    and guidelines for acceptable performance.
