Thursday, October 11, 2012

Design considerations in building a Large lists/Document Library in SharePoint 2010




Here are some of the check points that you would want to consider while designing large lists/libraries in your application. You can go through this nice article (http://download.microsoft.com/download/7/F/0/7F069D0B-B6BD-4692-868B-E8555BB72445/DesigningLargeListsMaximizingListPerformance.docx) if you would like to know more details and analysis about some of these items.
1.      Create indexes. Improves search and reads. But, add and update slows down. Ensure an Index is created for every view filter.
2.       Reduce the number of lookups as much as possible. De-normalize where ever it is possible. No view should have more than configurable 8 lookups.
3.       Reduce the lookups which bring in more than 2000 records.
4.       Create views which has filtering on the indexed column. This way you are keeping your data well below the threshold limit on views of 5000
5.       Reduce the number of the columns in a large list. SharePoint stores in multiple rows when multiple columns (see table 1 below) of the same data type is used. Also called as Row Wrapping.
6.       If you have multiple content types defined in a large lists then separate them into multiple lists. This way you are reducing the list size and potential performance problems.
7.       List thresholds also impact document libraries. So ensure that you put the documents in several folders. The threshold applies only on the containers. Folder is a container that is valid even in a list as well.
8.       Use SharePoint Search for searching. You can scope the results to a particular List/Library. However, this has a limitation that the results will be limited to the last crawl only.
a.       You can use Search Query Object model. There are tools to aid in building the query. (http://sharepointsearchserv.codeplex.com/ and  http://mosssearchcoder.codeplex.com/)
9.       You can use Developer dashboards to view load times and queries that are affecting the Views.
10.   Use Content Iterator Developer API to work against large lists in an iterative manner whiling working on smaller sets in each iteration.
11.   Try and use Caching (Output, Object and BLOB cache) at various levels to prevent hitting the SQL Server.
12.   Try setting Throttling limits as a last resort. You can set them in General Settings on a web application.
13.   You may also want to set the Allow Object Model override, if you wish to capture more records programmatically.
a.       Minimize the number of unique permissions you give (i.e. list item permissions degrades the performances on large lists). Maximum of 50K unique permissions is allowed (Recommended is to keep it below 5K). Even, permission breaking at Folder is considered as one unique permission. Each time permissions inheritance is broken a new scope ID is created. Each time you query on a view, you join against the scopes table and when a query is performed, each unique access control list (ACL) must be parsed and processed.
14.   For designing Large Document libraries, you may want to consider the Content DB Size as well depending on the number of documents, versions and average size of each document.
a.       You may also want to consider using Remote BLOB storage for these documents.
b.      You may also want to consider third party solutions for archiving (ex: Symantec Enterprise Vault) while maintaining the search ability to these documents.
c.       Think about organizing the files into folders and provide tagging feature so that metadata based navigation & Search can be used to reach to the document in a consistent manner. This also helps in configuring Retention and Permissions for administration.
d.      Think about using Content Organizer feature that can automatically move files based on the Content Types and Metadata without any intervention by users. Create new folders once the limit is reached.
e.      Open with Explorer will not work if the number of documents in a folder is more than the limit.
15.   Minimize the number of Multi valued columns. This decreases the performance.
16.   Queries on the Managed metadata columns are faster than the Choice type fields. Managed metadata columns support multi value as well. 
17.    For all you may know, storing the data in a custom database would make more sense in certain scenarios. You could consider using BCS to perform CRUD operations on this data.
       Table 1
Column Type
Number of Columns per Table Row
Single line of text
64
Choice and Multiple lines of text
32
Date and Time
8
Yes/No
16
Number and Currency
12
Calculated
8
Int, Single Value Lookup, People and Group, Managed Metadata
16
Unique Identifier
1

Before you implement a large list consider the business case and requirements. Requirements such as service level agreement (SLA), time to backup and restore, size of content, amount of content (number of items), and access times are all important to consider. Depending on the size and demand of the application, you must make important choices at multiple levels, including hardware, content storage, and SharePoint information architecture. A large application with millions of items and hundreds of concurrent users might require standalone hardware for the specific project, although a document repository with tens of concurrent users and tens of thousands of documents may work fine with existing shared hardware and a single document library in an existing site.
The end results of planning should be a list of column types (names, data type, and usage), indexes, folder structure, usage of pages and links for navigation, planned structure of permissions, estimated number of items and total data size. Details should also include information about the types of queries that will be performed and how data from the list will be accessed, created, and updated. 

No comments:

Post a Comment