SQL Server table partitioning allows to physically split the data of one table on the disk. Partitioning means data division that allows to directly access one unique partition instead of all the data from the table.
First, why all large tables should have partitions? When accessing hundreds of million lines, it becomes very tricky to get a large bunch of data in seconds. Partitions allows to improve query performance.
Advantages of SQL Server table partitioning
- The main advantage is obviously to improve query performance. It avoids full scans by targeting only the need partitions.
- Table partitions allow to transfer only a subset of data based on the functional key instead of the full set of data. Let’s consider monthly data. The query transfers only the latest month stored in a single partition. It is much quicker than selecting the full scope including the full year and the previous years.
- Partitions on tables allows a much flexible maintenance like reorganize or rebuilt the indexes.
For instance, let’s consider a Microsoft Business Intelligence project with an Operational Data Store, a Datawarehouse, DataMart’s and an Analysis Tabular Model. Daily, the latest data available could be only the current month data. So, it’s more interesting to load only the current month. It also allows much quick delete and inserts to refresh a given set of data.
Please note that all versions available before and up to SQL Server 2012 supports only up to 1000 partitions per table.
Main components of a table partition
On the other hand, SQL Server partitioning uses different mandatory objects: partition functions, partition schemes and partitioning columns.
- A partition function defines how to store and map data to a partition, it’s a SQL Server function. That is to say that based on one column the complete row will flow to partition 1 or 2, etc.
- A partition scheme maps the data partitions to a set of physical files.
- A partitioning column segregates the data in the different partitions. It can be a year, a month or any other data type. The choice of the column is based on the actual usage of the data and the most used one to query the data.
For example, let’s consider a partition function based on year number. The first one could be the historical years, i.e., before 2020. Then the data before 2021 and 2022 year, then only 2022 and then the data associated after 2022. Hence the partition example improves query performance.
Moreover, considering the example above, and assuming the file groups exists on the system. The partition scheme distributes the data like this:
- Sales_1_fg contains SalesYear < 2020
- Sales_2_fg contains SalesYear >= 2020 AND SalesYear < 2021
- Sales_3_fg contains SalesYear >= 2021 AND SalesYear < 2022
- Sales_4_fg contains SalesYear >= 2022
Check out the article on Microsoft about the partition functions.
A practical example of a partitioning function and a partition scheme on MSSQL
A complete article on how to create a partitioned table by year, with a partition function, file groups, and finally the partitioning scheme. All the queries are available in this article:
To conclude, the first article of the partition series defines what is SQL Server partitioning. Check out this tutorial about index management and how to create a partitioned view with SQL Server.
Check this frequently asked questions about SQL Server table partitions
Partitioning technically splits a table or an index in different parts based on column values. Partitions are physically independent to improve performance and allow easier maintenance, data loads and transfers. A partition uses a partition function, a partition scheme, and a partitioned column.
To partition a table, chose the main column used in the queries, loads or reports. Create a partition function, for example the past months, the previous month, the current month, and the next months. Create the file groups associated to be used by the partition scheme.
An index structures the data inside one logical table. Then, the database administrators rebuilt and reorganize the indexes. A partitioned table allows the data to be physically split and accessed independently. An index itself can be partitioned.
The best practice is to maintain automatically the partition function. So, it creates the new partitions as the data evolves. For example, every new month should be taken in a partition using the database current date.
Be the first to comment