At work we have a reasonably large table with a NVARCHAR(MAX) column. This column stores HTML and other user generated content. This table is on a SQL Server 2014 instance, so cannot take advantage of new compression technologies in SS 2016.
Looking at the estimated benefits via query:
DECLARE @MyTableVar TABLE ( object_name NVARCHAR(50), schema_name NVARCHAR(50), index_id INT, partition_number INT, size_w_CURRENT_compression_MB INT, size_w_REQUESTED_compression_MB INT, sample_size_w_current_compression_MB INT, sample_size_w_requested_compression_MB INT ) INSERT INTO @MyTableVar EXEC sp_estimate_data_compression_savings @schema_name = 'dbo' , @object_name = 'description_archive' , @index_id = NULL , @partition_number = NULL , @data_compression = 'ROW' select object_name , schema_name , index_id , partition_number , (size_w_CURRENT_compression_MB/1024) as size_w_CURRENT_compression_MB , (size_w_REQUESTED_compression_MB/1024) as size_w_REQUESTED_compression_MB from @MyTableVar
Well … bummer! Looks like no savings with ROW compression, and running the same query above with PAGE compression did not look much better. Why? It seems like a NVARCHAR(MAX) column full of HTML would be a great candidate for compression.
The answer lies in the allocation units in use for this table. Let’s see how the allocation works for our archive table:
SELECT OBJECT_NAME(sp.object_id) AS [ObjectName] , si.name AS IndexName , sps.in_row_data_page_count as In_Row , sps.row_overflow_used_page_count AS Row_Over_Flow ,sps.lob_reserved_page_count AS LOB_Data FROM sys.dm_db_partition_stats sps JOIN sys.partitions sp ON (sps.partition_id=sp.partition_id) JOIN sys.indexes si ON (sp.index_id=si.index_id AND sp.object_id = si.object_id) WHERE OBJECTPROPERTY(sp.object_id,'IsUserTable') =1 AND OBJECT_NAME(sp.object_id) = 'description_archive' order by sps.in_row_data_page_count desc
The results show that the majority of this table data is stored in LOB type pages instead of in-row, because the “description” field is composed of values mostly > 8,000 bytes.
If we had data in our NVARCHAR(MAX) column whose size was <= 8000k we would benefit from PAGE level compression, as the data would be stored IN ROW (provided we didn’t specify something different