
How I Learned to Stop Worrying and Add The Nonclustered Index

In my post about transactional replication we configured the dbo.Badges table for replication. Recall that this table lives in the StackOverflow database. Let's pretend that over time the production table we replicated has grown to billions of rows, and our reporting relies on hitting the dbo.Badges subscriber table in the SO_small database. In our scenario, as is often the case IRL, the production table is clustered on the "Id" field, and we are not allowed to add any additional indexes in production to support our query. But our query results are returning slower than molasses running uphill in winter, because we are querying on the Date column and there is no index to support that.

What we ARE allowed to do is add indexes to our replicated tables in the data warehouse.  What are the impacts of adding a nonclustered index on the “Date” column to one of our subscriber tables? The plan is to add a nonclustered index on SO_small.dbo.Badges as a test. We are interested in:

  • What is the impact to transaction log growth?
  • Do we have to reinitialize our subscription?

My initial concern was how the transaction log on production (StackOverflow) might grow based on having to store transactions while replication is paused. Further research showed this concern was unfounded. As it turns out, the Log Reader Agent continues to monitor the transaction log of our StackOverflow database, and loads any relevant transactions into the distribution database. Recall we called the distribution database “distDB” in our initial setup from the last post.  In the distribution database, the commands are queued until the Distribution Agent can copy them to the subscribers.  In our case, the transactions can be up to 72 hours old before they would expire and cause the need to re-snapshot our data.
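If you'd rather check this backlog from a query window than from Replication Monitor, the distribution database exposes it. A minimal hedged sketch, assuming the lab names used in these posts (the "W2K12-SQL" instance, the distDB distribution database, and our push subscription):

USE distDB;
GO
-- Rough count of every replicated command currently stored in the distribution database
SELECT COUNT(*) AS StoredCommands
FROM dbo.MSrepl_commands;
GO
-- Pending (undistributed) commands, and the estimated time to apply them, for our subscription
EXEC sp_replmonitorsubscriptionpendingcmds
    @publisher         = N'W2K12-SQL',
    @publisher_db      = N'StackOverflow',
    @publication       = N'BadgesFilteredPublication',
    @subscriber        = N'W2K12-SQL',
    @subscriber_db     = N'SO_small',
    @subscription_type = 0;  -- 0 = push
GO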

A demo will be instructive. We will use the StackOverflow and SO_small databases from the last blog post on Transactional Replication.  Let’s check our database file sizes before we do this:


USE SO_small;
SELECT [file_id],
   [type],
   SUBSTRING([physical_name], 1, 1) AS DriveLetter,
   [name] AS LogicalFileName,
   [physical_name] AS PhysicalFileName,
   CAST([size] AS DECIMAL(38,0)) / 128. AS SizeMB,
   CAST(FILEPROPERTY([name], 'SpaceUsed') AS DECIMAL(38,0)) / 128. AS SpaceUsedMB,
   (CAST([size] AS DECIMAL(38,0)) / 128.) - (CAST(FILEPROPERTY([name], 'SpaceUsed') AS DECIMAL(38,0)) / 128.) AS FreeSpaceMB,
   (CAST([max_size] AS DECIMAL(38,0)) / 128.) AS MaxSizeMB,
   [is_percent_growth] AS [Is%Grow],
   [growth],
   GETDATE() AS CaptureTime
FROM [sys].[database_files];

Filespace Before Adding Our Nonclustered Index

StackOverflow Database:

image

SO_small database:

image

To make this all a bit more realistic, I’ve started running a load against the StackOverflow database. The tool I use for this is SQLQueryStress. You can visit https://github.com/ErikEJ/SqlQueryStress for documentation and updates. The .sql script I have modified for generating queries/inserts/updates/deletes is available at BrentOzar.com.

Let's go ahead and pause replication while we run the index build. Bring up the Replication Monitor, click the "W2K12-SQL" node, and go to the Subscription Watch window. Right-click the subscription:

image

Choose “Stop Synchronizing” and the status will change to “Not running; performance critical”.
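If you prefer T-SQL to clicking around Replication Monitor, stopping the Distribution Agent's SQL Server Agent job achieves the same pause. A hedged sketch; the job name below is hypothetical (look up the real one first, since the naming varies by environment):

-- Find the Distribution Agent job(s) on the distributor
SELECT j.name
FROM msdb.dbo.sysjobs j
JOIN msdb.dbo.syscategories c ON c.category_id = j.category_id
WHERE c.name = N'REPL-Distribution';

-- Pause synchronization by stopping that job (hypothetical name shown)
EXEC msdb.dbo.sp_stop_job @job_name = N'W2K12-SQL-StackOverflow-BadgesFilteredPublication-W2K12-SQL-1';

-- Resume later with:
-- EXEC msdb.dbo.sp_start_job @job_name = N'W2K12-SQL-StackOverflow-BadgesFilteredPublication-W2K12-SQL-1';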

Now let’s create our index. We are choosing to do it OFFLINE, in hopes that the index build will run faster. In my system at work, this made a huge difference in testing:


USE SO_small;
GO
CREATE NONCLUSTERED INDEX idx_NC_Badges_Date
ON dbo.Badges ([Date])
WITH (PAD_INDEX = ON,
      STATISTICS_NORECOMPUTE = OFF,
      SORT_IN_TEMPDB = ON,
      DROP_EXISTING = OFF,
      ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON,
      ALLOW_PAGE_LOCKS = ON,
      FILLFACTOR = 90,
      MAXDOP = 4);
GO


Now let’s take a look at replication, particularly our Undistributed commands shown in the Subscription Watch window.

image

You can see that a number of commands stacked up in the distributor database while we had replication synchronization paused for the index build.  No worries though! We simply bring up the Replication Monitor, click the “W2K12-SQL” node, right-click and select “Start Synchronizing”.

image

This will enable the Distribution Agent to synchronize the commands waiting in distDB to the SO_small database. The Log Reader Agent continued to monitor the transaction log of our StackOverflow database even while replication was paused; once we start synchronizing, the Distribution Agent copies those stored commands to the subscriber. When synchronization catches up, the record counts between the two dbo.Badges tables will match once again. We have to add the appropriate filter to our query, since we created our dbo.Badges article with a filter (WHERE Date >= '20160101'):


USE SO_small;
GO
SELECT COUNT(*)
FROM dbo.Badges
WHERE [Date] >= '20160101';
GO

USE StackOverflow;
GO
SELECT COUNT(*)
FROM dbo.Badges
WHERE [Date] >= '20160101';
GO

image

A quick look at the difference in IO/time stats and plans for this query between StackOverflow and SO_small shows the positive impact of our new index. Notice the lower reads and overall execution time for the SO_small query. The query plans show that the StackOverflow query had to do a scan on the clustered index (more expensive!) versus a much more selective seek on our new nonclustered index for the SO_small query:

StackOverflow

image

image

SO_small

image

image

As it turns out, we win! Our nonclustered index speeds up our query. We did NOT blow up transaction logs. And we did not have to re-snapshot our publication table and wait for that. Great job! I hope this helps you out. See you next time!

The Path to Enlightenment – SQLSkills style!

Too often in my career I have heard "SQL Server doesn't scale as well as DBMS x" or "We don't need a DBA. You just install SQL Server and it runs itself". Both of these are misconceptions that are reinforced by a lack of people who truly understand the deep magic that is SQL Server internals. My goal is to be a better evangelist for SQL Server, shedding light on the true power of our favorite DBMS. I want to better grok both the "Hows" and the "Whys" of SQL Server performance optimization, apply this knowledge in my daily work, and share this knowledge via my blog and presentations to SQL Server user groups in California (and maybe beyond!).

To that end, I am entering the contest to win a seat in the “IEPTO1: Performance Tuning and Optimization – Part 1” class offered by Paul Randal and Kimberly Tripp at SQLskills.com. The folks at SQLSkills.com are giving away two seats… one each for their Level 1 and Level 2 performance classes. I see these classes as the gold standard in SQL Server training, and plan to attend someday by hook or by crook. The class would be a huge kickstart for query and performance tuning efforts in my current job.

In my work in Business Intelligence I am particularly interested in improving analytic queries against large (hundreds of millions to billions of rows) tables. I currently deal/struggle with queries against some pretty giant tables on a daily basis. I have found some strategies for choosing good indexes, tweaking storage/partitioning, and getting better stats in place. However, the depth and breadth of Paul and Kim's class would unlock a treasure trove of knowledge and best practices around indexing and tuning strategies. I am excited by the prospect of learning to avoid performance-killing problems in the first place, through proper design and implementation of basic database structures. Training with two of the top minds in our SQL Server community would give me a huge boost in confidence to push forward on the speaking front, and provide acres of soil in which to grow ideas for future blog posts.

If you can attend these classes, you will be a giant leap ahead in the path that is SQL Server enlightenment. This is a great opportunity and anyone who is a SQL Server user/enthusiast/fan should head on over to SQLSkills.com and read how to win!

Set Up a Simple Transactional Replication Lab

At work we lost our super star DBA some months back, and though he is sorely missed, this has created opportunities for me to grow in my SQL Server knowledge. One of the key components of our Business Intelligence environment (my world!) is our replication between PROD and the Data Warehouse. Because this replication is critical to our reporting, not to mention my livelihood, I have been digging in to how transactional replication works. So the first step on this journey is to get a playground set up on my personal laptop and let the learning begin. I'm glad you are here to follow along.

We are going to replicate the dbo.Badges table from the StackOverflow database to a small database called SO_small, using transactional replication in SQL Server 2016.

image

Setup the Distributor and Publisher

For the lab, we will set up all replication components on the same instance. In particular we’ll run a local distributor, which is different from most production environments, as I found out. However the constraints of my home lab dictate this setup.

All our configuration can be done with the wizards. First, right-click the "Replication" folder and choose "Configure Distribution":

image

Now you’ll get the “distributor” screen of the wizard. As mentioned above, choose to have all replication components on the same instance for our walk through:

image

We’ll choose to have our SQL Server Agent start automatically.

image

We now need to create the Snapshot Folder for the initial seeding of the table:

Create folder on DB server

We need a folder to hold the snapshot that will be created.  I called mine “sql_replic_share”. Super creative!

image

For purposes of the demo, just grant folder permissions to the “Authenticated Users” group to get going. NOT RECOMMENDED IN PRODUCTION!

image

Set up folder sharing on sql_replic_share. Again, for the demo, grant "Everyone" Read access. NOT RECOMMENDED FOR PRODUCTION!

image

Now populate “Snapshot Folder” with the correct path to the folder that will hold our initial snapshot and continue:

image

Configure the Distribution DB:

A distribution database will be created as part of this process and will be visible in the System databases folder. Be sure to choose the location that has the storage you want to use:

image

As noted, this will create a System Database called “distDB”

image

Configure publishers. The local instance is auto-filled since we are setting up as a Distributor locally:

image

Let SQL Server do the distribution configuration:

image

Verify configuration. If it all looks good, hit Finish:

image

The wizard should finish successfully:

image
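For reference, the wizard is doing roughly the equivalent of the following T-SQL. This is only a hedged sketch (the wizard can generate the exact script for you); the folder paths and password are placeholders:

USE master;
GO
-- Make this instance its own Distributor
EXEC sp_adddistributor @distributor = N'W2K12-SQL', @password = N'<strong password>';
GO
-- Create the distribution database (ours is distDB), retaining commands for up to 72 hours
EXEC sp_adddistributiondb
    @database          = N'distDB',
    @data_folder       = N'C:\SQLData',   -- placeholder path
    @log_folder        = N'C:\SQLLogs',   -- placeholder path
    @min_distretention = 0,
    @max_distretention = 72,
    @history_retention = 48,
    @security_mode     = 1;
GO
-- Register the local instance as a Publisher using this Distributor and our snapshot share
EXEC sp_adddistpublisher
    @publisher         = N'W2K12-SQL',
    @distribution_db   = N'distDB',
    @working_directory = N'\\W2K12-SQL\sql_replic_share',
    @security_mode     = 1;
GO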

 

Setup Publication

We want to replicate the dbo.Badges table FROM StackOverflow to the SO_small database on the same SQL Server instance.

Right click the Local Publications folder under the Replication folder, and choose New Publication:

image

We want to publish dbo.Badges from the StackOverflow database, so that is our Publication database for this dialog:

image

We are setting up Transactional Replication, so we will use that type for our Publication:

image

Our goal is to publish StackOverflow.dbo.Badges, so that is the article we will publish. We are taking all of the columns, though we do not have to:

image

For kicks, we’ll add a filter to our Article, so that we only get the Badges since 1/1/2016. In our case this will speed our queries and conserve space:

image

image

Our filter is now populated on the dialog:

image

Snapshot:

We are now at the step where the initial snapshot of our data will be created. We’ll create ours right away. This is the initial “seeding” of our replicated table in our subscriber database:

image

For our demo, we are going to run the Snapshot Agent under the SQL Server Agent service account, though usually you want to have a separate account to run each agent:

image

We’ll create our publication straight away, though you could also have SQL Server create a script for later use:

image

After hitting next, you’ll be presented with a chance to name your Publication. I chose something descriptive, including the fact that my publication contains a filter:

image

Hit finish, and if all goes well you’ll be presented with a dialog that reflects success:

image
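With the wizard reporting success, it's easy to sanity-check what was created from a query window. A small hedged sketch, run in the publication database:

USE StackOverflow;
GO
-- List the publications defined in this database (should include our filtered Badges publication)
EXEC sp_helppublication;
GO
-- Show the article(s) in the publication, including the row filter we defined
EXEC sp_helparticle @publication = N'BadgesFilteredPublication';
GO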

Setup the Subscription

Now we’ll setup our subscription. In most real-world environments this subscription would live in a different instance from the Publisher. But this is a demo… so let’s keep it easy:

Expand “Replication”, and then right-click Local Publication and choose “New Subscriptions…”

image

Since we have but one publication, it is the only one to show in our window. Hit Next to be taken to the Distribution Agent Location dialog:

image

We will be creating a “push subscription” which is the default. Hit next.

image

On the "Subscribers" page, we choose our server and the database that will be our subscription database (SO_small).

Now we come to the Distribution Agent Security window. The Distribution Agent is responsible for moving data from the distribution database to the subscriber database, so the account specified must be able to connect to both databases:

image

We'll click the … ellipsis next to the Connection to Subscriber box, and then we will specify the account that will run the Distribution Agent process. Since this is non-production, we'll go ahead with "Run under the SQL Server Agent service account":

image

That takes us back to the security screen:

image

Next we choose the schedule on which the agent will run: "Run Continuously".

image

Now let’s initialize our subscription with a snapshot. Keep in mind this can take many hours/days depending on the size of the object being replicated. Our table is small, so it takes no time at all.  But a table at work took over a day, because it was rather wide and contained several billion rows. So beware of this.

image

We’ll let the wizard do its work, though you could also generate a script if so desired:

image

image

And this is what success looks like:

image

Or so it seems… however

Gotchas!

I ran into a problem after initially setting up my lab. You might find this useful. After waiting a while and looking at our SO_small database, no dbo.Badges table had shown up in our subscription database (SO_small):

image

WTH? Fortunately we have a tool for monitoring replication that is accessible through SSMS. Not surprisingly it is called Replication Monitor (who would have guessed?). Go to the Replication folder in SSMS, right-click, and choose "Launch Replication Monitor":

image

We are presented with an application designed to help us figure out what is happening in replication at a given moment:

image

As we see, there is no shortage of red x’s, an almost universal sign of trouble. Let’s click on that “Agents” tab with the big red X:

Upon clicking the first error in the list on that tab, we see our problem has to do with access to our share that we created to hold our Snapshot:

image

The first thing I'll try is granting "Full Control" to the Everyone group on the share (again, not for production!).

After making the permissions change, right-click the line that shows the error and lists Snapshot Agent in the job column. Choose "Start Agent":

image

When all is well, our replication monitor view will be gleefully free of the red x’s:

image

And sure enough, our table is now replicated in the SO_small database!

image

Add another article to an existing publication

The beauty of this is that it does not require another full snapshot, as long as you don’t change any other article’s properties in these steps. A mini-snapshot will be taken just for the object you are adding to replication in the current publication.
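One caveat worth verifying before relying on that mini-snapshot behavior: publications created with immediate_sync (and allow_anonymous) enabled will regenerate a full snapshot when an article is added. A hedged sketch of the check and, if needed, the change:

USE StackOverflow;
GO
-- Check the immediate_sync and allow_anonymous columns in the output
EXEC sp_helppublication @publication = N'BadgesFilteredPublication';
GO
-- If they are enabled, turning them off keeps the snapshot scoped to newly added articles
-- (allow_anonymous must be disabled before immediate_sync)
EXEC sp_changepublication @publication = N'BadgesFilteredPublication',
     @property = N'allow_anonymous', @value = N'false';
EXEC sp_changepublication @publication = N'BadgesFilteredPublication',
     @property = N'immediate_sync', @value = N'false';
GO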

Right click the local publication we created above (BadgesFilteredPublication) and choose Properties:

image

On the Publication properties page, go to the Articles tab, uncheck “Show only checked articles in the list” and check the box for “Posts (dbo)” under the Objects to publish part of the dialog. Uncheck the “Body” column, as we don’t wish to replicate this NVARCHAR(MAX) column in our demo. Also, note we are leaving “Badges (dbo)” checked, as it is the existing article that is part of this publication. Hit “OK”.

image

At this point the snapshot agent will run, and generate the needed snapshot into the snapshot shared folder:

image

When the snapshot is applied, your replicated table will be available in the subscriber database:

image

At this point you have all the components of replication in place in your environment, and can see data flowing between the StackOverflow.dbo.Badges table and your SO_small.dbo.Badges table.  Next post we’ll put this to use! Hope to see you then.

 

Can Table Partitioning Improve Query Performance?

At work we have tables that contain billions of rows and are hundreds of GB in size. The business intelligence queries we run against them in the reporting database are often focused around dates and not accountids / productids as they are in PROD. For these large tables, it is sometimes useful to split the records into multiple storage locations. Especially when most of these reporting tables live on relatively slow spinning disk (spinny drives in LUNs on SANs), there are benefits to be gained from getting the most frequently used / most recent data onto SSD. Enter table partitioning!

The idea behind partitioning is that the rows in our tables live in storage units called partitions. By default every table has one partition. Since these partitions can be stored on separate filegroups living on separate files (.ndf), we gain I/O benefits by placing the partitions that hold our most frequently used rows on SSD. In my testing the benefits exist for both querying and loading these large tables. Previously an Enterprise-only feature, table partitioning is now available in SQL Server 2016 SP1 Standard Edition, so hopefully you will be able to take advantage of this too!

image

To protect the names of the innocent, I decided to come up with a contrived example using the StackOverflow database (see useful links at the bottom for where to download) while I learned and documented the process. For the example in this post, we want to query a rather large table (6M+ rows) from our local StackOverflow database to get summary stats about the reputations of users who joined in 2015. Because the kind of analyses we frequently perform involve grouping our Users in cohorts by the dates they created their accounts, I decided to partition and cluster on CreationDate. The plan is to make these types of queries more performant and to help with data loading on the back-end ETL processing.

Example Query – BEFORE Table Partitioning

Here is our query of interest. We are seeking some descriptive stats about the cohort of people who joined in 2015.

/* -- Clean the buffers */
DBCC DROPCLEANBUFFERS 
GO 
SET STATISTICS IO, TIME ON;
GO
SELECT Top 1 AVG(Reputation) OVER () as AVG_Reputation
, MIN(Reputation) OVER () as MIN_Reputation
, MAX(Reputation) OVER () as MAX_Reputation
FROM StackOverflow.dbo.Users
WHERE CreationDate >= '2015-01-01' and CreationDate < '2016-01-01'

IO Stats: (parsed by http://statisticsparser.com/) – Before Table Partitioning

The output of the time and IO stats is not beautiful by default.  Fortunately Richie Rump at http://statisticsparser.com/ created a tool to make the presentation beautiful (or at least much more pleasant to look at!)

clip_image001[1]

Looks like a good long time to get back our results.

Partitioning the Table

Let's verify the current partitioning of our Users table:

USE StackOverflow;
GO
SELECT o.name objectname,i.name indexname, partition_id, partition_number, [rows]
FROM sys.partitions p
INNER JOIN sys.objects o ON o.object_id=p.object_id
INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id
WHERE o.name = 'Users'

image

As expected, our Users table currently has but 1 partition.

Create a new filegroup and secondary data file (.ndf) that lives on flash storage:

This new filegroup is where our new partitions will live.

Create new filegroup

USE [master]
GO
ALTER DATABASE [StackOverflow] ADD FILEGROUP [SECUNDUS]
GO

Create new datafile

USE [master]
GO
ALTER DATABASE [StackOverflow] 
ADD FILE ( NAME = N'StackOverflow_Secundus', 
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\StackOverflow_Secundus.ndf' , 
SIZE = 1024MB , FILEGROWTH = 500MB ) 
TO FILEGROUP [SECUNDUS]
GO

Step 1: Partition key choice

With our filegroups and new datafile created, let’s do this thing. First, which column to partition ON?  We need a single column (called the “partition key”) on which to partition our table. Let’s create a partition for soft archiving the “Users” table records older than 2015, using CreationDate as our partition key. For our example, this is a useful choice since the majority of our queries are interested in the data from the last couple years. Partitioning by CreationDate will allow for “partition elimination”, since our scenario queries against the Users table are usually filtered on CreationDate. Partition elimination will save time and I/O costs.

NOTE: Our database has no nonclustered indexes for this demo. The Users table is clustered on Id. In real life we would do some indexing on CreationDate for performance reasons, but for simplicity I have not done this so we focus strictly on the impact of table partitioning.

Step 2: Create Partition Function (the “HOW”):

We now specify how we want to use the partition key to implement the partitioning. We do this by defining the boundaries of our partitions. With three boundary values we get four partitions in our example:

USE StackOverflow;
GO
CREATE PARTITION FUNCTION pf_UsersCreationDate (datetime) 
AS RANGE RIGHT FOR VALUES ('20150101', '20160101', '20170101'); 

image
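Before building anything on top of the function, it can be reassuring to confirm which partition a given date would land in. A quick hedged sketch using the built-in $PARTITION function (remember that with RANGE RIGHT, each boundary value belongs to the partition on its right):

USE StackOverflow;
GO
SELECT
    $PARTITION.pf_UsersCreationDate('20141231') AS Pre2015,    -- expect partition 1
    $PARTITION.pf_UsersCreationDate('20150101') AS Year2015,   -- boundary value goes right: partition 2
    $PARTITION.pf_UsersCreationDate('20160615') AS Year2016,   -- partition 3
    $PARTITION.pf_UsersCreationDate('20170301') AS Year2017Up; -- partition 4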

Step 3: Create Partition Scheme (the “WHERE”):

USE StackOverflow;
GO
CREATE PARTITION SCHEME ps_UsersCreationDate 
AS PARTITION pf_UsersCreationDate TO ([PRIMARY],[SECUNDUS],[SECUNDUS],[SECUNDUS]) 
GO

So what did we actually do? We told SQL Server HOW to divide up our data and WHERE to store it. The oldest rows (CreationDate before 2015) stay on PRIMARY, while the three newer partitions land on our new SECUNDUS filegroup. Let's see where we're at:

SELECT ps.name,pf.name,boundary_id,value
FROM sys.partition_schemes ps
INNER JOIN sys.partition_functions pf ON pf.function_id=ps.function_id
INNER JOIN sys.partition_range_values prf ON pf.function_id=prf.function_id

clip_image001

But is OUR table, the existing Users table, now partitioned?

SELECT o.name objectname,i.name indexname, partition_id, partition_number, [rows]
FROM sys.partitions p
INNER JOIN sys.objects o ON o.object_id=p.object_id
INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id
WHERE o.name = 'Users'

clip_image002

Not so much. It still shows only 1 partition. Since it is an existing table, we have to either recreate the Users table and reload the data, or drop and rebuild the clustered index. The Users table is currently clustered on the "Id" column.

Let’s check out our current indexing on dbo.Users before going forward. Brent Ozar’s sp_BlitzIndex to the rescue!


EXEC sp_BlitzIndex
     @DatabaseName = 'StackOverflow',
     @SchemaName = 'dbo',
     @TableName = 'Users'

This shows we currently have a clustered primary key on the Id field.

image

No time like the present… let's get our table partitioned!


-- Drop the existing primary key constraint.
USE StackOverflow;
GO
ALTER TABLE dbo.Users DROP CONSTRAINT PK_Users_Id;
GO


-- Add new non-clustered primary key constraint on Id
ALTER TABLE dbo.Users ADD CONSTRAINT PK_USERS_ID PRIMARY KEY NONCLUSTERED (Id)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


-- Create new clustered index using our partition key as the cluster key
CREATE CLUSTERED INDEX idx_Users_CreationDate ON dbo.Users (CreationDate)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
ON ps_UsersCreationDate(CreationDate)
GO

And now we can see that our table is using the partitions we created:

SELECT o.name objectname,i.name indexname, partition_id, partition_number, [rows]
FROM sys.partitions p
INNER JOIN sys.objects o ON o.object_id=p.object_id
INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id
WHERE o.name = 'Users'

clip_image003
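To confirm the rows actually landed where we intended (pre-2015 data on PRIMARY, 2015-and-newer on SECUNDUS), here is a hedged sketch that maps each partition of the clustered index to its filegroup:

USE StackOverflow;
GO
SELECT p.partition_number,
       fg.name AS filegroup_name,
       p.rows
FROM sys.partitions p
JOIN sys.indexes i
     ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.destination_data_spaces dds
     ON dds.partition_scheme_id = i.data_space_id
    AND dds.destination_id = p.partition_number
JOIN sys.filegroups fg
     ON fg.data_space_id = dds.data_space_id
WHERE p.object_id = OBJECT_ID(N'dbo.Users')
  AND i.index_id = 1  -- the clustered index
ORDER BY p.partition_number;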

Let’s take another look at our indexes on dbo.Users using the same sp_BlitzIndex statement as above:

image

Our clustered index is now on CreationDate, with a UNIQUIFIER column added behind the scenes because our clustered index is not unique.  Notice our Primary Key on “Id” also has an additional “hidden” column of CreationDate plus the UNIQUIFIER.

Query AFTER table partitioning

Let’s run our original query against the Users table again now that partitioning is in place:

-- Clear the buffers
DBCC DROPCLEANBUFFERS 
GO 
-- turn on our statistics
SET STATISTICS IO, TIME ON;
GO
-- rerun our query of interest
SELECT Top 1 AVG(Reputation) OVER () as AVG_Reputation
, MIN(Reputation) OVER () as MIN_Reputation
, MAX(Reputation) OVER () as MAX_Reputation
FROM StackOverflow.dbo.Users
WHERE CreationDate >= '2015-01-01' and CreationDate < '2016-01-01'

Results:

clip_image001[1]

IO Stats: (parsed by http://statisticsparser.com/) – After Table Partitioning

clip_image001[3]

As one can see in the table below, our elapsed time was drastically reduced after partitioning. We effectively moved our data of interest onto SSD storage, which helps I/O efficiency:

image

Reads of every type were reduced. But was this performance increase due more to the table partitioning, or to choosing a clustering key more in line with our query patterns? Let's take a look.

Clustering on CreationDate without partitioning

IO Stats Before changing clustering

As a reminder, here is the result of our IO and time stats when we ran the query against our Users table before partitioning and changing any indexes:

clip_image001[6]

After clustering the Users table on CreationDate

clip_image001[8]
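For completeness, one hedged way to produce that comparison case is to rebuild the same clustered index back onto PRIMARY, so the table is clustered on CreationDate but no longer partitioned, and then rerun the query:

USE StackOverflow;
GO
-- Rebuild the clustered index on CreationDate, placing it on PRIMARY instead of the partition scheme
CREATE CLUSTERED INDEX idx_Users_CreationDate ON dbo.Users (CreationDate)
WITH (DROP_EXISTING = ON, STATISTICS_NORECOMPUTE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY];
GO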

In short, if you have a large table and a mix of slow and fast storage, table partitioning can be one tool for improving your query performance by taking advantage of that speedy SSD for your most commonly queried data. As noted in this demo, partitioning had less effect on query performance than just choosing a clustering key most appropriate for our particular query patterns. Depending on your environment you may not always be able to choose the clustering key you would most like to have.

Fortunately query speed isn’t the only benefit of table partitioning. Stay tuned for my next blog post about using table partitioning to help with ETL performance.

Useful links for further learning

I found the following resources to be invaluable as I set about diving into table partitioning:

Partitioning Basics:

https://www.cathrinewilhelmsen.net/2015/04/12/table-partitioning-in-sql-server/

Partitioning existing tables:

https://www.mssqltips.com/sqlservertip/2888/how-to-partition-an-existing-sql-server-table/

StackOverflow DB Download:

https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/

Statistics Parser by Richie Rump:

http://statisticsparser.com/

How much space will table compression save me?

At work we have a reasonably large table with an NVARCHAR(MAX) column. This column stores HTML and other user-generated content. This table is on a SQL Server 2014 instance, so it cannot take advantage of the new compression features in SQL Server 2016.

Looking at the estimated benefits via query:

DECLARE @MyTableVar TABLE
(
  object_name NVARCHAR(128),
  schema_name NVARCHAR(128),
  index_id INT,
  partition_number INT,
  size_w_CURRENT_compression_KB INT,
  size_w_REQUESTED_compression_KB INT,
  sample_size_w_current_compression_KB INT,
  sample_size_w_requested_compression_KB INT
);

-- The procedure returns sizes in KB
INSERT INTO @MyTableVar
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo'
  , @object_name = 'description_archive'
  , @index_id = NULL
  , @partition_number = NULL
  , @data_compression = 'ROW';

SELECT object_name
  , schema_name
  , index_id
  , partition_number
  , (size_w_CURRENT_compression_KB / 1024) AS size_w_CURRENT_compression_MB
  , (size_w_REQUESTED_compression_KB / 1024) AS size_w_REQUESTED_compression_MB
FROM @MyTableVar;

clip_image001

Well … bummer! Looks like no savings with ROW compression, and running the same query above with PAGE compression did not look much better. Why? It seems like an NVARCHAR(MAX) column full of HTML would be a great candidate for compression.

The answer lies in the allocation units in use for this table.  Let’s see how the allocation works for our archive table:

SELECT OBJECT_NAME(sp.object_id) AS [ObjectName]
, si.name AS IndexName
, sps.in_row_data_page_count as In_Row
, sps.row_overflow_used_page_count AS Row_Over_Flow
,sps.lob_reserved_page_count AS LOB_Data
FROM sys.dm_db_partition_stats sps
JOIN sys.partitions sp ON (sps.partition_id=sp.partition_id)
JOIN sys.indexes si ON (sp.index_id=si.index_id AND sp.object_id = si.object_id)
WHERE OBJECTPROPERTY(sp.object_id,'IsUserTable') =1
AND OBJECT_NAME(sp.object_id) = 'description_archive'
order by sps.in_row_data_page_count desc

The results show that the majority of this table's data is stored in LOB-type pages instead of in-row, because the "description" field is composed of values mostly larger than 8,000 bytes.

image

If we had data in our NVARCHAR(MAX) column whose size was <= 8,000 bytes, we would benefit from PAGE-level compression, as the data would be stored IN ROW (provided we didn't configure the table to store it off-row).
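Since ROW and PAGE compression won't touch those LOB pages, the closest in-engine alternative I'm aware of is the COMPRESS()/DECOMPRESS() pair from SQL Server 2016, which isn't an option on this 2014 instance. A hedged sketch of what it would look like; the description_compressed column is hypothetical and added purely for illustration:

-- SQL Server 2016+ only: store the HTML as GZip-compressed VARBINARY(MAX)
ALTER TABLE dbo.description_archive ADD description_compressed VARBINARY(MAX) NULL;
GO
UPDATE dbo.description_archive
SET description_compressed = COMPRESS([description]);
GO
-- Read it back by decompressing and casting to the original type
SELECT CAST(DECOMPRESS(description_compressed) AS NVARCHAR(MAX)) AS [description]
FROM dbo.description_archive;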

Nice date labels in SSRS

You have a create date in Datetime format (ugly!). You want to create a nice row label out of that for your SSRS report. Well, here ya go:

Set the text box to an expression:

=MonthName(Month(Fields!CreateDate.Value))&" "& Year(Fields!CreateDate.Value)

This will give you a nice date label like “December 2015”.

For an even snazzier label, I like:
=Left(MonthName(Month(Fields!CreateDate.Value)),3) & "-" & Right(Year(Fields!CreateDate.Value),2)

This gives you a label like this:

Dec-15
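If you prefer a single function call over string surgery, the same label should also be achievable with the Format function (a hedged alternative; not what the expressions above use):

=Format(Fields!CreateDate.Value, "MMM-yy")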

Virtual logfile (VLF) hell…

At one point in my career I inherited a database that had suffered from neglect (to be kind). The log file had been set to autogrow 1MB at a time, and this had happened A LOT! Interestingly, this rather small database was taking a surprisingly long time to restore. Knowing that a high number (>1,000) of VLFs can cause insert/update and restore performance penalties, I decided to check the VLF counts across the instance, since I had not done so to that point:


DECLARE @SQLtext varchar(500),
 @dbname varchar(500),
 @count int

SET NOCOUNT ON

DECLARE crsr CURSOR FAST_FORWARD READ_ONLY
FOR
SELECT name
FROM sys.databases

CREATE TABLE ##tblinfo
(
 dbname varchar(100),
 row_count int)

OPEN crsr

FETCH NEXT FROM crsr INTO @dbname

WHILE (@@fetch_status <> -1)
BEGIN

CREATE TABLE #log_info
(
 RecoveryUnitId tinyint,
 fileid tinyint,
 file_size bigint,
 start_offset bigint,
 FSeqNo int,
[status] tinyint,
 parity tinyint,
 create_lsn numeric(25,0)
)

SET @SQLtext = 'DBCC loginfo (' + '''' + @dbname + ''') '

INSERT INTO #log_info
EXEC (@SQLText)

SET @count = @@rowcount

DROP TABLE #log_info

INSERT INTO ##tblinfo
VALUES(@dbname, @count)

FETCH NEXT FROM crsr INTO @dbname

END

CLOSE crsr
DEALLOCATE crsr

SELECT dbname,
 row_count
FROM ##tblinfo
ORDER BY dbname

DROP TABLE ##tblinfo

After running this, I was able to identify the problem database and resolve the issue by shrinking the current TLOG file and setting the starting size and autogrow to more appropriate sizes.
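For reference, the fix looked roughly like the following. This is a hedged sketch with placeholder names and sizes; pick values appropriate for your own workload:

USE [ProblemDatabase];  -- placeholder name
GO
-- Shrink the bloated log file down to a reasonable size (in MB), collapsing the excess VLFs
DBCC SHRINKFILE (N'ProblemDatabase_log', 1024);
GO
-- Then re-establish a sensible starting size and a fixed autogrow increment
ALTER DATABASE [ProblemDatabase]
MODIFY FILE (NAME = N'ProblemDatabase_log', SIZE = 4096MB, FILEGROWTH = 512MB);
GO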