In case I needed a reminder of how cool columnstore indexes are for analytic queries, I made this example to jog my memory. I used the StackOverflow 180 GB sized database you can download from Brent Ozar’s site. I have it running in VMWare Workstation pro, but these days you can run your SQL Server learning lab any number of ways.
As a test, I created a “small” version of the dbo.Posts table, removing any of the string fields, since they are not relevant to the analytics I am doing in my made up world of magical data goodness. I then copied all the data from dbo.Posts to dbo.Posts_small table. I created the separate slimmed down version of Posts to simulate having a “reporting” version of the Production Posts table, as would happen IRL. Of course, as of SQL Server 2017, you could recreate the Posts table as a clustered columnstore and have the benefit of better analytics query performance on your OLTP table. But that idea makes me cringe, and want to hide under my covers. Reporting on PROD brings out the grouchy DBA in me!
Create columnstore version of Posts table
-- Create a thinner version of the Posts table for reporting, as a clustered columnstore
USE [StackOverflow]
GO
CREATE TABLE [dbo].[Posts_columnstore_small](
[Id] [int] NULL,
[AcceptedAnswerId] [int] NULL,
[AnswerCount] [int] NULL,
[ClosedDate] [datetime] NULL,
[CommentCount] [int] NULL,
[CommunityOwnedDate] [datetime] NULL,
[CreationDate] [datetime] NOT NULL,
[FavoriteCount] [int] NULL,
[LastActivityDate] [datetime] NOT NULL,
[LastEditDate] [datetime] NULL,
[LastEditorUserId] [int] NULL,
[OwnerUserId] [int] NULL,
[ParentId] [int] NULL,
[PostTypeId] [int] NOT NULL,
[Score] [int] NOT NULL,
[ViewCount] [int] NOT NULL,
INDEX ix_CS_cluster_postssmall CLUSTERED COLUMNSTORE)
ON [PRIMARY]
GO
-- Copy data from the relevant columns into my "small" columnstore table.
insert into [dbo].[Posts_columnstore_small]
select [Id]
,[AcceptedAnswerId]
,[AnswerCount]
,[ClosedDate]
,[CommentCount]
,[CommunityOwnedDate]
,[CreationDate]
,[FavoriteCount]
,[LastActivityDate]
,[LastEditDate]
,[LastEditorUserId]
,[OwnerUserId]
,[ParentId]
,[PostTypeId]
,[Score]
,[ViewCount]
from dbo.Posts
Now let’s create a slimmed down version of the dbo.Posts table, but this time keep it a rowstore table. I just want to be sure that the speed benefits we see in a bit are due to the columnstore index and not just to the absence of the text columns, so we will create two clones of dbo.Posts: one columnstore and one rowstore.
Create rowstore version of Posts table
--------------------------------------------------------------------------
-- Create rowstore version of the dbo.Posts table without the text columns
---------------------------------------------------------------------------
USE [StackOverflow]
GO
CREATE TABLE [dbo].[Posts_rowstore_small](
[Id] [int] IDENTITY(1,1) NOT NULL,
[AcceptedAnswerId] [int] NOT NULL,
[AnswerCount] [int] NULL,
[ClosedDate] [datetime] NULL,
[CommentCount] [int] NULL,
[CommunityOwnedDate] [datetime] NULL,
[CreationDate] [datetime] NOT NULL,
[FavoriteCount] [int] NULL,
[LastActivityDate] [datetime] NOT NULL,
[LastEditDate] [datetime] NULL,
[LastEditorUserId] [int] NULL,
[OwnerUserId] [int] NULL,
[ParentId] [int] NULL,
[PostTypeId] [int] NOT NULL,
[Score] [int] NOT NULL,
[ViewCount] [int] NOT NULL,
CONSTRAINT [PK_PostsRSsmall__Id] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
-- now insert data from the selected columns from existing dbo.Posts table.
SET IDENTITY_INSERT [dbo].[Posts_rowstore_small] ON;
GO
insert into [dbo].[Posts_rowstore_small] (
[Id]
,[AcceptedAnswerId]
,[AnswerCount]
,[ClosedDate]
,[CommentCount]
,[CommunityOwnedDate]
,[CreationDate]
,[FavoriteCount]
,[LastActivityDate]
,[LastEditDate]
,[LastEditorUserId]
,[OwnerUserId]
,[ParentId]
,[PostTypeId]
,[Score]
,[ViewCount]
)
select [Id]
,[AcceptedAnswerId]
,[AnswerCount]
,[ClosedDate]
,[CommentCount]
,[CommunityOwnedDate]
,[CreationDate]
,[FavoriteCount]
,[LastActivityDate]
,[LastEditDate]
,[LastEditorUserId]
,[OwnerUserId]
,[ParentId]
,[PostTypeId]
,[Score]
,[ViewCount]
from dbo.Posts
-- Verify row counts
select count(*) from [dbo].[Posts_columnstore_small]
40700647
select count(*) from [dbo].[Posts_rowstore_small]
40700647
Now it’s time to run our queries. I made a super contrived analytic-type query, just doing some aggregations across the tables by OwnerUserId and the year that the question was created. Put the queries in two different windows, and be sure to set time and io statistics on. The runtimes aren’t that different, at least on my machine, but the IO is SIGNIFICANTLY less on the query run against the columnstore version of the Posts table:
SSMS Tab 1: query against the rowstore version of Posts table
set statistics io, time on
select OwnerUserId, datepart(YYYY, CreationDate) as QuestionDateYear
, count(*) as QuestionCount
, count(*) over () as totalQuestionCount
from [dbo].[Posts_rowstore_small] p
Group by p.OwnerUserId, datepart(YYYY, CreationDate)
SSMS Tab 2: query against the columnstore version of Posts table
set statistics io, time on
select [OwnerUserId], datepart(YYYY, CreationDate) as QuestionDateYear
, count(*) as QuestionCount
, count(*) over () as totalQuestionCount
from [dbo].[Posts_columnstore_small] p
GROUP BY p.OwnerUserId, datepart(YYYY, CreationDate)
Now if we look at the Messages tab of our results in SSMS, we can see the query stats (time and IO as we specified). To make our lives easier, copy the contents of the “messages” tab in SSMS for each query individually, and paste each over to the wonder Statistics Parser tool created by Richie Rump (blog | twitter) .
The results for each query are shown below. First the rowstore version. Notice the huge number of Logical Reads!
Stats Results – rowstore query

Stats Results – columnstore query

The query runtime is about half as long for the columnstore version as it was for the rowstore. And the I/Os are much lower with the columnstore variety. Next time we’ll talk about why!