The scenario is there should only be one AccountID/CreateDate combo in our table1. But some process ran and has duplicated some rows and we want to get rid of them. Dedupe time!
Here is an example of deduping:
-- Create our testing table USE testdb GO CREATE TABLE table1 (ID INT IDENTITY(1,1) NOT NULL, AccountID INT NOT NULL, CreateDate DATETIME); INSERT INTO table1 VALUES (12345, '2019-01-01 00:00'), (12345, '2019-01-01 00:00'), (12345, '2019-01-01 01:21:00'), (12345,'2019-01-01 01:21:00'), (12345,'2019-01-01 01:21:20'), (12345,'2019-01-01 01:21:20'), (12345,'2019-01-01 01:21:20'), (23456, '2019-01-01 01:21:00'), (23456,'2019-01-01 01:21:10'), (23456,'2019-01-01 01:21:20'), (23456,'2019-01-01 01:29:20'), (23456,'2019-01-01 01:21:20'), (29956,'2019-01-01 01:21:10'), (29956,'2019-01-01 01:21:10'), (29956,'2019-01-01 01:29:20'), (29956,'2019-01-01 01:21:20'), (29956,'2019-01-02 01:21:20'), (29956,'2019-01-02 01:21:20'), (29956,'2019-01-03 01:29:20');
Now let’s see what lives in our table1:
SELECT * FROM table1; ID AccountID CreateDate ----------- ----------- ----------------------- 1 12345 2019-01-01 00:00:00.000 2 12345 2019-01-01 00:00:00.000 3 12345 2019-01-01 01:21:00.000 4 12345 2019-01-01 01:21:00.000 5 12345 2019-01-01 01:21:20.000 6 12345 2019-01-01 01:21:20.000 7 12345 2019-01-01 01:21:20.000 8 23456 2019-01-01 01:21:00.000 9 23456 2019-01-01 01:21:10.000 10 23456 2019-01-01 01:21:20.000 11 23456 2019-01-01 01:29:20.000 12 23456 2019-01-01 01:21:20.000 13 29956 2019-01-01 01:21:10.000 14 29956 2019-01-01 01:21:10.000 15 29956 2019-01-01 01:29:20.000 16 29956 2019-01-01 01:21:20.000 17 29956 2019-01-02 01:21:20.000 18 29956 2019-01-02 01:21:20.000 19 29956 2019-01-03 01:29:20.000 (19 rows affected)
Now let’s look at the Query to group the rows by AccountID and CreateDate. Remember our scenario is there should only be one AccountID/CreateDate combo. But some process duplicated some rows and we want to get rid of them. Notice we are not deleting, but only looking at the dupes:
WITH dedupe AS ( select accountid, CreateDate, ROW_NUMBER() OVER (partition by accountid, createdate order by createdate) as RN FROM table1 ) SELECT * FROM dedupe accountid CreateDate RN ----------- ----------------------- -------------------- 12345 2019-01-01 00:00:00.000 1 12345 2019-01-01 00:00:00.000 2 12345 2019-01-01 01:21:00.000 1 12345 2019-01-01 01:21:00.000 2 12345 2019-01-01 01:21:20.000 1 12345 2019-01-01 01:21:20.000 2 12345 2019-01-01 01:21:20.000 3 23456 2019-01-01 01:21:00.000 1 23456 2019-01-01 01:21:10.000 1 23456 2019-01-01 01:21:20.000 1 23456 2019-01-01 01:21:20.000 2 23456 2019-01-01 01:29:20.000 1 29956 2019-01-01 01:21:10.000 1 29956 2019-01-01 01:21:10.000 2 29956 2019-01-01 01:21:20.000 1 29956 2019-01-01 01:29:20.000 1 29956 2019-01-02 01:21:20.000 1 29956 2019-01-02 01:21:20.000 2 29956 2019-01-03 01:29:20.000 1 (19 rows affected)
From the above we know any of the rows with an RN > 1 are duplicates that we can get rid of. So NOW we will run the delete. Note that the delete against our CTE (“dedupe”) will affect the underlying table (“table1”):
WITH dedupe AS ( SELECT accountid, CreateDate, ROW_NUMBER() OVER (PARTITION BY accountid, createdate ORDER BY createdate) as RN FROM table1 ) DELETE FROM dedupe WHERE RN > 1
Now let’s see what we are left with.
SELECT * FROM table1; ID AccountID CreateDate ----------- ----------- ----------------------- 1 12345 2019-01-01 00:00:00.000 3 12345 2019-01-01 01:21:00.000 5 12345 2019-01-01 01:21:20.000 8 23456 2019-01-01 01:21:00.000 9 23456 2019-01-01 01:21:10.000 10 23456 2019-01-01 01:21:20.000 11 23456 2019-01-01 01:29:20.000 13 29956 2019-01-01 01:21:10.000 15 29956 2019-01-01 01:29:20.000 16 29956 2019-01-01 01:21:20.000 17 29956 2019-01-02 01:21:20.000 19 29956 2019-01-03 01:29:20.000 (12 rows affected)
Sure enough! We’ve eliminated our duplicate rows, and have unique AccountID/CreateDate combinations.
Thanks for reading!