Pages

Friday, May 27, 2016

How to Find Duplicate Records in All the Tables in SQL Server Database - SQL Server / TSQL Tutorial

You are working on this big project called Data Analysis. you need to find all the duplicate records in all the tables in a database. The database contains hundreds of tables. It is taking forever to write query for each table and then run it and find the duplicate records. You want something that can run in one shot and provide you all the list of tables, columns, duplicate values and duplicate count.

Solution:

We are going to use Cursor so we can loop through tables in our database and use dynamic sql to build our query to find duplicate record. We will run this query for each table and then insert the records in #Results Temp Table. 

At the end you will have one nice and neat table with Database Name, Schema Name,Table Name, Column List, Duplicate Value,Total Table Row count and Duplicate Count.

Few things to remember before you go ahead and run this script on Production. As this script is going to generate queries on fly and run them one after one. If your database is big and has tables with a lot or records, it can take long time to run. Find the best when there is very minimum activity on server or try to schedule on weekend when have very limited processes running on server. 


USE yourdbname 

go 

DECLARE @SchemaName VARCHAR(100) 
DECLARE @TableName VARCHAR(100) 
DECLARE @DatabaseName VARCHAR(100) 

--Create Temp Table to Save Results 
IF Object_id('tempdb..#Results') IS NOT NULL 
  DROP TABLE #results 

CREATE TABLE #results 
  ( 
     databasename       VARCHAR(100), 
     schemaname         VARCHAR(100), 
     tablename          VARCHAR(100), 
     columnlist         VARCHAR(max), 
     duplicatevalue     VARCHAR(max), 
     totaltablerowcount INT, 
     duplicaterowcnt    INT 
  ) 

DECLARE cur CURSOR FOR 
  SELECT table_catalog, 
         table_schema, 
         table_name 
  FROM   information_schema.tables 
  WHERE  table_type = 'BASE TABLE' 

OPEN cur 

FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName 

WHILE @@FETCH_STATUS = 0 
  BEGIN 
      --Get List of the Columns from Table without Identity Column 
      DECLARE @ColumnList NVARCHAR(max)=NULL 

      SELECT @ColumnList = COALESCE(@ColumnList + '],[', '') + c.NAME 
      FROM   sys.columns c 
             INNER JOIN sys.tables t 
                     ON c.object_id = t.object_id 
      WHERE  Object_name(c.object_id) = @TableName 
             AND Schema_name(schema_id) = @SchemaName 
             AND is_identity = 0 

      SET @ColumnList='[' + @ColumnList + ']' 

      --Print @ColumnList 
      DECLARE @ColumnListConcat VARCHAR(max)=NULL 

      SET @ColumnListConcat=Replace(Replace(Replace(Replace(@ColumnList, '[', 
                                                    'ISNULL(Cast(['), ']', 
                                            '] AS VARCHAR(MAX)),''NULL'')'), 
                                            ',ISNULL', '+ISNULL'), '+', 
                            '+'',''+') 

      --Create Dynamic Query for Finding duplicate Records 
      DECLARE @DuplicateSQL NVARCHAR(max)=NULL 

      SET @DuplicateSQL= ';With CTE as   (select  ''' 
                         + @DatabaseName + ''' AS DBName,' + '''' 
                         + @SchemaName + ''' AS SchemaName,' + '''' 
                         + @TableName + ''' AS TableName,' + '''' 
                         + @ColumnList + ''' AS ColumnList,' 
                         + @ColumnListConcat 
                         + ' AS ColumnConcat,    (Select count(*) from [' + @SchemaName 
                         + '].[' + @TableName 
                         + '] With (Nolock))             AS TotalTableRowCount    ,RN = row_number()             over(PARTITION BY ' 
                         + @ColumnList + '  order by ' + @ColumnList 
                         + ')             from [' + @SchemaName + '].[' 
                         + @TableName + ']  ) Select * From CTE WHERE RN>1' 

      PRINT @DuplicateSQL 

      INSERT INTO #results 
      EXEC(@DuplicateSQL) 

      FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName 
  END 

CLOSE cur 

DEALLOCATE cur 

SELECT * 
FROM   #results 
--drop table #Results 


I ran this script on my database, My Database is small so it took few seconds to complete. I got below results for my tables.It found all the duplicate records in all the table if there were any in SQL Server Database.

No comments:

Post a Comment