Pages

Tuesday, May 5, 2015

Finding Duplicate Records Using GROUP BY in SQL Server

There are various times when we need to find duplicate records in SQL Server. It is possible to find duplicates using DISTINCT, ROW NUMBER as well as the GROUP BY approach.
Duplicate records can create problems sometimes when displaying reports or performing a Multiple Insert update. Finding duplicate records in a database needs further investigation. In some cases, duplicate records are positive, but it all depends on the data and the database design as well.
For example, if a customer has ordered the same product twice on the same date with the the same shipping and billing address, then this may result in a duplicate record.
Let us create a table Customer with First Name, Last Name, and Mobile Number fields.
CREATE TABLE CUSTOMER
(
FirstName VARCHAR(50),
LastName  VARCHAR(50),
MobileNo  VARCHAR(15)
);

INSERT INTO CUSTOMER VALUES ('Niraj','Yadav',989898);

INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);
INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);

INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);
INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);

INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);

INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);
INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);

SELECT * FROM CUSTOMER;
Finding Duplicates Records Using Group by in SQL Server (2)
Using the DISTINCT approach, we can quickly get unique rows in a table.
SELECT DISTINCT FirstName, LastName, MobileNo FROM CUSTOMER;
Finding Duplicates Records Using Group by in SQL Server (3)
However, this does not show how many times a row has been duplicated. Using the GROUP BY approach, we can find this.
Finding Duplicates Using GROUP BY
Adding grouping and a counting field to our display of FirstName, LastName and MobileNo combination shows how many times each customer’s name appears.
SELECT  FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM  CUSTOMER
GROUP BY FirstName, LastName, MobileNo;
Finding Duplicates Records Using Group by in SQL Server (4)
GROUP BY will show just one record for each combination of FirstName, LastName and MobileNo.
The count CNT shows how many times the row has been duplicated.
CNT = 1 indicates that row appears only once.
Let us filter out using the Having clause to exclude rows that appear only once.
SELECT  FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM  CUSTOMER
GROUP BY FirstName, LastName, MobileNo
HAVING  COUNT(1) > 1;
Finding Duplicates Records Using Group by in SQL Server (1)

No comments:

Post a Comment