Removing duplicate rows (based on values from multiple columns) from an SQL table

Question

I have an SQL table:

AR_Customer_ShipTo

+--------------+------------+-------------------+------------+
| ARDivisionNo | CustomerNo | CustomerName      | ShipToCode |
+--------------+------------+-------------------+------------+
| 00           | 1234567    | Test Customer     | 1          |
| 00           | 1234567    | Test Customer     | 2          |
| 00           | 1234567    | Test Customer     | 3          |
| 00           | ARACODE    | ARACODE Customer  | 1          |
| 00           | ARACODE    | ARACODE Customer  | 2          |
| 01           | CBE1EX     | Normal Customer   | 1          |
| 02           | ZOCDOC     | Normal Customer-2 | 1          |
+--------------+------------+-------------------+------------+

(ARDivisionNo, CustomerNo, ShipToCode) form the primary key for this table.

Notice that the first three rows belong to the same customer (Test Customer), which has three different ShipToCodes: 1, 2 and 3. The same is true of the second customer (ARACODE Customer), which has two. Normal Customer and Normal Customer-2 each have only one record, with a single ShipToCode.

Now I would like a query over this table that returns only one record per customer. For any customer with more than one record, I would like to keep the record with the highest ShipToCode.

I tried different things:

(1) I can easily get a list of customers with one entry in the table.

(2) With the following query, I can get a list of all customers who have more than one record in the table.

[Query-1]

 SELECT ARDivisionNo, CustomerNo
 FROM AR_Customer_ShipTo
 GROUP BY ARDivisionNo, CustomerNo
 HAVING COUNT(*) > 1;

(3) What I cannot figure out is how to iterate over the records returned by the above query and select the correct ShipToCode for each of them.

If I do something like:

[Query-2]

 SELECT TOP 1 ARDivisionNo, CustomerNo, CustomerName, ShipToCode
 FROM AR_Customer_ShipTo
 WHERE ARDivisionNo = '00' AND CustomerNo = '1234567'
 ORDER BY ShipToCode DESC

Then I get the correct record for (00-1234567-Test Customer). So if I could feed every result of query-1 into the above query (query-2), I would get the desired single record for each customer that has more than one record. That could then be combined with the results from point (1) to produce the final result.

Then again, there may be an easier way than the approach I am following. Please let me know how I can do this.

[Note: I have to do this using SQL queries only. I cannot use stored procedures, because I will ultimately run this in "Scribe Insight", which only lets me write queries.]

+7

sql join sql-server tsql duplicate-removal

Vikram May 14, '15 at 17:47

4 answers

You did not specify a version of SQL Server, but ROW_NUMBER is probably supported:

 select *
 from (
   select ...
     ,row_number() over (partition by ARDivisionNo, CustomerNo
                         order by ShipToCode desc) as rn
   from tab
 ) as dt
 where rn = 1
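If the server turns out to predate window functions (ROW_NUMBER was introduced in SQL Server 2005), a grouped derived table can stand in for it. The following is only a sketch, not a tested drop-in: it assumes the table and column names from the question, and it relies on (ARDivisionNo, CustomerNo, ShipToCode) being the primary key, so that each maximum matches exactly one row. The alias MaxShipToCode is made up for illustration.

```sql
-- One row per customer: join every customer to its highest ShipToCode.
SELECT s.ARDivisionNo, s.CustomerNo, s.CustomerName, s.ShipToCode
FROM AR_Customer_ShipTo AS s
INNER JOIN (
    -- Highest ShipToCode per (ARDivisionNo, CustomerNo).
    -- MaxShipToCode is an illustrative alias, not a real column.
    SELECT ARDivisionNo, CustomerNo, MAX(ShipToCode) AS MaxShipToCode
    FROM AR_Customer_ShipTo
    GROUP BY ARDivisionNo, CustomerNo
) AS m
    ON  m.ARDivisionNo  = s.ARDivisionNo
    AND m.CustomerNo    = s.CustomerNo
    AND m.MaxShipToCode = s.ShipToCode;
```

Customers with a single row come back as well, because their only ShipToCode is also their maximum, so there is no need to handle them separately and combine the results afterwards.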

+3

dnoeth May 14, '15 at 17:56

ROW_NUMBER() is great for this:

 ;WITH cte AS (
   SELECT *,
          ROW_NUMBER() OVER (PARTITION BY ARDivisionNo, CustomerNo
                             ORDER BY ShipToCode DESC) AS RN
   FROM AR_Customer_ShipTo
 )
 SELECT *
 FROM cte
 WHERE RN = 1

You mentioned removing duplicates. If you want to DELETE them, you can simply:

 ;WITH cte AS (
   SELECT *,
          ROW_NUMBER() OVER (PARTITION BY ARDivisionNo, CustomerNo
                             ORDER BY ShipToCode DESC) AS RN
   FROM AR_Customer_ShipTo
 )
 DELETE cte
 WHERE RN > 1

The ROW_NUMBER() function assigns a number to each row. PARTITION BY is optional; when present, numbering restarts at 1 for each distinct value of the given field or group of fields. For example, if you PARTITION BY Some_Date, numbering starts at 1 for each unique date value. The ORDER BY clause determines the order in which the rows are numbered, and it is required in the ROW_NUMBER() function.
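To make the numbering concrete, here is a small sketch that loads the sample rows from the question into a table variable and shows the RN each row receives. The variable name @ShipTo and all column types are assumptions (the question does not state them), and the multi-row VALUES list needs SQL Server 2008 or later.

```sql
-- Sample rows from the question; @ShipTo and the column types are assumptions.
DECLARE @ShipTo TABLE (
    ARDivisionNo varchar(2)  NOT NULL,
    CustomerNo   varchar(20) NOT NULL,
    CustomerName varchar(30) NOT NULL,
    ShipToCode   int         NOT NULL,
    PRIMARY KEY (ARDivisionNo, CustomerNo, ShipToCode)
);

INSERT INTO @ShipTo (ARDivisionNo, CustomerNo, CustomerName, ShipToCode)
VALUES ('00', '1234567', 'Test Customer',     1),
       ('00', '1234567', 'Test Customer',     2),
       ('00', '1234567', 'Test Customer',     3),
       ('00', 'ARACODE', 'ARACODE Customer',  1),
       ('00', 'ARACODE', 'ARACODE Customer',  2),
       ('01', 'CBE1EX',  'Normal Customer',   1),
       ('02', 'ZOCDOC',  'Normal Customer-2', 1);

-- Numbering restarts at 1 for every (ARDivisionNo, CustomerNo) pair,
-- and the highest ShipToCode in each pair gets RN = 1.
SELECT ARDivisionNo, CustomerNo, ShipToCode,
       ROW_NUMBER() OVER (PARTITION BY ARDivisionNo, CustomerNo
                          ORDER BY ShipToCode DESC) AS RN
FROM @ShipTo
ORDER BY ARDivisionNo, CustomerNo, RN;

-- ARDivisionNo | CustomerNo | ShipToCode | RN
-- 00           | 1234567    | 3          | 1
-- 00           | 1234567    | 2          | 2
-- 00           | 1234567    | 1          | 3
-- 00           | ARACODE    | 2          | 1
-- 00           | ARACODE    | 1          | 2
-- 01           | CBE1EX     | 1          | 1
-- 02           | ZOCDOC     | 1          | 1
```

Filtering on RN = 1 keeps the four rows the question asks for. One caveat: if ShipToCode is really a character column, ORDER BY compares strings, so a code of '10' would sort below '2'; in that case the ordering expression would need a numeric conversion.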

+3

Hart CO May 14, '15 at 18:04

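Use the row_number function, partitioning by both ARDivisionNo and CustomerNo so that the same CustomerNo in two different divisions is not collapsed into a single row:

 SELECT *
 FROM (
   SELECT ARDivisionNo, CustomerNo, CustomerName, ShipToCode,
          row_number() over (partition by ARDivisionNo, CustomerNo
                             order by ShipToCode desc) rn
   FROM AR_Customer_ShipTo
 ) t
 WHERE rn = 1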

+2

Giorgi nakeuri May 14, '15 at 17:57

HaveNoDisplayName · Accepted Answer · 2015-05-14T18:00:22+0000

SQL FIDDLE Example

1) Use a CTE to get the row with the maximum ShipToCode for each customer, where a customer is identified by ARDivisionNo and CustomerNo:

 WITH cte AS (
   SELECT *,
          row_number() OVER (PARTITION BY ARDivisionNo, CustomerNo
                             ORDER BY ShipToCode desc) AS [rn]
   FROM t
 )
 Select *
 from cte
 WHERE [rn] = 1

2) To delete the duplicate records, use a DELETE instead of the SELECT and change the WHERE condition to rn > 1. SQL FIDDLE example

 WITH cte AS (
   SELECT *,
          row_number() OVER (PARTITION BY ARDivisionNo, CustomerNo
                             ORDER BY ShipToCode desc) AS [rn]
   FROM t
 )
 Delete from cte
 WHERE [rn] > 1;

 select * from t;
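Since the DELETE is destructive, it may be worth previewing it first. The sketch below wraps the same statement in a transaction that is rolled back at the end. It assumes the real table name from the question (AR_Customer_ShipTo) rather than the fiddle's t, and a client that accepts multi-statement batches, such as SQL Server Management Studio; whether Scribe Insight allows this is not something the thread confirms. The alias RowsLeft is illustrative.

```sql
BEGIN TRANSACTION;

WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ARDivisionNo, CustomerNo
                              ORDER BY ShipToCode DESC) AS rn
    FROM AR_Customer_ShipTo
)
DELETE FROM cte
WHERE rn > 1;

-- Check for customers that still have more than one row.
-- RowsLeft is an illustrative alias; this query should return no rows.
SELECT ARDivisionNo, CustomerNo, COUNT(*) AS RowsLeft
FROM AR_Customer_ShipTo
GROUP BY ARDivisionNo, CustomerNo
HAVING COUNT(*) > 1;

-- Undo the delete; replace with COMMIT TRANSACTION once the result looks right.
ROLLBACK TRANSACTION;
```

If only a query result is needed and the table itself must stay intact, the SELECT form with rn = 1 is enough and no DELETE is required.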
