SQL: need to remove duplicate rows in a query containing multiple joins

Please note that I am a complete SQL noob and in the learning process. Based on Google search results (including search here), I tried using SELECT DISTINCT and GROUP BY, but none of them work, probably due to all my connections (if anyone knows why they won’t work for sure, that was would be useful for learning).

I need data from different tables, and below is the only way I know this (I just know the basics). The following query works fine, but shows duplicates. I need to know how to remove them. The only hint I have now is perhaps a sub-SELECT query, but based on the research, I'm not sure how to implement them. Any help at all would be great, thanks!

USE SQL_Contest go SELECT CLT.Description AS ClockType, CLK.SerialNumber AS JobClockSerial, SIT.SiteNumber AS JobID, SIT.[Name] AS JobsiteName, SIT.Status AS SiteActivityStatus, DHA.IssuedDate AS DHAIssuedDate, -- Date the clock was assigned to THAT jobsite CLK.CreatedDate AS CLKCreatedDate, -- Date clock first was assigned to ANY jobsite SES.ClockVoltage FROM tb_Clock CLK INNER JOIN tb_ClockType CLT ON CLK.TypeID = CLT.ClockTypeID INNER JOIN tb_DeviceHolderActivity DHA ON CLK.ClockGUID = DHA.DeviceGUID INNER JOIN tb_Site SIT ON SIT.SiteGUID = DHA.HolderGUID LEFT JOIN tb_Session SES ON SES.ClockSerialNumber = CLK.SerialNumber WHERE DHA.ReturnedDate IS NULL ORDER BY SIT.[Name] ASC 

EDIT: In the near future I will consider these answers, thank you very much. I send additional duplicate information to Rob's request:

Everything is displayed fine until I add:

 LEFT JOIN tb_Session SES ON SES.ClockSerialNumber = CLK.SerialNumber 

What I need. This is when a duplicate appears:

 JobClock 2,500248E4,08-107,Brentwood Job,1,2007-05-04 13:36:54.000,2007-05-04 13:47:55.407,3049 JobClock 2,500248E4,08-107,Brentwood Job,1,2007-05-04 13:36:54.000,2007-05-04 13:47:55.407,3049 

I want this information to be displayed only once. Essentially, this query is to identify all active jobs that have an assigned clock, and this task has only one beat assigned to it, and this is only one workspace, but it appears twice.

EDIT 2: Based on the help you provided you, I was able to determine that they are NOT really duplicates, and each session is independent, this is the only time you had two sessions. So, now I’ll try to figure out how to extract information from the last session.

+4
source share
3 answers

If everything "works fine" until you add:

 LEFT JOIN tb_Session SES ON SES.ClockSerialNumber = CLK.SerialNumber 

Then for each CLK there should be more than one entry in tb_Session .SerialNumber.

Run the following query:

 SELECT * FROM tb_Session SES WHERE ClockSerialNumber = '08-107' 

Two records must be returned. You need to decide how to handle this (i.e. which record do you want to use?) If both lines from tb_Session do not contain the same data, in which case they should?

You can always change your request to:

 SELECT CLT.Description AS ClockType, CLK.SerialNumber AS JobClockSerial, SIT.SiteNumber AS JobID, SIT.[Name] AS JobsiteName, SIT.Status AS SiteActivityStatus, DHA.IssuedDate AS DHAIssuedDate, -- Date the clock was assigned to THAT jobsite CLK.CreatedDate AS CLKCreatedDate, -- Date clock first was assigned to ANY jobsite SES.ClockVoltage FROM tb_Clock CLK INNER JOIN tb_ClockType CLT ON CLK.TypeID = CLT.ClockTypeID INNER JOIN tb_DeviceHolderActivity DHA ON CLK.ClockGUID = DHA.DeviceGUID INNER JOIN tb_Site SIT ON SIT.SiteGUID = DHA.HolderGUID LEFT JOIN ( SELECT DISTINCT ClockSerialNumber, ClockVoltage FROM tb_Session ) SES ON SES.ClockSerialNumber = CLK.SerialNumber WHERE DHA.ReturnedDate IS NULL ORDER BY SIT.[Name] ASC 

Since this should ensure that SES contains only one entry for each unique combination of ClockSerialNumber and ClockVoltage

+3
source

Take this example:

Ingredient

 IngredientId IngredientName ============ ========= 1 Apple 2 Orange 3 Pear 4 Tomato 

Recipe

 RecipeId RecipeName ======== ========== 1 Apple Turnover 2 Apple Pie 3 Poached Pears 

Recipe_Ingredient

 RecipeId IngredientId Quantity ======== ============ ======== 1 1 0.25 1 1 1.00 2 1 2.00 3 3 1.00 

Note. . Why Apple Apple has two lots of apples as an ingredient, neither here nor there, it's simple.

The following query will return two rows for the Apple Turnover recipe, one row for the Apple Pie recipe, and one row for the Poached Pears recipe, because the Recipe_Ingredient table has two entries for IngredientId 1. This is what happens with the introduction.

 SELECT I.IngredientName, R.RecipeName FROM Ingredient I JOIN Recipe_Ingredient RI ON I.IngredientId = RI.IngredientId JOIN Recipe R ON RI.recipeId = R.RecipeId 

You can get this to return only one row by changing it to:

 SELECT I.IngredientName, R.RecipeName FROM Ingredient I JOIN Recipe_Ingredient RI ON I.IngredientId = RI.IngredientId JOIN Recipe R ON RI.recipeId = R.RecipeId GROUP BY I.IngredientName, R.RecipeName 

Without specific details about your data, it’s hard to apply this to your specific scenario, but the walkthrough will help you understand where the β€œduplicates” come from as someone unfamiliar with SQL

+1
source

Connections are not your problem. From your comments, I conclude that what you call "duplicates" is not an actual duplicate. If all column values ​​for 2 "duplicates" are returned from the matched query, then SELECT DISTINCT or GROUP BY will definitely eliminate them. So you have to find a solution by looking at the definitions of your columns.

My best guess is that you get duplicates on the same date that are not actually duplicated, because the time component of the date does not match. To fix this problem, you can truncate date fields to date only with this technique:

  DATEADD(DAY, DATEDIFF(DAY, 0, DHA.IssuedDate), 0) AS DHAIssuedDate, DATEADD(DAY, DATEDIFF(DAY, 0, CLK.CreatedDate), 0) AS CLKCreatedDate, 

If this does not work, you can take a look at JobClockSerial : JobClockSerial this column belong to the query results?

0
source

All Articles