PARTITION BY Name, identifier for comparing and detecting problems

Explaination

Imagine there are 3 companies. We join the Name tables because not every employee has provided their PersonalNo . StringId are only specialists, so it can not be used to connect. The same employee can work in several companies.


PROBLEM

The problem is that there may be different employees with the same name (with the same names and surnames, in the example only the first name is indicated).


WHAT DO YOU NEED?

Return 1 when there is any data problem, and 0 if it is correct.


RULES FOR DETECTING A PROBLEM

  • If you have several identical names (2 or more) and all have equal PersonalNo , and not all have StringId (like Peter), should return 1 (this is wrong)
  • If you have several identical names (2 or more) and have NULL (see John), but they all have the same StringId It should return 0 (This is correct, this means that one of the companies is not subject to PersonalNo )
  • If there are several equal names (2 or more) and all PersonalNo are equal, and all StringId are equal (see in Lisa). It should return 0 (correct)
  • If you have several identical names (2 or more) and have several different PersonalNo and all StringId , they should be similar: we see that there are two different people here: Jennifer with 4805250141 PersonalNo and Jennifer with 4920225088 PersonalNo , Jennifer with NULL PersonalNo has the same StringId as Jennifer with 4920225088 PersonalNo , so it should return 0 (correct) and Jennifer with 4805250141 PersonalNo should not be selected because they have StringId and have only one string with the same PersonalNo .
  • If there is only 1 row, and not provided by StringId It should not appear in select at all.

SAMPLE DATA

 Company Name PersonalNo StringId Comp1 Peter 3850342515 85426 ------------------------------------------------------------------- Comp2 Peter 3850342515 '' -- If have the same PersonalNo and there is no StringId - 1 (wrong) Comp1 John NULL 12345 ------------------------------------------------------------------ Comp2 John 3952525252 12345 -- If have the same StringId and 1 PersonalNo is NULL - 0 (correct) Comp1 Lisa 4951212581 52124 ---------------------------------------------------------------- Comp3 Lisa 4951212581 52124 -- If PersonalNo are equal and StringId are equal - 0 (correct) Comp1 Jennifer 4805250141 '' ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Comp1 Jennifer 4920225088 55443 -- If have 2 different PersonalNo and NULL PersonalNo, but where PersonalNo is NULL Comp3 Jennifer NULL 55443 -- Have the same StringId with other row where is provided PersonalNo it should be 0 (correct), with different PersonalNo where is no StringId shouldn't appear at all. Comp1 Ralph 3961212256 '' -- Shouldn't appear in select list, because only 1 row with this PersonalNo and there is no StringID 

DESIRED EXIT

 Peter 1 John 0 Lisa 0 Jennifer 0 

QUERY

 LEFT JOIN (SELECT Name, ( SELECT CASE WHEN MIN(PersonalNo) <> MAX(d.PersonalNo) and MIN(CASE WHEN StringId IS NULL THEN '0' ELSE StringId END) <> MAX(CASE WHEN d.StringId IS NULL THEN '0' ELSE d.StringId END) -- this is wrong and MIN(PersonalNo) <> '' and MIN(PersonalNo) IS NOT NULL and MAX(rn) > 1 THEN 1 ELSE 0 END AS CheckPersonalNo FROM ( SELECT Name, PersonalNo, [StringId], ROW_NUMBER() OVER (PARTITION BY Name, PersonalNo ORDER BY Name) rn FROM TableEmp e1 WHERE Condition = 1 and e1.Name = d.Name ) sub2 GROUP BY Name ) CheckPersonalNo FROM [TableEmp] d WHERE Condition = 1 GROUP BY Name ) f ON f.Name = x.Name 

The problem with the query is that I can only group Name , cannot add the PersonalNo GROUP BY to GROUP BY , so I need to use aggregates in the select list. But for now, it only compares MIN and MAX values. If there are more than two lines with the same name, it does not work properly.

I need to do something like comparing the values ​​of PARTITION BY Fullname, PersonalNo . It now compares values ​​with the same Name (independent of PersonalNo ).

Any ideas? If you have any questions - ask me, I will try to explain.


UPDATE 1

If there are 2 entries with different PersonalNo , but their StringId are equal, this should be 1 (incorrect).

 Company Name PersonalNo StringId Comp1 Anna 4805250141 88552 -- different PersonalNo and the same StringId for both should go as 1 (wrong) Comp1 Anna 4920225088 88552 

Now it returns as:

 Anna 0 Anna 0 

It should be:

 Anna 1 

UPDATE 2

After the UNION update, the Identifier column returns StringId: 55443 (for the data below), but in this case, when 1 record has a PersonalNo , the other is blank , but both have the same (equal) StringId Correct (should be 0)

 Comp1 Jennifer 4920225088 55443 Comp3 Jennifer '' 55443 
+7
sql sql-server tsql sql-server-2008 group-by
source share
1 answer

I hope I understand your requirements.

There may be other ways to do this, but personally I would probably use temporary tables for temporary work if I did.

 --select data into a temp table that can be modified select * into #cleaned from table --apply personal numbers based on other records with matching string id --you could take note of the records you are doing this to for data clean up update c set c.personalNo = s.personalNo from #cleaned as c inner join table as s on c.name = s.name and c.stringID = s.stringID and c.personalNo is null and s.personalNo is not null --find all records with non matching string ids select name ,PersonalNo ,count(*) as numIDs into #issues from( select name ,PersonalNo ,stringID from #cleaned group by name ,PersonalNo ,stringID ) as i group by name ,PersonalNo having count(*) > 1 --select data for viewing. select distinct s.name ,case when i.name is not null then 1 else 0 end as issue from #cleaned as s left outer join #issues as i on s.name = i.name and s.personalNo = i.personalNo order by issue desc 

SQLFiddle: http://sqlfiddle.com/#!3/f4aab/7

Sorry if there are errors here, but I'm sure you get the idea, its not rocket science, just a different approach

EDIT : you notice that you are interested in strings without a row id. Just if this is the only line, then this is not a problem. I changed select first (in # cleaned) to take all the rows.

EDIT : NO Temp Tables now you know what it does, the same thing here without any temporary tables, but a WARNING . This is an update of the source tables to assign the missing personal number.

 update c set c.personalNo = s.personalNo from table1 as c inner join table1 as s on c.name = s.name and c.stringID = s.stringID and c.personalNo is null and s.personalNo is not null select distinct s.name ,case when i.name is not null then 1 else 0 end as issue from table1 as s left outer join ( select name ,PersonalNo ,count(*) as numIDs from( select name ,PersonalNo ,stringID from table1 group by name ,PersonalNo ,stringID ) as i group by name ,PersonalNo having count(*) > 1 ) as i on s.name = i.name and s.personalNo = i.personalNo order by issue desc 

SQLFiddle: http://sqlfiddle.com/#!3/f4aab/8

PARITIONING I don’t see how I will use partitioning here, because what you want to do is known only if there is more than one row, I use partitioning from a more complex table or if I am going to rank decision results when updating data based on more complex rules .. but in any case, insertion is prohibited in the section: D

 Select name ,personalNo ,case when numstrings > 1 then 1 else 0 end as issue from (select name ,personalNo ,row_number() over (partition by name ,personalNo order by name ,personalNo ,stringID ) as numstrings from #cleaned group by name ,personalNo ,stringid) as d order by issue desc 

NOTE: this uses the # cleared table, as mentioned above, since I believe that the essence of what makes it difficult is sometimes the missing personal information.

No temporary tables, no updates

Working with the above, it is obvious that you can do without any temporary tables or update anything, it is just a matter of readability / maintainability, and also whether it is really even faster. this can be made more stable for handling string identifiers with multiple personal names:

 select distinct s.name ,case when i.name is not null then 1 else 0 end as issue from table1 as s left outer join ( select name ,PersonalNo ,count(*) as numIDs from( select a.name ,coalesce(a.PersonalNo,b.PersonalNo) as PersonalNo ,a.stringID from table1 as a left outer join table1 as b on a.name = b.name and a.stringid=b.stringid and a.personalNo != b.personalNo and b.personalNo Is Not Null group by a.name ,a.PersonalNo ,a.stringID ,b.PersonalNo ) as i group by name ,PersonalNo having count(*) > 1 ) as i on s.name = i.name and s.personalNo = i.personalNo order by issue desc 

SQLFiddle: http://sqlfiddle.com/#!3/f4aab/9

EDIT: searching for inconsistent personal numbers too - this uses one temporary table, but you can change it, as was done in the last example. NOTE. There is a slight deviation from the original structure that you set because, as it would be more, I would complete this task, but there is more than enough code for you so that you can flip whatever you want.

 --select data into a temp table that can be modified select * into #cleaned from table1 --apply personal numbers based on other records with matching string id --you could take note of the records you are doing this to for data clean up update c set c.personalNo = s.personalNo from #cleaned as c inner join table1 as s on c.name = s.name and c.stringID = s.stringID and c.personalNo is null and s.personalNo is not null Select IssueType ,Name ,Identifier from ( --find all records with non matching PersonalNos select name ,cast('StringID: ' + stringID as nvarchar(400)) as Identifier ,cast('Inconsistent PersonalNo' as nvarchar(400)) as issueType from( select name ,PersonalNo ,stringID from #cleaned group by name ,PersonalNo ,stringID ) as i group by name ,StringId having count(*) > 1 UNION --find all records with non matching string ids select name ,'PersonalNo: ' + PersonalNo ,cast('Inconsistent String ID' as nvarchar(400)) as issueType from( select name ,PersonalNo ,stringID from #cleaned group by name ,PersonalNo ,stringID ) as i group by name ,PersonalNo having count(*) > 1 ) as a 

SQLFiddle: http://sqlfiddle.com/#!3/e9da2/18

UPDATE: it is also required to accept an empty string personalNo This is another new requirement .. accept an empty string the same as NULL in personalNo

 --select data into a temp table that can be modified select * into #cleaned from table1 --apply personal numbers based on other records with matching string id --you could take note of the records you are doing this to for data clean up update c set c.personalNo = s.personalNo from #cleaned as c inner join table1 as s on c.name = s.name and c.stringID = s.stringID and (c.personalNo IS NULL OR c.personalNo ='') and s.personalNo is not null and s.personalNo != '' Select IssueType ,Name ,Identifier from ( --find all records with non matching PersonalNos select name ,cast('StringID: ' + stringID as nvarchar(400)) as Identifier ,cast('Inconsistent PersonalNo' as nvarchar(400)) as issueType from( select name ,PersonalNo ,stringID from #cleaned group by name ,PersonalNo ,stringID ) as i group by name ,StringId having count(*) > 1 UNION --find all records with non matching string ids select name ,'PersonalNo: ' + PersonalNo ,cast('Inconsistent String ID' as nvarchar(400)) as issueType from( select name ,PersonalNo ,stringID from #cleaned group by name ,PersonalNo ,stringID ) as i group by name ,PersonalNo having count(*) > 1 ) as a 

SQLFiddle: http://sqlfiddle.com/#!3/412127/8

+2
source share

All Articles