SQL query like GROUP BY with an OR condition

I will try to describe the real situation. In our company we have a reservation system with a table, let's call it Customers, where the email and phone contacts are saved with each incoming order. This is a part of the system I can't change. My problem is how to get the number of unique customers. By a unique customer I mean a group of people who share the same email or the same phone number.

Example 1: As a real-life case, imagine that Tom and Sandra are married. Tom, who ordered 4 products, filled our reservation system with 3 different email addresses and 2 different phone numbers, one of which he shares with Sandra (like an intercom), so I can assume that they are somehow connected. Sandra, besides this shared phone number, also entered her personal one, and she used a single email address for both of her orders. For me, this means treating all of the following rows as one unique customer. In effect, a unique customer can grow to cover a whole family.

    ID   E-mail              Phone          Comment
    ---- ------------------- -------------- ------------------------------
    0    tom@email.com       +44 111 111    First row
    1    tommy@email.com     +44 111 111    Same phone, different e-mail
    2    thomas@email.com    +44 111 111    Same phone, different e-mail
    3    thomas@email.com    +44 222 222    Same e-mail, different phone
    4    sandra@email.com    +44 222 222    Same phone, different e-mail
    5    sandra@email.com    +44 333 333    Same e-mail, different phone

As ypercube said, I will probably need recursion to count all of these unique customers.

Example 2: Here is an example of what I want to do.

Is it possible to get the number of unique customers without recursion, for example with a cursor or something similar, or is recursion required?

    ID   E-mail              Phone          Comment
    ---- ------------------- -------------- ------------------------------
    0    linsey@email.com    +44 111 111    ─┐
    1    louise@email.com    +44 111 111     ├─ 1. unique customer
    2    louise@email.com    +44 222 222    ─┘
    ---- ------------------- -------------- ------------------------------
    3    steven@email.com    +44 333 333    ─┐
    4    steven@email.com    +44 444 444     ├─ 2. unique customer
    5    sandra@email.com    +44 444 444    ─┘
    ---- ------------------- -------------- ------------------------------
    6    george@email.com    +44 555 555    ─── 3. unique customer
    ---- ------------------- -------------- ------------------------------
    7    xavier@email.com    +44 666 666    ─┐
    8    xavier@email.com    +44 777 777     ├─ 4. unique customer
    9    xavier@email.com    +44 888 888    ─┘
    ---- ------------------- -------------- ------------------------------
    10   robert@email.com    +44 999 999    ─┐
    11   miriam@email.com    +44 999 999     ├─ 5. unique customer
    12   sherry@email.com    +44 999 999    ─┘
    ---- ------------------- -------------- ------------------------------
    ----------------------------------------------------------------------
    Result                                  ∑ = 5 unique customers
    ----------------------------------------------------------------------

I tried a query with GROUP BY, but I don't know how to group the result by the first or second column. I'm looking to say something like

 SELECT COUNT(*) FROM Customers GROUP BY Email OR Phone 

Thanks again for any suggestions.

PS: I really appreciate the answers that were given to this question before I completely rephrased it. The answers here may no longer correspond to the update, so please do not downvote them if you were about to (except for the question, of course :). I completely rewrote this post.

Thank you and sorry for the wrong start.

5 answers

Here is a complete solution using recursive CTE.

    ;WITH Nodes AS
    (
        SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
             , [ID]
        FROM
        (
            SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
            FROM dbo.Customer
            UNION ALL
            SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
            FROM dbo.Customer
        ) A
    ),
    Links AS
    (
        SELECT DISTINCT A.Id, B.Id LinkedId
        FROM Nodes A
        JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
    ),
    Routes AS
    (
        SELECT DISTINCT Id, Id LinkedId
        FROM dbo.Customer
        UNION ALL
        SELECT DISTINCT Id, LinkedId
        FROM Links
        UNION ALL
        SELECT A.Id, B.LinkedId
        FROM Links A
        JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id
    ),
    TransitiveClosure AS
    (
        SELECT Id, Id LinkedId
        FROM Links
        UNION
        SELECT LinkedId Id, LinkedId
        FROM Links
        UNION
        SELECT Id, LinkedId
        FROM Routes
    ),
    UniqueCustomers AS
    (
        SELECT Id, MIN(LinkedId) UniqueCustomerId
        FROM TransitiveClosure
        GROUP BY Id
    )
    SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
    FROM dbo.Customer A
    JOIN UniqueCustomers B ON B.Id = A.Id
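To sanity-check the result of a query like this, the expected count for Example 2 from the question can be computed outside the database. The following is only a rough Python sketch, not a translation of the CTE: it uses a union-find structure (a different technique, with no recursion), and the function name `count_unique_customers` and the in-memory row list are assumptions made for illustration.

```python
def count_unique_customers(rows):
    """Count groups of rows linked by a shared e-mail or a shared phone."""
    parent = list(range(len(rows)))  # every row starts as its own group

    def find(i):
        # Walk up to the root of the group, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # ("email", value) or ("phone", value) -> first row index
    for i, (email, phone) in enumerate(rows):
        for key in (("email", email), ("phone", phone)):
            if key in first_seen:
                # Same e-mail or phone seen before: merge the two groups.
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    return len({find(i) for i in range(len(rows))})


# (E-mail, Phone) pairs copied from Example 2 in the question.
example_2 = [
    ("linsey@email.com", "+44 111 111"),
    ("louise@email.com", "+44 111 111"),
    ("louise@email.com", "+44 222 222"),
    ("steven@email.com", "+44 333 333"),
    ("steven@email.com", "+44 444 444"),
    ("sandra@email.com", "+44 444 444"),
    ("george@email.com", "+44 555 555"),
    ("xavier@email.com", "+44 666 666"),
    ("xavier@email.com", "+44 777 777"),
    ("xavier@email.com", "+44 888 888"),
    ("robert@email.com", "+44 999 999"),
    ("miriam@email.com", "+44 999 999"),
    ("sherry@email.com", "+44 999 999"),
]

print(count_unique_customers(example_2))  # prints 5
```

The same function returns 1 for the six rows of Example 1, since every row there is reachable from every other through a shared e-mail or phone.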

Search for groups that have only the same phone:

    SELECT ID
         , Name
         , Phone
         , DENSE_RANK() OVER (ORDER BY Phone) AS GroupPhone
    FROM MyTable
    ORDER BY GroupPhone
           , ID

Search for groups that have only the same name:

    SELECT ID
         , Name
         , Phone
         , DENSE_RANK() OVER (ORDER BY Name) AS GroupName
    FROM MyTable
    ORDER BY GroupName
           , ID

Now, for the described (complex) query, let's say we have a table instead:

    ID   Name          Phone
    ---- ------------- -------------
    0    Kate          +44 333 333
    1    Sandra        +44 000 000
    2    Thomas        +44 222 222
    3    Robert        +44 000 000
    4    Thomas        +44 444 444
    5    George        +44 222 222
    6    Kate          +44 000 000
    7    Robert        +44 444 444
    --------------------------------

Should they all be in the same group? Each of them shares a name or a phone with someone else, forming a "chain" of related persons:

    0-6    same name
    6-1-3  same phone
    3-7    same name
    7-4    same phone
    4-2    same name
    2-5    same phone
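The chain above can be checked mechanically. Here is a minimal Python sketch, assuming the eight rows of the table above are held in memory; the function name `label_groups` is made up for this illustration. It repeatedly pushes the smallest label across every shared name and every shared phone until nothing changes, which is roughly what a loop of set-based updates would do, with no recursion involved.

```python
def label_groups(rows):
    """Return one label per row: the smallest ID reachable through shared names or phones."""
    labels = [row[0] for row in rows]  # start with each row's own ID
    changed = True
    while changed:
        changed = False
        for column in (1, 2):  # 1 = Name, 2 = Phone
            # Smallest label currently attached to each distinct value.
            smallest = {}
            for label, row in zip(labels, rows):
                value = row[column]
                smallest[value] = min(label, smallest.get(value, label))
            # Pull every row down to the smallest label of its value.
            for i, row in enumerate(rows):
                best = smallest[row[column]]
                if best < labels[i]:
                    labels[i] = best
                    changed = True
    return labels


# (ID, Name, Phone) rows copied from the table above.
rows = [
    (0, "Kate", "+44 333 333"),
    (1, "Sandra", "+44 000 000"),
    (2, "Thomas", "+44 222 222"),
    (3, "Robert", "+44 000 000"),
    (4, "Thomas", "+44 444 444"),
    (5, "George", "+44 222 222"),
    (6, "Kate", "+44 000 000"),
    (7, "Robert", "+44 444 444"),
]

print(label_groups(rows))  # prints [0, 0, 0, 0, 0, 0, 0, 0]
```

Labels only ever decrease, so the loop stops once a full pass over both columns changes nothing. All eight rows end up with label 0, meaning the whole table is a single group.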

I don't know if this is the best solution, but here it is:

    SELECT MyTable.ID, MyTable.Name, MyTable.Phone,
           CASE WHEN N.No = 1 AND P.No = 1 THEN 1
                WHEN N.No = 1 AND P.No > 1 THEN 2
                WHEN N.No > 1 OR P.No > 1 THEN 3
           END as GroupRes
    FROM MyTable
    JOIN (SELECT Name, count(Name) No FROM MyTable GROUP BY Name) N
        on MyTable.Name = N.Name
    JOIN (SELECT Phone, count(Phone) No FROM MyTable GROUP BY Phone) P
        on MyTable.Phone = P.Phone

The problem is that this makes several joins on varchar columns, which may increase the run time.


For the data set in the example, you can write something like the following:

    ;WITH Temp AS
    (
        SELECT Name, Phone,
               DENSE_RANK() OVER (ORDER BY Name) AS NameGroup,
               DENSE_RANK() OVER (ORDER BY Phone) AS PhoneGroup
        FROM MyTable
    )
    SELECT MAX(Phone), MAX(Name), COUNT(*)
    FROM Temp
    GROUP BY NameGroup, PhoneGroup
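To see what grouping on both rank columns yields, here is a small Python sketch that mimics the query on the eight-row Name/Phone table from the earlier answer. It is an approximation under stated assumptions: the helper `dense_rank` is hypothetical, and Python's string ordering may differ from the database collation, although equal values still share a rank either way.

```python
from collections import Counter

# (Name, Phone) rows copied from the sample table in the earlier answer.
rows = [
    ("Kate", "+44 333 333"),
    ("Sandra", "+44 000 000"),
    ("Thomas", "+44 222 222"),
    ("Robert", "+44 000 000"),
    ("Thomas", "+44 444 444"),
    ("George", "+44 222 222"),
    ("Kate", "+44 000 000"),
    ("Robert", "+44 444 444"),
]


def dense_rank(values):
    """Mimic DENSE_RANK() OVER (ORDER BY value): equal values share a rank."""
    ranks = {value: rank for rank, value in enumerate(sorted(set(values)), start=1)}
    return [ranks[value] for value in values]


name_group = dense_rank([name for name, _ in rows])
phone_group = dense_rank([phone for _, phone in rows])

# GROUP BY NameGroup, PhoneGroup: one bucket per distinct (name, phone) pair.
group_sizes = Counter(zip(name_group, phone_group))

print(len(group_sizes))  # prints 8
```

On this data every (Name, Phone) pair is distinct, so the grouping produces eight groups of one row each. Grouping on both columns puts rows together only when the name and the phone both match, so it does not follow chains of shared names or phones.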

Here is my solution:

    SELECT p.LastName, P.FirstName, P.HomePhone,
           CASE WHEN ph.PhoneCount=1 THEN
                    CASE WHEN n.NameCount=1 THEN 'unique name and phone'
                         ELSE 'common name'
                    END
                ELSE
                    CASE WHEN n.NameCount=1 THEN 'common phone'
                         ELSE 'common phone and name'
                    END
           END
    FROM Contacts p
    INNER JOIN (SELECT HomePhone, count(LastName) as PhoneCount
                FROM Contacts GROUP BY HomePhone) ph
        ON ph.HomePhone = p.HomePhone
    INNER JOIN (SELECT FirstName, count(LastName) as NameCount
                FROM Contacts GROUP BY FirstName) n
        ON n.FirstName = p.FirstName

    LastN       FirstN   Phone       Comment
    Hoover      Brenda   8138282334  unique name and phone
    Washington  Brian    9044563211  common name
    Roosevelt   Brian    7737653279  common name
    Reagan      Charles  7734567869  unique name and phone