Trouble fully understanding GROUP BY

I am working through some exam practice questions I came up with, and I am having trouble fully understanding GROUP BY. My understanding of GROUP BY is this: it groups the result set by one or more columns.

I have the following database schema


My query:

    SELECT orders.customer_numb, sum(order_lines.cost_line),
           customers.customer_first_name, customers.customer_last_name
    FROM orders
    INNER JOIN customers ON customers.customer_numb = orders.customer_numb
    INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
    GROUP BY orders.customer_numb, order_lines.cost_line,
             customers.customer_first_name, customers.customer_last_name
    ORDER BY order_lines.cost_line DESC

What I'm trying to understand
Why can't I just use GROUP BY orders.cost_line and group the data by cost_line?

What I am trying to achieve
I would like to get the name of the customer who spent the most money, but I don't quite understand how to achieve this. I understand how joins work; I just can't understand why I can't simply GROUP BY customer_numb and cost_line (with SUM() used to calculate the amount spent). I always seem to get "not a GROUP BY expression". If someone could explain what I'm doing wrong (rather than just giving me the answer), I would really appreciate it, and I would also be grateful for any resources on using GROUP BY correctly.

Sorry for the long post, and apologies if I have missed anything. Any help would be greatly appreciated.

2 answers

I just can't understand why I can't just GROUP BY customer_numb and cost_line (with sum() used to calculate the amount spent).

When you say group by customer_numb, you know that customer_numb uniquely identifies a row in the customers table (assuming customer_numb is either the primary key or an alternate key), so any given customers.customer_numb has one and only one value for customers.customer_first_name and customers.customer_last_name. But at parse time, Oracle does not know that, or at least acts as if it does not. So it panics a little: "What should I do if one customer_numb has more than one value for customer_first_name?"

As a rule of thumb, expressions in the SELECT clause may use only expressions that appear in the GROUP BY clause and/or aggregate functions (plus constants, system variables, and other values that do not depend on the base tables). By "use" I mean the whole expression or part of it. So when you group by first and last name, customer_first_name || customer_last_name is also valid.

When you group by the primary key of a table such as customers (or by a column with both a unique constraint and a NOT NULL constraint), you can safely add that table's other columns to the GROUP BY clause as well; since each key value has only one first and last name, this does not split any groups. In this particular case: group by customers.customer_numb, customers.customer_first_name, customers.customer_last_name.

Also note that the ORDER BY in your first query will not work once cost_line is removed from the GROUP BY, because order_lines.cost_line no longer has a single value per group. You can order by sum(order_lines.cost_line) instead, or give that column an alias in the SELECT clause and order by the alias:

    SELECT orders.customer_numb, sum(order_lines.cost_line),
           customers.customer_first_name, customers.customer_last_name
    FROM orders
    INNER JOIN customers ON customers.customer_numb = orders.customer_numb
    INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
    GROUP BY orders.customer_numb,
             customers.customer_first_name, customers.customer_last_name
    ORDER BY sum(order_lines.cost_line) DESC

or

    SELECT orders.customer_numb, sum(order_lines.cost_line) as sum_cost_line, . . .
    ORDER BY sum_cost_line DESC
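To see the corrected grouping end to end, here is a minimal sketch that runs it with Python's built-in sqlite3 module. It assumes a few things not in the question: SQLite stands in for Oracle, the tables are trimmed to the columns the query uses, and the customers and amounts are invented sample data. The query groups by the customer key plus the name columns and orders by the aliased sum, largest first.

```python
import sqlite3

# Hypothetical, trimmed-down schema with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_numb       INTEGER PRIMARY KEY,
        customer_first_name TEXT NOT NULL,
        customer_last_name  TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_numb    INTEGER PRIMARY KEY,
        customer_numb INTEGER NOT NULL
    );
    CREATE TABLE order_lines (
        order_numb INTEGER NOT NULL,
        cost_line  REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ng');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 2);
    INSERT INTO order_lines VALUES (10, 20.00), (10, 20.00),
                                   (11, 35.00), (12, 30.00);
""")

# Every non-aggregated SELECT column is also in GROUP BY;
# ORDER BY uses the alias of the aggregate, largest total first.
rows = conn.execute("""
    SELECT orders.customer_numb,
           SUM(order_lines.cost_line) AS sum_cost_line,
           customers.customer_first_name,
           customers.customer_last_name
    FROM orders
    INNER JOIN customers
        ON customers.customer_numb = orders.customer_numb
    INNER JOIN order_lines
        ON order_lines.order_numb = orders.order_numb
    GROUP BY orders.customer_numb,
             customers.customer_first_name,
             customers.customer_last_name
    ORDER BY sum_cost_line DESC
""").fetchall()

for row in rows:
    print(row)
```

The first row is the customer with the largest total, which is what the question is after.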

Note: I have heard that some RDBMSs will infer additional GROUP BY expressions without your specifying them explicitly. Oracle is not one of them.
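SQLite is one of the permissive systems: it allows "bare" columns in an aggregate query and takes their values from a row of each group. The sketch below (same hypothetical trimmed schema and invented rows as an assumption, run through Python's sqlite3 module) executes a query that Oracle would reject with ORA-00979 "not a GROUP BY expression". It only returns sensible names here because customer_numb determines the name.

```python
import sqlite3

# Hypothetical, trimmed-down schema with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_numb       INTEGER PRIMARY KEY,
        customer_first_name TEXT NOT NULL,
        customer_last_name  TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_numb    INTEGER PRIMARY KEY,
        customer_numb INTEGER NOT NULL
    );
    CREATE TABLE order_lines (
        order_numb INTEGER NOT NULL,
        cost_line  REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ng');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 2);
    INSERT INTO order_lines VALUES (10, 20.00), (10, 20.00),
                                   (11, 35.00), (12, 30.00);
""")

# The name columns are neither grouped nor aggregated. Oracle refuses this
# (ORA-00979); SQLite accepts it and reads the names from a row in each group.
rows = conn.execute("""
    SELECT orders.customer_numb,
           customers.customer_first_name,
           customers.customer_last_name,
           SUM(order_lines.cost_line) AS sum_cost_line
    FROM orders
    INNER JOIN customers
        ON customers.customer_numb = orders.customer_numb
    INNER JOIN order_lines
        ON order_lines.order_numb = orders.order_numb
    GROUP BY orders.customer_numb
    ORDER BY sum_cost_line DESC
""").fetchall()

print(rows)
```

Relying on this behaviour is not portable; listing the name columns in GROUP BY works in both systems.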

As for grouping by both customer_numb and cost_line, consider a database with two customers, 1 and 2, each with two order lines:

    Customer Number | Cost Line
    1               | 20.00
    1               | 20.00
    2               | 35.00
    2               | 30.00

    select customer_number, cost_line, sum(cost_line)
    FROM ...
    group by customer_number, cost_line
    order by sum(cost_line) desc

    Customer Number | Cost Line | sum(cost_line)
    1               | 20.00     | 40.00
    2               | 35.00     | 35.00
    2               | 30.00     | 30.00

The first row, the one with the highest sum(cost_line), is not the customer who spent the most: customer 2 spent 35.00 + 30.00 = 65.00 in total, versus 40.00 for customer 1.
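The four rows above can be replayed to compare the two groupings side by side. This is a sketch using Python's sqlite3 module with SQLite in place of Oracle; the single table name lines is a hypothetical stand-in for the joined data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table holding the four order lines from the example.
conn.execute("CREATE TABLE lines (customer_number INTEGER, cost_line REAL)")
conn.executemany(
    "INSERT INTO lines VALUES (?, ?)",
    [(1, 20.00), (1, 20.00), (2, 35.00), (2, 30.00)],
)

# Grouping by customer AND cost_line: one row per distinct (customer, cost) pair,
# so customer 2's two different costs stay in separate groups.
by_pair = conn.execute("""
    SELECT customer_number, cost_line, SUM(cost_line)
    FROM lines
    GROUP BY customer_number, cost_line
    ORDER BY SUM(cost_line) DESC
""").fetchall()

# Grouping by customer only: one row per customer with their full total.
by_customer = conn.execute("""
    SELECT customer_number, SUM(cost_line) AS total
    FROM lines
    GROUP BY customer_number
    ORDER BY total DESC
""").fetchall()

print(by_pair)      # [(1, 20.0, 40.0), (2, 35.0, 35.0), (2, 30.0, 30.0)]
print(by_customer)  # [(2, 65.0), (1, 40.0)]
```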


I understand how joins work; I just can't see why I can't simply GROUP BY customer_numb and cost_line (with sum() used to calculate the amount spent).

This should give you the total for each customer:

    SELECT orders.customer_numb, sum(order_lines.cost_line)
    FROM orders
    INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
    GROUP BY orders.customer_numb

Note that each column in the SELECT clause that is not an argument to an aggregate function also appears in the GROUP BY clause.

Now you can join the other tables to get more detail. Here is one way, using a common table expression (there are other ways to express what you want):

    with customer_sums as (
      -- We give the columns useful aliases here.
      SELECT orders.customer_numb as customer_numb,
             sum(order_lines.cost_line) as total_orders
      FROM orders
      INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
      GROUP BY orders.customer_numb
    )
    select c.customer_numb, c.customer_first_name, c.customer_last_name, cs.total_orders
    from customers c
    inner join customer_sums cs on cs.customer_numb = c.customer_numb
    order by cs.total_orders desc
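As a quick check, this sketch runs the same aggregate-then-join pattern through Python's sqlite3 module. SQLite (which supports WITH) stands in for Oracle, and the trimmed schema and invented rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical, trimmed-down schema with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_numb       INTEGER PRIMARY KEY,
        customer_first_name TEXT NOT NULL,
        customer_last_name  TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_numb    INTEGER PRIMARY KEY,
        customer_numb INTEGER NOT NULL
    );
    CREATE TABLE order_lines (
        order_numb INTEGER NOT NULL,
        cost_line  REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ng');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 2);
    INSERT INTO order_lines VALUES (10, 20.00), (10, 20.00),
                                   (11, 35.00), (12, 30.00);
""")

rows = conn.execute("""
    WITH customer_sums AS (
        -- Step 1: aggregate only, one row per customer with its total.
        SELECT orders.customer_numb AS customer_numb,
               SUM(order_lines.cost_line) AS total_orders
        FROM orders
        INNER JOIN order_lines
            ON order_lines.order_numb = orders.order_numb
        GROUP BY orders.customer_numb
    )
    -- Step 2: join the totals to the customer details; no GROUP BY needed here.
    SELECT c.customer_numb, c.customer_first_name,
           c.customer_last_name, cs.total_orders
    FROM customers c
    INNER JOIN customer_sums cs ON cs.customer_numb = c.customer_numb
    ORDER BY cs.total_orders DESC
""").fetchall()

# With the totals sorted descending, the first row is the biggest spender
# (ties would need extra handling).
top_spender = rows[0]
print(top_spender)
```

To keep only the top row in Oracle 12c or later, one option is to append FETCH FIRST 1 ROW ONLY (or FETCH FIRST 1 ROW WITH TIES) to the outer query.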

Why can't I just use GROUP BY orders.cost_line and group the data by cost_line?

Applying GROUP BY to order_lines.cost_line will give you one row for each distinct value in order_lines.cost_line. (There is no orders.cost_line column.) Here is what the data looks like:

    OL.ORDER_NUMB | OL.COST_LINE | O.CUSTOMER_NUMB | C.CUSTOMER_FIRST_NAME | C.CUSTOMER_LAST_NAME
    1             | 1.45         | 2014            | Julio                 | Savell
    1             | 2.33         | 2014            | Julio                 | Savell
    1             | 1.45         | 2014            | Julio                 | Savell
    2             | 1.45         | 2014            | Julio                 | Savell
    2             | 1.45         | 2014            | Julio                 | Savell
    3             | 13.00        | 2014            | Julio                 | Savell

You can group by order_lines.cost_line, but it will not give you any useful information. This query

    select order_lines.cost_line, orders.customer_numb
    from order_lines
    inner join orders on orders.order_numb = order_lines.order_numb
    group by order_lines.cost_line, orders.customer_numb;

should return something like this.

    OL.COST_LINE | O.CUSTOMER_NUMB
    1.45         | 2014
    2.33         | 2014
    13.00        | 2014

Not very helpful.

If you are interested in the sum of the order lines, you need to decide which column or columns to group (summarize) by. If you group by order number, you will get three rows; if you group by customer number, you will get one row.
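The three groupings can be compared on the six sample rows shown earlier. This is a sketch under assumptions: Python's sqlite3 module with SQLite in place of Oracle, and tables trimmed to the columns involved. ROUND keeps the floating-point sums readable.

```python
import sqlite3

# Hypothetical tables loaded with the six sample rows (customer 2014, orders 1-3).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_numb       INTEGER PRIMARY KEY,
        customer_first_name TEXT NOT NULL,
        customer_last_name  TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_numb    INTEGER PRIMARY KEY,
        customer_numb INTEGER NOT NULL
    );
    CREATE TABLE order_lines (
        order_numb INTEGER NOT NULL,
        cost_line  REAL NOT NULL
    );
    INSERT INTO customers VALUES (2014, 'Julio', 'Savell');
    INSERT INTO orders VALUES (1, 2014), (2, 2014), (3, 2014);
""")
# Bind the costs as Python floats so equal prices are identical when grouped.
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?)",
    [(1, 1.45), (1, 2.33), (1, 1.45), (2, 1.45), (2, 1.45), (3, 13.00)],
)

# Group by cost_line: one row per distinct price, regardless of order.
by_cost = conn.execute("""
    SELECT order_lines.cost_line, orders.customer_numb
    FROM order_lines
    INNER JOIN orders ON orders.order_numb = order_lines.order_numb
    GROUP BY order_lines.cost_line, orders.customer_numb
    ORDER BY order_lines.cost_line
""").fetchall()

# Group by order number: three rows, one total per order.
by_order = conn.execute("""
    SELECT orders.order_numb, ROUND(SUM(order_lines.cost_line), 2)
    FROM order_lines
    INNER JOIN orders ON orders.order_numb = order_lines.order_numb
    GROUP BY orders.order_numb
    ORDER BY orders.order_numb
""").fetchall()

# Group by customer number: one row, the customer's grand total.
by_customer = conn.execute("""
    SELECT orders.customer_numb, ROUND(SUM(order_lines.cost_line), 2)
    FROM order_lines
    INNER JOIN orders ON orders.order_numb = order_lines.order_numb
    GROUP BY orders.customer_numb
""").fetchall()

print(by_cost)
print(by_order)
print(by_customer)
```

Grouping by cost_line merges equal prices from different orders, which is why it says nothing about who spent what; grouping by customer number is the one that answers the original question.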
