SQL aggregation function for any non-specific value from a group

Is there an aggregate function that returns any value from a group? I could use MIN or MAX, but would rather avoid the overhead if possible, given that it is a text field.

My situation is a summary of the error log. Errors are grouped by type of error, and an example of the error text is displayed for each group. It does not matter which error message is used as an example.

 SELECT ref_code, log_type, error_number, COUNT(*) AS count, MIN(data) AS example FROM data GROUP BY ref_code, log_type, error_number 

What can I replace MIN(data) with to avoid comparing hundreds of thousands of VARCHAR(2000) values?

sql oracle
3 answers

Following the suggested answers, it seems that MIN(data) (or MAX(data)) is the fastest way to achieve what I want. I was trying to optimize unnecessarily.

I will try the other answers posted here when I next have access to that database, but in the meantime this one comes out on top.

Thanks for all the efforts!


You can use MIN with KEEP, for example:

 MIN(data) KEEP (DENSE_RANK FIRST ORDER BY rowid) AS example

The idea is that the database engine will sort the data by ROWID instead of by the VARCHAR(2000) values, which should in theory be faster. You can replace ROWID with the primary key column and check whether that is faster.
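Dropped into the query from the question, the KEEP expression might look like the sketch below. It assumes the table and column names from the question, uses Oracle-specific syntax, and has not been run against a live database, so treat it as an illustration, not a measured improvement.

```sql
-- Sketch only: the question's query with MIN(data) swapped for the KEEP form.
-- Table and column names are taken from the question; untested.
SELECT ref_code,
       log_type,
       error_number,
       COUNT(*) AS count,
       -- ROWID is unique, so exactly one row per group ranks first;
       -- MIN(data) then has a single value to choose from.
       MIN(data) KEEP (DENSE_RANK FIRST ORDER BY rowid) AS example
FROM   data
GROUP  BY ref_code, log_type, error_number
```

Comparing the execution plan and timings of this version against the plain MIN(data) query is the only reliable way to tell whether it helps.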


Well, since you asked about PARTITION and ORDER BY, below is a version that performs your GROUP BY, but then also uses ROW_NUMBER() with OVER, PARTITION BY and ORDER BY to label the first row of each combination of ref_code, log_type, error_number as row number 1 (with whatever data value happens to be in that row). The numbering restarts at 1 for the next distinct combination of ref_code, log_type, error_number that it finds. You can therefore simply pull the data field from row number 1 as the representative data value for a given ref_code, log_type, error_number.

It is still not ideal. It would be more elegant without the double pass over the table (once for the aggregation and once for ROW_NUMBER()); however, it may still perform quite well. I will have to think about it some more to see whether I can eliminate the double pass.

But it avoids any comparison of the large data field. And it is a way to do what you asked for: pull one representative sample of the data field alongside the aggregated fields.

 SELECT t.ref_code, t.log_type, t.error_number, t.count, d.data
 FROM (
   -- first pass: one row per group with its count
   SELECT ref_code, log_type, error_number, COUNT(*) AS count
   FROM data
   GROUP BY ref_code, log_type, error_number
 ) t
 INNER JOIN (
   -- second pass: number the rows within each group
   SELECT ref_code, log_type, error_number, data,
          ROW_NUMBER() OVER (
            PARTITION BY ref_code, log_type, error_number
            ORDER BY ref_code, log_type, error_number
          ) AS rn
   FROM data
 ) d
   ON d.ref_code = t.ref_code
  AND d.log_type = t.log_type
  AND d.error_number = t.error_number
  AND d.rn = 1

Final caveat: I do not have an Oracle instance to try this on, but I did read the Oracle documentation.


I added the version below after thinking about how to eliminate the GROUP BY, which I only needed for COUNT(*). I don't know whether it is faster.

 SELECT *
 FROM (
   SELECT ref_code, log_type, error_number, data,
          ROW_NUMBER() OVER (
            PARTITION BY ref_code, log_type, error_number
            ORDER BY ref_code, log_type, error_number
          ) AS rn,
          -- no ORDER BY needed here: the count covers the whole partition
          COUNT(*) OVER (
            PARTITION BY ref_code, log_type, error_number
          ) AS count
   FROM data
 ) t
 WHERE rn = 1