Do you define a data type based solely on how it performs in GROUP BY? It's the same data either way; you're just deciding whether to store 123456 as INT or as VARCHAR. Have you considered other factors, such as the CPU cost of converting between numeric and string types when that conversion might not have been necessary? The additional memory needed to hold the entire table in the cache? The length overhead that every VARCHAR value carries? Storage costs (for example, 1234567890 takes 4 bytes as an INT, but "1234567890" takes 10 bytes plus variable-length overhead as a VARCHAR)? What about compression? And how will the index on this column line up with the table's clustered index, which can affect how useful an index that is "already grouped" actually is?
In other words, I would not consider GROUP BY performance in a vacuum.
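To put rough numbers on the storage point, here is a small back-of-the-envelope sketch. It assumes a SQL Server-style row format where INT is a fixed 4 bytes and each VARCHAR value costs one byte per character (single-byte collation) plus about 2 bytes of variable-length offset overhead. It ignores row headers, NULL bitmaps, and compression, so treat the results as an approximation, not an exact on-disk size. The function names are just illustrative.

```python
# Rough per-value storage comparison: a number stored as INT vs. as VARCHAR.
# Assumptions (approximate, SQL Server-style row format):
#   - INT is a fixed 4 bytes
#   - VARCHAR costs 1 byte per character plus ~2 bytes of offset overhead
#   - row headers, NULL bitmaps, and compression are ignored

INT_BYTES = 4
VARCHAR_OFFSET_BYTES = 2  # assumed per-column overhead for variable-length data


def int_bytes(value: int) -> int:
    """Bytes used when the value is stored as a 32-bit INT."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in a 32-bit INT")
    return INT_BYTES


def varchar_bytes(value: int) -> int:
    """Approximate bytes used when the value is stored as VARCHAR digits."""
    return len(str(value)) + VARCHAR_OFFSET_BYTES


def total_mb(bytes_per_row: int, rows: int) -> float:
    """Total size in MB for one column across the given number of rows."""
    return bytes_per_row * rows / (1024 * 1024)


if __name__ == "__main__":
    value = 1234567890
    rows = 100_000_000
    i, v = int_bytes(value), varchar_bytes(value)
    print(f"INT: {i} bytes, VARCHAR: {v} bytes per value")
    print(f"Over {rows:,} rows: {total_mb(i, rows):,.0f} MB vs {total_mb(v, rows):,.0f} MB")
```

Under these assumptions the string version is roughly three times larger for a 10-digit value, and that difference is paid again in the buffer pool, in every index that includes the column, and in I/O, not just on disk.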
Aaron Bertrand