Unfortunately, I have had to resort to this kind of nonsense with MySQL before. If you can't just lean on an index, and GROUP BY isn't any faster (I don't see why it would be, going by @Ben's post), you could try to segment the problem so that it can be processed in batches.
I would still do the work inside MySQL; most likely it will be faster than anything you write yourself or run on the UNIX command line. Treat it like you would a materialized view or an aggregate table in a data warehouse. One simple way is to write a batch script that runs SELECT DISTINCT over small ranges of the source table and puts the results into a second table of distinct values (via an upsert such as INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE, since MySQL has no MERGE statement). Each run is lighter on the server, but you may just move the same performance problems to a different place. You will have to experiment with the parameters, mainly the batch size.

If you use this in a production environment and people expect to get all the distinct values, as if they were querying the source table directly, it would be better to have three tables: the source, a temporary table for the current batch, and the current table holding the latest values plus a date_modified column.
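To make the batching idea concrete, here is a minimal sketch of the two-table variant. It uses Python's built-in sqlite3 module purely as a runnable stand-in for MySQL, and the table names (source, distinct_values), the val column, and the refresh_distinct helper are all made up for illustration. On MySQL the rough equivalent of INSERT OR IGNORE would be INSERT IGNORE, and you would probably run the loop from a scheduled script.

```python
import sqlite3


def refresh_distinct(conn, batch_size=1000):
    """Copy distinct values from `source` into `distinct_values`,
    one primary-key range at a time. Returns the number of batches run.
    (Hypothetical helper; table and column names are assumptions.)"""
    cur = conn.cursor()
    cur.execute("SELECT MIN(id), MAX(id) FROM source")
    lo, hi = cur.fetchone()
    if lo is None:  # empty source table, nothing to do
        return 0

    batches = 0
    for start in range(lo, hi + 1, batch_size):
        end = start + batch_size
        # The primary key on distinct_values.val makes the insert idempotent:
        # values already present are skipped rather than duplicated.
        cur.execute(
            "INSERT OR IGNORE INTO distinct_values (val) "
            "SELECT DISTINCT val FROM source WHERE id >= ? AND id < ?",
            (start, end),
        )
        conn.commit()  # commit per batch to keep each transaction small
        batches += 1
    return batches


# Demo on an in-memory database with 100 rows and 7 distinct values.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (id INTEGER PRIMARY KEY, val TEXT NOT NULL);
    CREATE TABLE distinct_values (
        val TEXT PRIMARY KEY,
        date_modified TEXT DEFAULT CURRENT_TIMESTAMP
    );
    """
)
conn.executemany(
    "INSERT INTO source (val) VALUES (?)",
    [(f"v{i % 7}",) for i in range(100)],
)
conn.commit()

batches = refresh_distinct(conn, batch_size=30)
values = sorted(row[0] for row in conn.execute("SELECT val FROM distinct_values"))
print(batches, values)
```

The batch size is the knob to experiment with: smaller ranges mean shorter transactions but more round trips. This sketch does not handle values that disappear from the source; for that you would need the three-table setup or a periodic full rebuild.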