Although there are a huge number of questions about this same problem (1, 2, 3, 4), I have not found an answer that takes performance into account, even here.
Several working solutions have already been proposed, so I would like to add a performance consideration.
EDIT: Thanks to Manatax for pointing out that option 1 does not suffer from performance issues.
Using options 1 and 2, i.e. the COLLATE cast approach, can create a bottleneck, because any index defined on the column will not be used, which leads to a full scan.
Although I have not tried option 3, I suspect it suffers from the same problem as options 1 and 2.
Finally, option 4 is the best option for very large tables when it is viable, that is, when nothing else relies on the original collation.
Consider this simplified query:
SELECT *
FROM schema1.table1 AS T1
LEFT JOIN schema2.table2 AS T2 ON T2.CUI = T1.CUI
WHERE T1.cui IN ('C0271662', 'C2919021');
In my original example, I had many more joins. Of course, table1 and table2 have different collations. Using the COLLATE operator to cast one side of the comparison results in the indexes not being used.
See the query explanation in the figure below.
Visual Query Explanation when using COLLATE
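For readers without the visual tool, the same check can be done with a plain EXPLAIN. This is only a sketch: it assumes both CUI columns use the utf8mb4 character set and differ only in collation, and the choice of utf8mb4_unicode_ci as the target, as well as which side gets the cast, is arbitrary here.

```sql
-- Sketch: the simplified query with an explicit COLLATE cast on one side.
-- Assumption: both CUI columns are utf8mb4; only their collations differ.
EXPLAIN
SELECT *
FROM schema1.table1 AS T1
LEFT JOIN schema2.table2 AS T2
       ON T2.CUI COLLATE utf8mb4_unicode_ci = T1.CUI
WHERE T1.cui IN ('C0271662', 'C2919021');
```

In the output, look at the `type` and `key` columns for T2: `type = ALL` with `key = NULL` means that table is being read with a full scan.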
By contrast, option 4 can take advantage of an existing index and lead to fast queries.
In the figure below, you can see the same query run after applying option 4, i.e. after changing the collation of the schema / table / column.
Visual Query Explanation after changing the collation, and therefore without the COLLATE cast
In conclusion, if performance matters and you can change the collation of the table, go for option 4. If you only need to change a single column, you can use something like this:
ALTER TABLE schema1.table1 MODIFY `field` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
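If the whole table should change rather than one column, a sketch along these lines may help. It reuses the schema, table and column names from the query above and assumes utf8mb4_unicode_ci is the collation you want everywhere; adjust both to your case, and note that MODIFY redefines the column, so its full definition (type, NULL / NOT NULL, default) has to be restated.

```sql
-- Check which character set and collation each CUI column currently uses.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,
       CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE COLUMN_NAME = 'CUI'
  AND ( (TABLE_SCHEMA = 'schema1' AND TABLE_NAME = 'table1')
     OR (TABLE_SCHEMA = 'schema2' AND TABLE_NAME = 'table2') );

-- Convert every character column of the table in one statement.
-- Assumption: utf8mb4 / utf8mb4_unicode_ci is the desired target.
ALTER TABLE schema1.table1
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

Running the first query again after the ALTER should show the same COLLATION_NAME for both columns, at which point the join no longer needs a COLLATE cast. On a very large table the conversion rebuilds the table, so it is worth trying on a copy first.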