Compare the same values stored with different encodings

This question is not a duplicate of the question about matching strings between two different encodings in PHP, because my question requires a SQL solution, not a PHP solution.




Context ► A museum has two databases with the same character set and collation ( engine=INNODB charset=utf8 collate=utf8_unicode_ci ), which are used by two different PHP systems. Each PHP system stores the same data in a different way; the following image is an example:

[image: the column tipo of boleteria.tipos_boletos (left) next to the column tipo_boleto_tipo of venta_en_linea.ventas_detalle (right)]

Tons of data has already been saved this way, and both systems are working fine, so I can’t change the PHP encoding or the database. One system processes sales from the cash desk, and the other processes sales from the website.

Problem ► I need to compare the right column ( tipo_boleto_tipo ) with the left column ( tipo ) to get the value of another column of the left table (not visible in the image), but I get no results because the same values are stored differently. For example, when I look for "Niños" ("children" in Spanish), it is not found because it was saved as "NiÃ±os". I tried to do this in PHP using utf8_encode and utf8_decode , but that is unacceptably slow, so I think it is best to do this in SQL alone. The data will be used for a unified sales report (box office and Internet) over variable time periods, and it must compare hundreds of thousands of rows, which is why it is so slow in PHP.

Question ► Is there something like utf8_encode or utf8_decode in MySQL that lets me match the equivalent values in both columns? Any other suggestion is welcome.

The following is my current code (no results):

  -- Names are qualified as database.table.column
  SELECT boleteria.tipos_boletos.genero       -- desired column
  FROM boleteria.tipos_boletos                -- database with weird chars
  INNER JOIN venta_en_linea.ventas_detalle    -- database with proper chars
    ON venta_en_linea.ventas_detalle.tipo_boleto_tipo = boleteria.tipos_boletos.tipo
  WHERE venta_en_linea.ventas_detalle.evento_id = '1'
    AND venta_en_linea.ventas_detalle.tipo_boleto_tipo = 'Niños'

The line ON venta_en_linea.ventas_detalle.tipo_boleto_tipo = boleteria.tipos_boletos.tipo will never match, because the two values are different ("Niños" vs "NiÃ±os").

+6
php mysql
Sep 11 '17 at 16:08
1 answer

It looks like the application that writes to the boleteria database does not store correct UTF-8. The column character set only determines how MySQL interprets the stored strings; your application can still write data in another character set.
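One way to check what the application actually wrote is to look at the stored bytes. The query below is only a sketch that reuses the table and column names from the question; the byte values in the comments are what "Niños" looks like in correct UTF-8 and after a Latin-1 round trip, not output taken from the asker's data.

```sql
-- Sketch: inspect the raw bytes stored for each ticket type.
-- Table and column names are taken from the question.
SELECT tipo,
       HEX(tipo)         AS stored_bytes,
       CHAR_LENGTH(tipo) AS n_chars,
       LENGTH(tipo)      AS n_bytes
FROM boleteria.tipos_boletos;

-- Correct UTF-8 for 'Niños' is 4E69C3B16F73 (6 bytes, 5 characters).
-- UTF-8 bytes that were read as Latin-1 and then encoded as UTF-8 again
-- appear as 4E69C383C2B16F73 (8 bytes, 6 characters).
```

If the bytes follow the second pattern, the Latin-1 assumption is plausible; any other pattern suggests a different source character set.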

I can’t tell from your example exactly what the wrong character set is, but assuming that it is Latin-1, you can convert the value to latin1 (which recovers the bytes the application originally sent), and then reinterpret those bytes as “actual” utf8:

 -- utf8 -> latin1 -> raw bytes -> utf8, then compare with the correctly stored column
 SELECT 1
 FROM tipos_boletos, ventas_detalle
 WHERE CONVERT(CAST(CONVERT(tipo USING latin1) AS BINARY) USING utf8)
       = tipo_boleto_tipo COLLATE utf8_unicode_ci
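Applied to the query from the question, the same conversion could go into the join condition. This is a sketch under the same assumption (the boleteria text is UTF-8 that passed through Latin-1); the aliases tb and vd are introduced here for readability and are not part of the original query.

```sql
-- Sketch: the question's join with the conversion applied to the
-- double-encoded column. Assumes Latin-1 is the wrong character set.
SELECT tb.genero
FROM boleteria.tipos_boletos AS tb
INNER JOIN venta_en_linea.ventas_detalle AS vd
  ON CONVERT(CAST(CONVERT(tb.tipo USING latin1) AS BINARY) USING utf8)
     = vd.tipo_boleto_tipo COLLATE utf8_unicode_ci
WHERE vd.evento_id = '1'
  AND vd.tipo_boleto_tipo = 'Niños';
```

Because tb.tipo is wrapped in function calls, MySQL cannot use an index on that column for the join, so this form may still be slow on large tables.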

I have seen this all too often in PHP applications that were not written from the very beginning to use UTF-8 strings. If you find that the performance is too slow, you need to convert often, and you cannot update the application that writes the data incorrectly, you can add a new column and a trigger to the tipos_boletos table and convert on the fly as records are added or edited.
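A rough sketch of that column-plus-trigger idea follows. The column name tipo_utf8, the index name, the trigger names, and the VARCHAR(255) length are all hypothetical choices for illustration, and the conversion still assumes Latin-1 as the wrong character set.

```sql
-- Sketch only: tipo_utf8, idx_tipo_utf8 and the trigger names are hypothetical.
-- 1. Add an indexed column that will hold the repaired text.
ALTER TABLE boleteria.tipos_boletos
  ADD COLUMN tipo_utf8 VARCHAR(255)
      CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL,
  ADD INDEX idx_tipo_utf8 (tipo_utf8);

-- 2. Backfill the existing rows once.
UPDATE boleteria.tipos_boletos
SET tipo_utf8 = CONVERT(CAST(CONVERT(tipo USING latin1) AS BINARY) USING utf8);

-- 3. Keep the column in sync when rows are added or edited.
CREATE TRIGGER boleteria.tipos_boletos_bi
BEFORE INSERT ON boleteria.tipos_boletos
FOR EACH ROW
  SET NEW.tipo_utf8 = CONVERT(CAST(CONVERT(NEW.tipo USING latin1) AS BINARY) USING utf8);

CREATE TRIGGER boleteria.tipos_boletos_bu
BEFORE UPDATE ON boleteria.tipos_boletos
FOR EACH ROW
  SET NEW.tipo_utf8 = CONVERT(CAST(CONVERT(NEW.tipo USING latin1) AS BINARY) USING utf8);
```

The report query could then join on tipo_utf8 = tipo_boleto_tipo directly, which lets MySQL use the new index instead of converting every row at query time.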

+3
Sep 11 '17 at 19:10


