I would like to determine the encoding of some text in PHP. For this I use the mb_detect_encoding() function.
The problem is that the function returns a different result when I change the order of the candidate encodings with mb_detect_order().
Consider the following example:
$html = <<<STR
ちょっとのアクセスで落ちてしまったり、サーバー障害が多いレンタルサーバーを選ぶとあなたのビジネス等にかなりの影響がでてしまう可能性があります。特に商売をされている個人の方、法人の方は気をつけるようにしてください
STR;

// UTF-8 is listed first
mb_detect_order(array(
    'UTF-8', 'EUC-JP', 'SJIS', 'eucJP-win', 'SJIS-win',
    'JIS', 'ISO-2022-JP', 'ISO-8859-1', 'ISO-8859-2'
));
$originalEncoding = mb_detect_encoding($html);
die($originalEncoding); // $originalEncoding = 'UTF-8'
However, if I change the order of the encodings passed to mb_detect_order(), the result is different:
// EUC-JP is now listed first
mb_detect_order(array(
    'EUC-JP', 'UTF-8', 'SJIS', 'eucJP-win', 'SJIS-win',
    'JIS', 'ISO-2022-JP', 'ISO-8859-1', 'ISO-8859-2'
));
$originalEncoding = mb_detect_encoding($html);
die($originalEncoding); // $originalEncoding = 'EUC-JP'
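One way to probe the order dependence above is to check each candidate separately instead of taking the first match. The following is only a minimal sketch: validEncodings() is a hypothetical helper, not part of mbstring, the shortened candidate list is chosen for illustration, and the sample literal assumes the script file itself is saved as UTF-8. It relies on mb_check_encoding() and on the third (strict) argument of mb_detect_encoding().

```php
<?php
// Hypothetical helper (not part of mbstring): return every candidate
// encoding in which the bytes of $text form a valid sequence.
function validEncodings(string $text, array $candidates): array
{
    $valid = [];
    foreach ($candidates as $encoding) {
        // mb_check_encoding() only validates the byte sequence; it cannot
        // tell which encoding the author actually intended.
        if (mb_check_encoding($text, $encoding)) {
            $valid[] = $encoding;
        }
    }
    return $valid;
}

// Shortened candidate list, chosen for illustration only.
$candidates = ['UTF-8', 'EUC-JP', 'SJIS', 'ISO-8859-1'];

// Assumption: this source file is saved as UTF-8.
$sample = 'ちょっとのアクセスで落ちてしまったり';

// More than one entry here means the bytes alone are ambiguous.
print_r(validEncodings($sample, $candidates));

// With strict mode, the result is false when the bytes are not valid
// in any of the listed encodings.
var_dump(mb_detect_encoding($sample, $candidates, true));
```

If the helper returns more than one encoding, the bytes alone do not identify the encoding, so any detector has to guess, and the order of the candidate list may decide the outcome.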
So my questions are:
Why is this happening?
Is there a way in PHP to correctly and unequivocally detect text encoding?
Termos