How can I normalize encoding names, e.g. ks_c_5601-1987 to CP949?

I receive emails from the mail server, convert each message to UTF-8 and save it in the DB. For the conversion I use mb_convert_encoding, but it cannot convert gb2312 and ks_c_5601-1987. While googling, I found that I can use CP936 instead of gb2312 and CP949 instead of ks_c_5601-1987.
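
A minimal sketch of that workaround: pass the substitute code-page name to mb_convert_encoding in place of the declared charset label. The byte strings are made-up sample data ("한" encoded as ks_c_5601-1987 and "中" encoded as gb2312), and the mbstring extension is assumed to be enabled.

```php
<?php
// Sample message bodies as raw bytes (illustrative data, not real mail):
// "한" in ks_c_5601-1987 / CP949 and "中" in gb2312 / CP936.
$korean  = "\xC7\xD1";
$chinese = "\xD6\xD0";

// Use the code-page names in place of the declared charset labels.
$koreanUtf8  = mb_convert_encoding($korean, 'UTF-8', 'CP949');  // instead of ks_c_5601-1987
$chineseUtf8 = mb_convert_encoding($chinese, 'UTF-8', 'CP936'); // instead of gb2312

echo $koreanUtf8, ' ', $chineseUtf8, PHP_EOL; // prints: 한 中
```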

Following this approach, I would have to maintain a separate list of encoding-name mappings in my code. Is there a way to normalize encoding names to the names PHP supports internally, so that I don't need to keep any mapping locally?
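
Part of this can be done with mbstring itself: mb_list_encodings() and mb_encoding_aliases() expose the names and aliases PHP knows, so a lookup built from them resolves any label PHP already recognizes to its canonical name. The sketch below assumes PHP 7.4+ with mbstring; the function names are hypothetical, and skipping the transfer encodings is an assumption made to avoid the deprecation notices PHP 8.2+ raises for them.

```php
<?php
// Build a case-insensitive index from every name and alias mbstring knows
// to its canonical encoding name.
function mbstringNameIndex(): array
{
    // Transfer encodings that mbstring lists but deprecates as text
    // encodings in PHP 8.2+; skipped to avoid deprecation notices.
    $skip = ['base64', 'uuencode', 'html-entities', 'quoted-printable'];
    $index = [];
    foreach (mb_list_encodings() as $name) {
        if (in_array(strtolower($name), $skip, true)) {
            continue;
        }
        $index[strtolower($name)] = $name;
        foreach (mb_encoding_aliases($name) as $alias) {
            $index[strtolower($alias)] = $name;
        }
    }
    return $index;
}

// Returns PHP's canonical name for a charset label, or null when mbstring
// does not recognize the label at all.
function canonicalEncoding(string $label): ?string
{
    static $index = null;
    $index ??= mbstringNameIndex();
    return $index[strtolower(trim($label))] ?? null;
}
```

A label that mbstring rejects, such as ks_c_5601-1987 in the question, would yield null here, so this only tells you when a local mapping is needed; it does not replace one.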

1 answer

According to the list of supported character encodings, only a small number of encodings are listed explicitly by code page. Since there are so few of them, and since PHP offers no built-in normalization, keeping a mapping list may not be such a bad idea after all.

The relevant ones are as follows (the name on the right is the charset label you will need to map to the code page on the left):

  • CP932 shift_jis
  • CP51932 euc-jp
  • CP50220 iso-2022-jp
  • CP50221 csISO2022JP
  • CP50222 iso-2022-jp
  • CP936 gb2312
  • CP950 big5
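
Given how short the list is, the mapping can live in a small array. This is a sketch under stated assumptions, not a PHP API: the constant and function names are hypothetical, the ks_c_5601-1987 to CP949 entry comes from the question rather than the list above, and because iso-2022-jp appears for both CP50220 and CP50222, the sketch arbitrarily keeps CP50220.

```php
<?php
// Hypothetical map from incoming charset labels (lowercased) to the
// code-page names that mb_convert_encoding accepts.
const CHARSET_TO_CODEPAGE = [
    'shift_jis'      => 'CP932',
    'euc-jp'         => 'CP51932',
    'iso-2022-jp'    => 'CP50220', // also listed for CP50222; one must be chosen
    'csiso2022jp'    => 'CP50221',
    'gb2312'         => 'CP936',
    'big5'           => 'CP950',
    'ks_c_5601-1987' => 'CP949',   // from the question, not the list above
];

// Returns the code-page name for a known label, otherwise the label unchanged.
function normalizeCharset(string $charset): string
{
    $key = strtolower(trim($charset));
    return CHARSET_TO_CODEPAGE[$key] ?? $charset;
}
```

Unknown labels pass through untouched, so something like `mb_convert_encoding($body, 'UTF-8', normalizeCharset($label))` behaves exactly as before for every charset not in the map.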

The following code pages are also listed in the PHP documentation, but they appear to already have the corresponding aliases:

  • CP866 (IBM866)
  • UHC (CP949)
  • Windows-1251 (CP1251)
  • Windows-1252 (CP1252)
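
To confirm those aliases on a particular PHP build, mb_encoding_aliases() returns the aliases mbstring registers for an encoding. A quick sketch, assuming mbstring is enabled:

```php
<?php
// Print the aliases mbstring registers for each code page listed above.
foreach (['CP866', 'UHC', 'Windows-1251', 'Windows-1252'] as $name) {
    echo $name, ': ', implode(', ', mb_encoding_aliases($name)), PHP_EOL;
}
```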
