How to normalize encoding names, for example, ks_c_5601-1987 to CP949?

Question


I receive emails from a mail server, convert each message to UTF-8, and save it in a database. To convert the encoding I use mb_convert_encoding, but it cannot convert from gb2312 or ks_c_5601-1987. After some searching, I found that I can use CP936 instead of gb2312 and CP949 instead of ks_c_5601-1987.

Following this approach would mean maintaining a separate list of encoding mappings in my code. Is there a way to normalize encoding names to the names PHP supports internally, and so avoid keeping a local map?

+7

php utf-8

Nidhi kaushal Dec 10 '12 at 9:49


1 answer

borrible · Answer 1 · 2012-12-10T14:03:38+0000

According to the list of supported character encodings, only a small number of encodings are listed explicitly by code page. Given how few such cases there are, and in the absence of a built-in, on-demand normalization, maintaining a mapping list may not be too unreasonable.

The relevant ones are as follows (the lowercase name on the right is the name you would need to map from):

CP932 shift_jis
CP51932 euc_jp
CP50220 iso-2022-jp
CP50221 csISO2022JP
CP50222 iso-2022-jp
CP936 gb2312
CP950 big5
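A mapping list like this can be wrapped in a small lookup function. The following is only a minimal sketch: the function name normalize_encoding_name() and the array layout are hypothetical, the ks_c_5601-1987 entry is taken from the question rather than from the list above, and the ISO-2022-JP rows are left out because the list pairs that name with more than one code page.

```php
<?php
// Hypothetical helper (not part of PHP): maps a charset name as found in a
// mail header to a code page name, using the pairs discussed in this thread.
function normalize_encoding_name(string $name): string
{
    // Keys are lowercased so the lookup is case-insensitive.
    static $map = [
        'shift_jis'      => 'CP932',
        'euc_jp'         => 'CP51932',
        'gb2312'         => 'CP936',
        'big5'           => 'CP950',
        'ks_c_5601-1987' => 'CP949', // from the question, not the list above
    ];

    $key = strtolower(trim($name));

    // Unknown names are returned unchanged so the caller can still try them.
    return $map[$key] ?? $name;
}
```

The result could then be passed as the source encoding, for example mb_convert_encoding($body, 'UTF-8', normalize_encoding_name($charset)), where $body and $charset are assumed to come from the parsed message.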

The following code pages are also listed in the PHP documentation, but they already appear to have suitable aliases:

CP866 (IBM866)
UHC (CP949)
Windows-1251 (CP1251)
Windows-1252 (CP1252)
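Whether a given name is already recognized can be checked at runtime instead of being assumed. The sketch below relies on the mbstring functions mb_list_encodings() and mb_encoding_aliases(); the helper name is_known_mb_encoding() is hypothetical, and the result depends on the PHP version and build, so treat it as a diagnostic aid rather than a definitive answer.

```php
<?php
// Hypothetical helper (requires the mbstring extension): reports whether
// $name matches a canonical mbstring encoding name or one of its aliases.
function is_known_mb_encoding(string $name): bool
{
    $needle = strtolower(trim($name));

    foreach (mb_list_encodings() as $canonical) {
        if (strtolower($canonical) === $needle) {
            return true;
        }

        // Older PHP versions may return false here instead of an array.
        $aliases = mb_encoding_aliases($canonical);
        if (!is_array($aliases)) {
            continue;
        }

        foreach ($aliases as $alias) {
            if (strtolower($alias) === $needle) {
                return true;
            }
        }
    }

    return false;
}
```

Names for which this returns false are the ones that would still need an entry in a local map.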
