Double-byte character in an HTML document with a single-byte encoding (ISO-8859-1)

Question

I found out that ISO-8859-1 is a single-byte encoding.

See the page http://www.manoramaonline.com/cgi-bin/MMOnline.dll/portal/ep/malayalamContentView.do?tabId=11&programId=1073753760&BV_ID=@@@&contentId=15238737&contentType=EDITORIAL&articleType=Malayalam%20News . It is written in the Malayalam language.

The HTTP header and meta tag say it uses ISO-8859-1 as the character encoding.

But this page uses a double-byte character (0x201A) ( http://unicodelookup.com/#%E2%80%9A ).

(copy the character and find it at http://unicodelookup.com )

<div id="articleTitleMal" style="padding-top:10px;">
    <font face= "Manorama" >
         ¼ÈØOVA¢: ÜÍß‚Äí 1.28 ...
    </font>
 </div>

How can I use a double-byte character in single-byte encoding?

This is not just curiosity. One of my tasks is stuck because I do not understand the problem above.

Update: they use the font www.manoramaonline.com/portal/mmcss/Manorama.ttf, and I think some of the characters in the Manorama font use two bytes.

UPDATE2: I tried to convert a document from ISO-8859-1 to UTF-8 using the code below.

<?php
$t = file_get_contents('http://www.manoramaonline.com/cgi-bin/MMOnline.dll/portal/ep/malayalamContentView.do?tabId=11&programId=1073753760&BV_ID=@@@&contentId=15238737&contentType=EDITORIAL&articleType=Malayalam%20News');

// Change the charset info in meta-tag
$t  = str_replace('ISO-8859-1', 'UTF-8', $t);

file_put_contents('t.html', utf8_encode($t));

This time, the character highlighted above is missing.

+4

html php character-encoding true-type-fonts

Habeeb perwad Oct 17 '13 at 8:21

1 answer

Jukka K. Korpela · Accepted Answer · 2013-10-17T11:05:50+0000

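The page is declared as ISO-8859-1 in the HTTP headers, but browsers treat that label as Windows-1252. This is common practice, for historical reasons, and it is formalized in the WHATWG Encoding Standard.

So byte 82 (hexadecimal), which is a control character in ISO 8859-1, is interpreted as U+201A "‚" (as it is in Windows-1252).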
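
The two readings of byte 82 can be compared directly. This is a minimal sketch, assuming the mbstring extension is available; the variable names are only illustrative.

```php
<?php
// Byte 0x82: a C1 control character in ISO-8859-1,
// but U+201A (single low-9 quotation mark) in Windows-1252.
$byte = "\x82";

// Decoded as ISO-8859-1, the byte maps to U+0082 (UTF-8 bytes C2 82).
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Decoded as Windows-1252, the byte maps to U+201A (UTF-8 bytes E2 80 9A).
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // c282
echo bin2hex($asCp1252), "\n"; // e2809a
```

The first result is an invisible control character, which would explain why the character seems to disappear after a strict ISO-8859-1 to UTF-8 conversion.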

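The rest is done by the font: the Manorama font draws Malayalam glyphs at positions that belong to quite different characters, so the character U+201A "‚", stored as byte 82, is displayed as a Malayalam glyph.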

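So when you convert the page, treat the input as Windows-1252 rather than ISO-8859-1; the UTF-8 output then keeps the character.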
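
Applied to the conversion script in the question, this could look roughly as follows. It is a sketch, not a definitive fix: it assumes the mbstring extension is available, the helper name `convertPageToUtf8` is hypothetical, and the sample string merely stands in for the downloaded page.

```php
<?php
/**
 * Hypothetical helper: re-encode a page that is labelled ISO-8859-1
 * but has to be read as Windows-1252, the way browsers read it.
 */
function convertPageToUtf8(string $html): string
{
    // Update the charset label in the meta tag (case-insensitive, ASCII only).
    $html = str_ireplace('ISO-8859-1', 'UTF-8', $html);

    // Decode as Windows-1252 so that bytes 0x80-0x9F become printable
    // characters instead of C1 control characters.
    return mb_convert_encoding($html, 'UTF-8', 'Windows-1252');
}

// Stand-in for the downloaded page; "\x82" is the byte in question.
$sample = "<meta charset=\"ISO-8859-1\">"
        . "<font face=\"Manorama\">\xDC\xCD\xDF\x82\xC4\xED</font>";

$utf8 = convertPageToUtf8($sample);

// Check that the byte survived as U+201A (UTF-8 bytes E2 80 9A).
var_dump(strpos($utf8, "\xE2\x80\x9A") !== false);
```

The converted page still depends on the Manorama font for display, because the text is stored as Latin characters rather than as Malayalam ones.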

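Converting the text to real Malayalam characters in Unicode is a separate matter: that needs a mapping specific to the font.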
