Why can I use a character that is not part of the encoding (windows-1252)?

I am looking for a little help in understanding how encodings work. This question is a continuation of Something Wrong Using Windows-1252 instead of UTF-8

I have a ColdFusion test site using ...

 <CFHEADER NAME="Content-Type" value="text/html; charset=windows-1252">
 <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />

and a test Oracle DB using ...

 NLS_CHARACTERSET: WE8MSWIN1252
 NLS_NCHAR_CHARACTERSET: AL16UTF16

According to the windows-1252 code chart, there is no square root symbol (Alt + 251): √. Yet I can enter it into a form field on a web page, save it to the database, query it, and display it on screen again just fine. It is stored in the database as: &#8730;. How can I enter, save, query, and display it if it is not even part of the character set? According to the code chart, decimal value 251 is:

 Hex:FB | û | 00FB | LATIN SMALL LETTER U WITH CIRCUMFLEX

1 answer

In fact, you are not using any characters outside the encoding of the page and the database.

Since the page is encoded in windows-1252, if you type Alt + 251 into the form field and then submit the form, the browser says:

 "Hey, this character is not part of windows-1252, and I may only send back data that is in windows-1252, so I will do the best I can and send the HTML character reference for it: &#8730;. Oh well, I wish I could send back 1 character; since I cannot, I will send back 7."

And notice that all 7 of those characters (&, #, 8, 7, 3, 0 and ;) can be encoded in windows-1252.

If the page had been served in a multibyte Unicode encoding such as UTF-8, the browser would send back something that counts as 1 character.
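The fallback can be approximated in Python. This is only a rough sketch, not the browser's actual code path: Python's `cp1252` codec stands in for the page encoding, the `xmlcharrefreplace` error handler plays the role of the browser's substitution, and the variable names are illustrative.

```python
# Sketch: mimic how a form submission from a windows-1252 page falls back
# to a numeric character reference for a character it cannot encode.
sqrt_sign = "\u221a"  # the square root character

try:
    sqrt_sign.encode("cp1252")
    fits_in_cp1252 = True
except UnicodeEncodeError:
    fits_in_cp1252 = False  # no byte in windows-1252 maps to U+221A

# Substitute &#NNNN; for unencodable characters instead of failing.
submitted = sqrt_sign.encode("cp1252", errors="xmlcharrefreplace")

# With a Unicode encoding the character is sent as itself.
as_utf8 = sqrt_sign.encode("utf-8")

print(fits_in_cp1252)             # False
print(submitted, len(submitted))  # b'&#8730;' 7
print(as_utf8)                    # b'\xe2\x88\x9a' (3 bytes, still 1 character)
```

The 7 bytes in `submitted` are plain ASCII, which is why windows-1252 (and the WE8MSWIN1252 database) can store them without loss.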

So how can you query it?

  select * from tab where field like '%&#8730;%' 

What is stored is the HTML numeric character reference for the square root symbol, not the symbol itself: https://www.google.com/#q=html+character+codes
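To see why the value displays correctly even though only the reference is stored, here is a small sketch using Python's standard `html` module as a stand-in for the browser's entity decoding. The variable names are illustrative.

```python
import html

stored = "&#8730;"             # the 7 characters held in the database column
shown = html.unescape(stored)  # what an HTML renderer turns that reference into

print(len(stored), len(shown), hex(ord(shown)))  # 7 1 0x221a
```

The database never holds U+221A. The browser rebuilds it from the reference each time the page is rendered.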

Update:

Here is a very good article explaining what happens: http://htmlpurifier.org/docs/enduser-utf8.html

  "...once you start adding characters outside of your encoding... [the browser might] replace the character with a character entity reference...." 

Also, when you type Alt + 251 on a Windows machine, it inserts the square root character, which is Unicode U+221A.

In other words, Alt + 251 behaves like a keyboard shortcut for inserting U+221A. It does not insert byte 251 of windows-1252.
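One likely explanation for the mismatch with the windows-1252 chart: Alt codes typed without a leading zero are looked up in the system's OEM code page rather than in windows-1252. The sketch below assumes the OEM code page is 437 (the US default). Other locales use different OEM code pages and may give a different character.

```python
# Sketch: the same number, 251, means different characters in different
# code pages. Assumes OEM code page 437 for the Alt + 251 lookup.
code = bytes([251])

oem_char = code.decode("cp437")    # code page assumed for Alt + 251
ansi_char = code.decode("cp1252")  # the chart quoted in the question

print(oem_char, hex(ord(oem_char)))    # √ 0x221a
print(ansi_char, hex(ord(ansi_char)))  # û 0xfb
```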
