Getting non-UTF-8 form fields as UTF-8 in PHP?

I have a form served in a non-UTF-8 encoding (it's actually Windows-1251). People, of course, post any characters they like. The browser helpfully converts characters that can't be represented in Windows-1251 to HTML entities, so I can still recognize them. For example, if the user types →, I receive &#8594;. That is partly great: if I just echo it back, the browser will correctly display → no matter what.

The problem is that I actually run htmlspecialchars() on the text before displaying it (it's a PHP function that converts special characters to HTML entities; for example, & becomes &amp;). My users sometimes type things like &mdash; or &copy;, and I want to display them as the literal text &mdash; or &copy;, not as — and ©.

I have no way to distinguish → from &#8594;, because I receive both as &#8594;. And since I run htmlspecialchars() on the text, and the browser also sends &#8594; for →, I echo back &amp;#8594;, which the browser displays as &#8594;. So the user's input gets corrupted.
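To make the collision concrete, here is a small sketch that imitates what the browser does with a Windows-1251 form. The helper simulate_cp1251_submit() is hypothetical: it assumes the browser sends representable characters as Windows-1251 bytes and everything else as a decimal entity, which matches the behaviour described above. It needs the mbstring extension and PHP 7.4+ (for mb_str_split()).

```php
<?php
// Hypothetical helper: imitates a browser submitting a Windows-1251 form.
// Characters Windows-1251 can hold are sent as raw bytes; anything else
// is sent as a decimal numeric entity such as &#8594;.
function simulate_cp1251_submit(string $typed): string
{
    $out = '';
    foreach (mb_str_split($typed, 1, 'UTF-8') as $ch) {
        $cp1251 = mb_convert_encoding($ch, 'Windows-1251', 'UTF-8');
        $back   = mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');
        // A character that survives the round trip is representable.
        $out .= ($back === $ch) ? $cp1251 : '&#' . mb_ord($ch, 'UTF-8') . ';';
    }
    return $out;
}

$arrow   = simulate_cp1251_submit('→');        // user typed the arrow
$literal = simulate_cp1251_submit('&#8594;');  // user typed seven ASCII characters

var_dump($arrow === $literal);                  // the server cannot tell them apart
echo htmlspecialchars($arrow, ENT_QUOTES, 'Windows-1251'), "\n"; // double-escaped
```

Once both inputs are the same byte string, no server-side function can recover which one the user meant; that is why the answers below focus on changing the page encoding.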

Is there a way to say: "Okay, I serve this form in Windows-1251, but could you please just send me the input in UTF-8 and let me deal with it myself"?

Oh, I know that switching the whole software to UTF-8 is a good idea, but that is too much work, and I would be happy with a quick fix. If it matters, the form's enctype is "multipart/form-data" (it includes a file uploader, so no other enctype can be used). I'm using Apache and PHP.

Thanks!

+1
8 answers

The browser helpfully converts characters that can't be represented in Windows-1251 to HTML entities

Well, nearly, except that it's not at all helpful. Now you can't tell the difference between a literal "&#411;" that someone typed, expecting it to come out as text with a '&' in it, and the character that entity actually stands for.

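I actually run htmlspecialchars() on the text before displaying it

Yes, and you must keep doing that, or you have a security hole.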

Okay, I serve this form in Windows-1251, but could you please just send me the input in UTF-8

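Supposedly you can put accept-charset="UTF-8" on the form tag. In reality, though, that doesn't work in IE. To get a form submitted in UTF-8, you have to serve the form (the page) in UTF-8.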
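A minimal sketch of serving the form page itself as UTF-8 while keeping the multipart enctype the question needs for the file upload. The function render_upload_form() and the field names are hypothetical; the Content-Type header is the part that tells the browser the page encoding.

```php
<?php
// Hypothetical page renderer: the page is UTF-8, so the browser submits UTF-8.
// enctype stays multipart/form-data because the form has a file upload.
function render_upload_form(string $action): string
{
    $action = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    return <<<HTML
<!DOCTYPE html>
<html>
<head><meta charset="UTF-8"></head>
<body>
<form action="$action" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
  <textarea name="text"></textarea>
  <input type="file" name="upload">
  <button type="submit">Send</button>
</form>
</body>
</html>
HTML;
}

// The HTTP header must go out before any output (it is ignored under the CLI).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo render_upload_form('action.php');
```

The accept-charset attribute is kept as a hint, but the page encoding is what the browser actually relies on, per the answer above.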

I know that it's a good idea to switch all the software to UTF-8,

Yes. At the very least, the page that contains the form has to be served in UTF-8.

+3
<form action="action.php" method="get" accept-charset="UTF-8">
    <!-- some elements -->
</form>

Note the accept-charset attribute.
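Because some browsers ignored accept-charset (older IE, as another answer notes), a defensive server side might check each value before trusting it. This is a sketch under that assumption: as_utf8() is a hypothetical helper, and falling back to Windows-1251 when the bytes are not valid UTF-8 is a heuristic, not a guarantee.

```php
<?php
// Hypothetical helper: accept valid UTF-8 as-is, otherwise assume the
// browser ignored accept-charset and sent Windows-1251 bytes.
function as_utf8(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8
    }
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1251');
}

$text = as_utf8($_POST['text'] ?? '');
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8'), "\n";
```

Windows-1251 Cyrillic text is rarely valid UTF-8 (its letters are bytes 0xC0 to 0xFF, which UTF-8 never allows back to back), which is what makes this check usable in practice.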

+1

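Unfortunately, no. As long as the page is not in UTF-8, the browser does this conversion, and there is no way to tell a typed → from a user who typed the characters &, #, 8, 5, 9, 4 and ; one by one.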

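The alternative is to serve the page in UTF-8 and do the Windows-1251/UTF-8 conversion in the script yourself, so that a literally typed &copy; and a real → (which the browser would otherwise send as &#8594;) stay distinguishable.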


0
0

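The htmlspecialchars function has a double_encode parameter (added in PHP 5.2.3); set it to false, and entities that are already in the text are not encoded a second time.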

0

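Serve the page in UTF-8, so PHP receives the input in UTF-8. Where the rest of the code still works in Windows-1251, use mb_convert_encoding() to convert between Windows-1251 and UTF-8 on the fly.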
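A sketch of that conversion step, assuming the page is now UTF-8 while storage and output stay in Windows-1251. The helper to_cp1251_html() is hypothetical. It escapes first, while the input is still unambiguous, and then asks mbstring (via mb_substitute_character('entity')) to write characters Windows-1251 lacks as numeric character references instead of '?'.

```php
<?php
// Hypothetical helper: UTF-8 input -> HTML-safe Windows-1251 text.
function to_cp1251_html(string $utf8): string
{
    // 1. Escape while the text is still UTF-8, so a literally typed
    //    "&#8594;" becomes "&amp;#8594;" and stays visible as text.
    $escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

    // 2. Convert; unrepresentable characters (such as →) become
    //    numeric character references rather than '?'.
    $previous = mb_substitute_character();
    mb_substitute_character('entity');
    $cp1251 = mb_convert_encoding($escaped, 'Windows-1251', 'UTF-8');
    mb_substitute_character($previous); // restore the global setting

    return $cp1251;
}

echo to_cp1251_html('→ vs &#8594;'), "\n";
```

The result is ready to echo into a Windows-1251 page: the real arrow renders as →, while the typed entity renders as the text &#8594;.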

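If you stay with Windows-1251, the entities the browser sends get escaped again by htmlspecialchars(), and every round trip adds another layer: &amp;amp;... Running html_entity_decode() on the input before escaping it avoids that.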
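A sketch of that decode-then-escape order for a page that stays in Windows-1251. The helper decode_cp1251_input() is hypothetical; the raw value imitates Windows-1251 bytes for "Б" followed by the entity a browser sends for →.

```php
<?php
// Hypothetical helper: Windows-1251 form value -> plain UTF-8 text.
function decode_cp1251_input(string $raw): string
{
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1251');
    // Turns &#8594; back into →, but it also turns a literally typed
    // "&#8594;" or "&copy;" into a character: the ambiguity remains.
    return html_entity_decode($utf8, ENT_QUOTES, 'UTF-8');
}

$raw  = "\xC1 &#8594;";            // "Б" plus the entity sent for →
$text = decode_cp1251_input($raw); // plain UTF-8 text, escaped only on output
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8'), "\n";
```

Keeping the text decoded internally and escaping it exactly once on output is what prevents the &amp;amp; chains.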

Alternatively, set the double_encode parameter of htmlspecialchars() to false.

0

mbstring can treat HTML entities as a "charset" ("HTML-ENTITIES"). Dumping the bytes of the converted string $out:

for($i=0; $i<strlen($out); $i++) { printf('%02X ', ord($out[$i])); }

61 20 E2 86 92 20 62 20 26 20 63
E2 86 92 is → (U+2192 RIGHTWARDS ARROW) in UTF-8.
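The "HTML-ENTITIES" pseudo-encoding is deprecated as of PHP 8.2, so here is a sketch of the same byte check with html_entity_decode() swapped in for the mbstring conversion. The sample input is hypothetical, chosen so that the result matches the hex line above.

```php
<?php
// Decode entities to UTF-8 without the deprecated HTML-ENTITIES encoding.
$in  = 'a &#8594; b &amp; c'; // hypothetical sample input
$out = html_entity_decode($in, ENT_QUOTES, 'UTF-8');

// Print every byte of $out as two hex digits, like the loop above.
echo implode(' ', str_split(strtoupper(bin2hex($out)), 2)), "\n";
```

This prints 61 20 E2 86 92 20 62 20 26 20 63, the same bytes as the loop above.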
0

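As far as I know, this can't be done while the page is in Windows-1251. If you want the input in UTF-8, the page itself has to be served in UTF-8.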

0
