Adding the UTF-8 charset parameter to htmlspecialchars() calls: can it break anything?

Assuming my project is UTF-8 everywhere and always handles UTF-8 encoded data, is there any legitimate case that could break if I change all occurrences of htmlspecialchars($var) to htmlspecialchars($var, ENT_QUOTES, 'utf-8')?

I know one thing: ENT_QUOTES differs from ENT_COMPAT in that it also escapes single quotes. Assuming I know that this alone won't break anything, is there anything else?
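A small PHP check of the ENT_COMPAT vs ENT_QUOTES difference. The sample string is made up, and the charset is passed explicitly in both calls so that only the flag changes:

```php
<?php
// Compare how ENT_COMPAT and ENT_QUOTES treat quote characters.
// The charset is given explicitly so the result does not depend on
// the default_charset ini setting.
$input = "She said \"it's\" <fine>";

$compat = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');
$quotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $compat, PHP_EOL; // She said &quot;it's&quot; &lt;fine&gt;
echo $quotes, PHP_EOL; // She said &quot;it&#039;s&quot; &lt;fine&gt;
```

Only the single quote is treated differently; double quotes and angle brackets are escaped either way.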

In other words:

Is there any input, made only of valid characters in that encoding, for which htmlspecialchars() called without the charset parameter returns something different from htmlspecialchars() called with it?

(That is, can htmlspecialchars($stringThatIsValidUTF8, ENT_QUOTES) !== htmlspecialchars($stringThatIsValidUTF8, ENT_QUOTES, 'utf-8') ever be true?)
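One way to probe the question directly is to run both calls over some valid UTF-8 samples. This is only a sketch: the sample strings are arbitrary, and the result of the call without a charset depends on the configured default (the default_charset ini setting on PHP 5.6 and later, which is UTF-8 unless changed).

```php
<?php
// Run both variants of the call over valid UTF-8 samples and count differences.
// Without a charset argument, PHP falls back to its default encoding
// (the default_charset ini setting on PHP 5.6+).
$samples = [
    "plain ASCII",
    "Tom & Jerry <b>\"bold\"</b> 'quoted'",
    "Größe: 5 € naïve café",
    "日本語 <テスト> & 'x'",
];

$differences = 0;
foreach ($samples as $s) {
    $implicit = htmlspecialchars($s, ENT_QUOTES);
    $explicit = htmlspecialchars($s, ENT_QUOTES, 'utf-8');
    if ($implicit !== $explicit) {
        $differences++;
        echo "Differs: ", $s, PHP_EOL;
    }
}

echo "default_charset = ", ini_get('default_charset'), PHP_EOL;
echo "Differences found: ", $differences, PHP_EOL;
```

A sandbox test like this can only show that no difference was found for the chosen samples; the answers below explain why none is expected for valid input.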

My understanding says no, never. So far, testing the changes in my project's sandbox also says no. However, I am not sure whether I am missing something.

2 answers

I think this quote from the PHP manual, cited in another question, answers it unambiguously:

For the purposes of this function, the encodings ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252 and KOI8-R are effectively equivalent, given that the characters affected by htmlspecialchars() occupy the same positions in all of these encodings.

The characters ", &, ', < and > have the same code in each of these encodings, and even in UTF-8 they take only one byte, because UTF-8 uses several bytes only for characters outside ASCII. Every byte of such a multi-byte sequence is 0x80 or higher, so it can never be mistaken for one of these characters. Therefore, even if you have so far been processing UTF-8 data as ISO-8859-1, the output will be identical once you switch to passing UTF-8 explicitly.
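The argument can be checked with a short sketch that runs the same UTF-8 bytes through htmlspecialchars() once declared as ISO-8859-1 and once as UTF-8. The sample string is arbitrary; the last part illustrates why the "valid UTF-8" assumption matters, since only the 'UTF-8' call validates its input.

```php
<?php
// Compare processing UTF-8 bytes as ISO-8859-1 with declaring UTF-8.
// The affected characters (& " ' < >) are single bytes below 0x80, and every
// byte of a multi-byte UTF-8 sequence is 0x80 or higher.
$utf8 = "Ça coûte 10 € & plus <b>'gratuit'</b> \"ok\"";

$asLatin1 = htmlspecialchars($utf8, ENT_QUOTES, 'ISO-8859-1');
$asUtf8   = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
echo $asLatin1 === $asUtf8 ? "identical" : "different", PHP_EOL;

// Print the bytes of some multi-byte characters: all are >= 0x80.
foreach (['Ç', 'û', '€'] as $ch) {
    $bytes = array_map(function ($b) {
        return sprintf('%02X', ord($b));
    }, str_split($ch));
    echo $ch, ' => ', implode(' ', $bytes), PHP_EOL;
}

// Caveat: the equivalence holds only for valid UTF-8. With 'UTF-8', an
// invalid sequence makes htmlspecialchars() return an empty string unless
// ENT_SUBSTITUTE or ENT_IGNORE is set; 'ISO-8859-1' accepts any byte.
$invalid = "abc\xC3(";
$invalidAsLatin1 = htmlspecialchars($invalid, ENT_QUOTES, 'ISO-8859-1');
$invalidAsUtf8   = htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8');
echo strlen($invalidAsLatin1), ' vs ', strlen($invalidAsUtf8), PHP_EOL; // 5 vs 0
```

If the input cannot be guaranteed to be valid UTF-8, adding ENT_SUBSTITUTE to the flags replaces invalid sequences with U+FFFD instead of producing an empty string.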


No, the result will not be different, because if you do not provide an encoding, PHP falls back to a default one, which is UTF-8.

