Sanity-checking messages with characters that are not part of any ASCII superset (for example, JIS X 0208)?

I do NOT want to check whether a string is ASCII in Python. :)

There is an interesting requirement in the HTTP specification, and I was wondering how it could be implemented and tested.

A recipient MUST parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII [USASCII].

Parsing an HTTP message as a stream of Unicode characters, without regard for the specific encoding, creates security vulnerabilities due to the varying ways that string processing libraries handle invalid multibyte character sequences that contain the octet LF (%x0A).

In another answer, https://stackoverflow.com/a/166269/2123/2128, there is an example of a character set that is not a superset of US-ASCII. But I am more interested in testing this requirement, or at least in some kind of sanity check. The requirement simply means that the parser must pick a superset of ASCII to consume the data, but I was wondering whether there is a case where you would want to check up front for strange characters inside the message.

Say the message is MSG.

    def is_ascii_superset(self, MSG):
        "take any string, and return True or False"
        # Test here
        if test(MSG):
            return True
        else:
            return False

Any ideas whether there is a list of all character sets that are supersets of ASCII?

UPDATE:

People seem to misunderstand this question. I am not asking whether the string is ASCII. That is trivial.

  • ISO-8859-1, UTF-8, etc. are supersets of ASCII.
  • JIS X 0208 is NOT a superset of ASCII.
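One rough way to probe the distinction in the list above is to ask a codec whether each octet from 0x00 to 0x7F decodes, on its own, to the same ASCII character. The sketch below is only an illustration: the function takes an encoding name rather than a message, and the codecs used (`cp037` and `utf-16` as non-supersets) are stand-ins chosen because Python ships them, not because they appear in the question. The check is necessary but not sufficient; in multibyte encodings such as Shift_JIS, trail bytes of two-byte characters can fall in the ASCII range.

```python
def is_ascii_superset(encoding: str) -> bool:
    """Return True if every octet 0x00-0x7F decodes alone to the same ASCII character."""
    for octet in range(0x80):
        try:
            if bytes([octet]).decode(encoding) != chr(octet):
                return False  # octet maps to a different character
        except UnicodeDecodeError:
            return False  # octet is not a complete character by itself
    return True


for name in ("utf-8", "iso-8859-1", "cp037", "utf-16"):
    print(name, is_ascii_superset(name))
```

With the codecs bundled in CPython, the first two names report True and the last two report False.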
1 answer

You do not need to check this; you just treat everything as an ASCII superset, i.e. always treat %x0A as LF, assume octets up to %x7F are ASCII, and do not try to parse multibyte sequences. A superset of ASCII uses every byte value, so there are no "strange" characters.
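A minimal sketch of this approach, assuming the whole message is already in memory as `bytes`: split on octets first, then decode each line with ISO-8859-1, which maps every octet to exactly one code point. The function name `split_head_lines` and the sample message are made up for illustration.

```python
def split_head_lines(raw: bytes) -> list:
    """Split an HTTP message head into lines, working on octets only."""
    # Separate the head from the body at the first empty line.
    head, _, _body = raw.partition(b"\r\n\r\n")
    # Splitting happens on bytes, so %x0A is always LF. Decoding with
    # ISO-8859-1 afterwards cannot fail and cannot merge or drop octets.
    return [line.decode("iso-8859-1") for line in head.split(b"\r\n")]


raw = b"GET / HTTP/1.1\r\nHost: example.com\r\nX-Note: caf\xe9 \xff\r\n\r\nbody"
lines = split_head_lines(raw)
print(lines)
```

The octets 0xE9 and 0xFF are not valid UTF-8 here, but because no multibyte decoding is attempted they simply pass through as single characters.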
