I do NOT want to check whether a string is ASCII in Python. :)
There is an interesting requirement in the HTTP specification, and I was wondering how it could be implemented and tested.
A recipient MUST parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII [USASCII].
Parsing an HTTP message as a stream of Unicode characters, without regard for the specific encoding, creates security vulnerabilities due to the varying ways that string processing libraries handle invalid multibyte character sequences that contain the octet LF (%x0A).
In another answer, https://stackoverflow.com/a/166269/2123/2128, there is an example of a character set that is not a superset of US-ASCII. But I am more interested in testing this requirement, or at least in some kind of test for it. The requirement simply means that the parser must pick a superset of ASCII to consume the data, but I was wondering about the case where you want to check up front, before any strange characters show up inside the message.
Say the message is MSG.
def is_ascii_superset(self, MSG):
    """Take any string, and return True or False."""
Any idea whether there is a list of all character sets that are supersets of ASCII?
UPDATE:
People seem to misunderstand this question. I am not asking whether the string itself is ASCII. That is trivial. I am asking about the character set, for example:
- ISO-8859-1, UTF-8, etc. are supersets of ASCII.
- JIS X 0208 is NOT a superset of ASCII.
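To make the distinction concrete, here is a rough sketch of the kind of check I have in mind. It is only an assumption about how this could be approached, not something the HTTP specification prescribes: it takes the name of an encoding (not the message) and tests whether each of the 128 US-ASCII octets, decoded on its own with that Python codec, yields the same code point. The parameter name `encoding` is hypothetical, and the function is written as a plain function rather than a method.

```python
def is_ascii_superset(encoding: str) -> bool:
    """Heuristic: True if every single US-ASCII octet decodes to itself.

    `encoding` is a Python codec name (an assumption of this sketch).
    """
    try:
        # Decode each octet 0x00..0x7F on its own and compare it with
        # the code point of the same number.
        return all(bytes([b]).decode(encoding) == chr(b) for b in range(128))
    except (LookupError, UnicodeError):
        # Unknown codec, non-text codec, or an octet that cannot be
        # decoded in isolation (as with UTF-16 or UTF-32).
        return False


if __name__ == "__main__":
    for name in ("ascii", "utf-8", "iso-8859-1", "utf-16", "cp037"):
        print(name, is_ascii_superset(name))
```

This is a necessary condition rather than a proof. Stateful encodings such as ISO-2022-JP start out in an ASCII state but can switch, via escape sequences, to a mode where the same octets mean something else, so a per-octet check like this cannot rule them out. To approximate a list, the same check could be run over the codec names in `encodings.aliases.aliases.values()`, keeping in mind that some entries there are not text encodings.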