Specify a Unicode string containing one character:
symbol = u'★'
It can be converted to HTML syntax as follows:
html = '&#{};'.format(ord(symbol))
To convert back, extract the number by separating and ; convert to integer and then use chr (Python 3) or unichr (Python 2).
If you need to process input not from the conversion above, you may need to process hexadecimal, which looks like &#xZZZ; where ZZZ is a bunch of hexadecimal digits. To detect them, just notice that it starts with x ; analyze the remainder with radix 16.
In addition, you may need to deal with named objects. See the last two paragraphs for this.
If you want Python to deal with the encoding of an entire string, you can use this:
text = u"I like symb★ls!" html = text.encode('ascii', errors='xmlcharrefreplace').decode('ascii')
Unfortunately, there is no equivalent for decoding, and this also does not exclude potentially dangerous HTML characters such as < (which may or may not be what you want). If you need to decode, perhaps use the correct HTML parser, which can also deal with named objects such as ♣ (& clubs;).
If you want to deal with named objects and don't want to use real HTML parser, there is a machine-readable (with Python json module) list of objects .
source share