I have a Python 2.7 program that reads iOS text messages from a SQLite database. Text messages are unicode strings. In the following text message:
u'that\u2019s \U0001f63b'
the apostrophe is represented as \u2019, but the emoji is represented as \U0001f63b. I looked up the code point for the emoji in question and found \uf63b, so I'm not sure where the 0001 comes from. I don't know much about character encodings.
When I print the text character by character, using:
s = u'that\u2019s \U0001f63b'
for c in s:
    print c.encode('unicode_escape')
The program produces the following output:
t
h
a
t
\u2019
s
 
\ud83d
\ude3b
How can I read these last characters correctly in Python? Am I using encode correctly here? Should I just strip out the 0001 before reading, or is there an easier, less silly way?
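For reference, here is a minimal sketch of what seems to be happening, assuming a "narrow" Python 2 build (sys.maxunicode == 0xFFFF) that stores unicode strings as UTF-16 code units. Under that assumption the emoji's code point is U+1F63B (the \U escape is just padded to eight hex digits), and the two values in the output are its UTF-16 surrogate pair. The helper names to_surrogate_pair and merge_surrogates are hypothetical, made up for illustration; they work on plain integers so the arithmetic does not depend on the build.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def merge_surrogates(units):
    """Combine a list of UTF-16 code units (ints) into full code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # A high surrogate followed by a low surrogate is one character.
            pair = ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            code_points.append(0x10000 + pair)
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points


high, low = to_surrogate_pair(0x1F63B)
print('%04x %04x' % (high, low))  # d83d de3b
print(['%x' % cp for cp in merge_surrogates([0x2019, 0xD83D, 0xDE3B])])
```

If that assumption holds, the string itself is intact and nothing needs to be stripped; the loop is simply walking over code units rather than whole characters.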
Andrew LaPrise