Encoding and decoding are getting mixed up here. You start with a unicode object:
u'\xe4\xf6\xfc'
This is a unicode object; the three characters are the Unicode code points for "äöü". If you want to turn them into UTF-8, you must encode them:
>>> u'\xe4\xf6\xfc'.encode('utf-8')
'\xc3\xa4\xc3\xb6\xc3\xbc'
The resulting six bytes are the UTF-8 representation of "äöü" (two bytes per character).
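The same round trip can be sketched in Python 3 syntax (an assumption on my part: in Python 3, str is always Unicode and bytes are a separate type, so the distinction is explicit):

```python
# Python 3 sketch of the example above: str is Unicode, encode() yields bytes.
s = u'\xe4\xf6\xfc'            # the three code points for 'äöü'
utf8 = s.encode('utf-8')       # six bytes: two per character

print(len(s))                  # 3 code points
print(len(utf8))               # 6 bytes
print(utf8)                    # b'\xc3\xa4\xc3\xb6\xc3\xbc'
```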
If you call decode(...), you are trying to interpret the characters as some encoding that still needs to be converted to Unicode. Since the object is already Unicode, this does not work. Your first call tries to convert ASCII to Unicode, the second UTF-8 to Unicode. Since u'\xe4\xf6\xfc' is neither valid ASCII nor valid UTF-8, both conversion attempts fail.
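To see why both decodes fail, you can try them on the raw byte string yourself (a Python 3 sketch; in Python 2 the same errors occur after an implicit encode step):

```python
# The bytes 0xE4 0xF6 0xFC are not valid ASCII (all are above 0x7F) and not
# valid UTF-8 (0xE4 starts a 3-byte sequence, but 0xF6 is not a valid
# continuation byte), so both codecs raise UnicodeDecodeError.
data = b'\xe4\xf6\xfc'

for codec in ('ascii', 'utf-8'):
    try:
        data.decode(codec)
    except UnicodeDecodeError as exc:
        print('%s failed: %s' % (codec, exc))
```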
Further confusion may arise because '\xe4\xf6\xfc' is also the Latin-1 / ISO-8859-1 encoding of "äöü". If you write a normal Python string (without the leading "u" that marks it as unicode), you can convert it to a unicode object using decode('latin1'):
>>> '\xe4\xf6\xfc'.decode('latin1')
u'\xe4\xf6\xfc'
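Because Latin-1 maps every byte value 0x00–0xFF directly to the Unicode code point with the same number, this decode can never fail, and it round-trips cleanly (again a Python 3 sketch):

```python
# Latin-1 maps each byte straight to the Unicode code point of the same value,
# so decoding cannot fail and encoding back returns the original bytes.
raw = b'\xe4\xf6\xfc'
u = raw.decode('latin1')             # the Unicode string 'äöü'
print(u == u'\xe4\xf6\xfc')          # True
print(u.encode('latin1') == raw)     # True: lossless round trip
print(u.encode('utf-8'))             # b'\xc3\xa4\xc3\xb6\xc3\xbc'
```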