Are you sure you want UTF-8 encoded data inside Unicode objects? Python normally stores the characters of a types.UnicodeType internally as UCS-2 or UCS-4, sometimes called "wide" characters, which can represent characters from all reasonably common scripts.
One wonders what kind of lib sometimes outputs types.StringType and sometimes types.UnicodeType. My wild guess is that the lib always produces types.StringType but does not tell you which encoding it is in. If so, what you are really looking for is code that can guess which encoding a types.StringType value is in.
In most cases this is easy, because you can assume the data is either, for example, Latin-1 or UTF-8. If the text really can be in any odd encoding (for example, incoming mail without a proper header), you need a library that guesses the encoding. See http://chardet.feedparser.org/.
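For the easy case, a minimal sketch of the "UTF-8 or Latin-1" assumption might look like this. The function name `to_text` and its `fallback` parameter are made up for illustration and are not part of any library. It checks for `bytes`, which is the same type as types.StringType on Python 2 and the byte-string type on Python 3.

```python
def to_text(value, fallback="latin-1"):
    """Return decoded text for a value that may be bytes or already text.

    Hypothetical helper: assumes the input is either UTF-8 or the
    given fallback encoding, as discussed above.
    """
    if not isinstance(value, bytes):
        # Already decoded text (types.UnicodeType on Python 2); leave it alone.
        return value
    try:
        # Strict UTF-8 decoding fails loudly on most non-UTF-8 input.
        return value.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails,
        # but it will silently produce wrong text for other encodings.
        return value.decode(fallback)


utf8_text = to_text(b"caf\xc3\xa9")    # UTF-8 bytes for "café"
latin1_text = to_text(b"caf\xe9")      # Latin-1 bytes for "café"
```

UTF-8 is tried first because it is strict: bytes in some other encoding rarely happen to be valid UTF-8, whereas Latin-1 accepts anything. That ordering is a design choice of this sketch, and it only holds if those two encodings really are the only candidates.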
Bittrance