Python + PostgreSQL + strange ascii = UTF8 encoding error
I have ascii strings that contain the character "\x80" to represent the euro symbol:
>>> print "\x80"
€
When I enter string data containing this character into my database, I get:
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
I am new to unicode. How can I convert strings containing "\x80" to valid UTF-8 containing the same euro symbol? I have tried calling .encode and .decode on various strings, but ran into errors:
>>> "\x80".encode("utf-8")
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    "\x80".encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
The question begins with a false premise:
I have ascii strings that contain the character "\x80" to represent the euro symbol.
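ASCII characters only run from "\x00" to "\x7F".
There are two more false premises: (1) that locale == encoding, and (2) that the latin1 encoding maps "\x80" to the euro symbol.
In fact, every ISO-8859-x encoding maps "\x80" to U+0080, which is a C1 control character, not the euro. Only 3 of those encodings (x in (7, 15, 16)) provide the euro at all, and they put it at "\xA4".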
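The claims above can be checked directly against Python's codecs. This is a minimal sketch, written with byte literals so that it should run on Python 2.6+ as well as Python 3; the codec names are Python's standard ones, while `EURO` and `c1` are just illustrative variable names.

```python
from __future__ import print_function

EURO = u"\u20ac"

# latin1 (ISO-8859-1) maps byte 0x80 to U+0080, a C1 control character.
c1 = b"\x80".decode("latin1")
print(repr(c1), "is euro?", c1 == EURO)

# The three ISO-8859-x encodings that have a euro keep it at byte 0xA4.
for x in (7, 15, 16):
    codec = "iso8859_%d" % x
    print(codec, "0xA4 is euro?", b"\xa4".decode(codec) == EURO)
```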
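You need to find out what encoding your data is really in. What machine was it created on? How? The locale it was created in (not necessarily yours) may give you a clue.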
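Be suspicious of any claim that "my data is encoded in latin1". Your data is much more likely to be in one of the cp125x encodings used on Windows. All of them except cp1251 (Windows Cyrillic) map "\x80" to the euro symbol: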
>>> ['\x80'.decode('cp125' + str(x), 'replace') for x in range(9)]
[u'\u20ac', u'\u0402', u'\u20ac', u'\u20ac', u'\u20ac', u'\u20ac', u'\u20ac', u'\u20ac', u'\u20ac']
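If the data does turn out to be cp1252 (an assumption to verify, not a given), the conversion the question asks for is a decode followed by an encode. A minimal sketch, using byte literals so it should run on Python 2.6+ and Python 3; `to_utf8` is a hypothetical helper name and the sample bytes are made up:

```python
from __future__ import print_function

def to_utf8(raw, source_encoding="cp1252"):
    """Decode bytes from source_encoding, then re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

# 0x80 becomes the three-byte UTF-8 sequence for U+20AC.
price = to_utf8(b"\x80 9.99")
print(repr(price))
```

Decoding with the wrong source encoding does not necessarily raise an error; it can silently produce the wrong characters, which is why the encoding needs to be established first.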
Update in response to the OP's comments:
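I am reading this data from a file, e.g. open(fname).read(). It contains strings with \x80 in them, which represents the euro symbol. It is a plain text file generated by another program, and I don't know how that program produces the text. What would be a good solution? I am thinking I can assume that it outputs "\x80" for the euro symbol, which means it is encoded in a cp125x that has that char as the euro.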
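This is a bit confusing: first you say the file contains \x80 representing the euro symbol, then you say you are only assuming that "\x80" is the euro symbol. Please explain which it is.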
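Choosing a cp125x: where (in which locale) was the file created? Apart from the euro, are there any characters > "\x7f"? If so, what are they?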
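One way to answer the question about characters above "\x7f" is to tally the non-ASCII bytes in the data. This is only a sketch: `high_bytes` is an illustrative helper name, and the sample bytes are invented for the demonstration.

```python
from __future__ import print_function
from collections import Counter

def high_bytes(raw):
    """Count each byte value above 0x7F in a byte string."""
    # bytearray yields integers on both Python 2 and Python 3.
    return Counter(b for b in bytearray(raw) if b > 0x7F)

sample = b"caf\xe9 costs \x80 3, cr\xe8me costs \x80 4"
for value, count in sorted(high_bytes(sample).items()):
    print(hex(value), count)
```

Decoding the bytes that turn up with each candidate cp125x encoding and looking at whether the result makes sense in context (accented letters in the right words, for example) helps narrow the choice.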
2 , , , , "\ x80" . , .
If the file was created on Windows in Western Europe or the Americas, cp1252 is the usual suspect.
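For the file-reading case, the decoding can happen at read time instead of afterwards. A sketch assuming the file really is cp1252; the temporary file only stands in for the other program's output, and its contents are made up:

```python
import io
import os
import tempfile

# Stand-in for the externally generated file.
fd, fname = tempfile.mkstemp()
os.close(fd)
with open(fname, "wb") as f:
    f.write(b"price: \x80 5")

# io.open decodes while reading; it is available on Python 2.6+ and Python 3.
with io.open(fname, encoding="cp1252") as f:
    text = f.read()  # unicode text in which the euro is U+20AC

# Encode explicitly only where UTF-8 bytes are actually required.
utf8_bytes = text.encode("utf-8")
os.remove(fname)
```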