
Python + PostgreSQL + strange ascii = UTF8 encoding error

I have ascii strings that contain the character "\x80" to represent the euro symbol:

>>> print "\x80"
€

When I enter string data containing this character into my database, I get:

psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0x80
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
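A minimal Python 3 sketch of why the server rejects the data. Python's own UTF-8 decoder stands in for PostgreSQL's validation here (an assumption: the server uses its own checker, but both reject this byte). A lone 0x80 is a UTF-8 continuation byte with no lead byte, while the euro sign U+20AC needs three bytes in UTF-8.

```python
# A lone 0x80 is a UTF-8 continuation byte without a lead byte: invalid.
lone_byte = b"\x80"

try:
    lone_byte.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(is_valid_utf8)  # False

# The euro sign U+20AC is encoded as three bytes in UTF-8.
euro_utf8 = "\u20ac".encode("utf-8")
print(euro_utf8)  # b'\xe2\x82\xac'
```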

I am new to Unicode. How can I convert strings containing "\x80" into valid UTF-8 containing the same euro symbol? I tried calling .encode and .decode on various strings, but ran into errors:

>>> "\x80".encode("utf-8")
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    "\x80".encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
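The traceback says "decode" even though the call was `.encode`, because in Python 2 calling `.encode()` on a byte string first decodes it implicitly with the default ASCII codec. A small Python 3 sketch that performs that hidden step explicitly; in Python 3, `bytes` has no `.encode()` method, so the source codec must always be chosen by hand.

```python
# Python 3: bytes (raw data) and str (text) are separate types.
raw = b"\x80"  # one byte, as it would come from the OP's data

# Python 2's "\x80".encode("utf-8") did an implicit raw.decode("ascii") first.
# Doing that step explicitly reproduces the error from the traceback.
ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc
    print(exc)  # 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

# Python 3 removes the implicit step: bytes has no .encode() method,
# so you must decode with a codec of your choosing before encoding to UTF-8.
print(hasattr(raw, "encode"))  # False
```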

The question begins with a false premise:

I have ascii strings that contain the character "\x80" to represent the euro symbol.

ASCII characters are in the range "\x00" to "\x7F" inclusive.
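A quick way to check that claim against real data, sketched in Python 3. `is_ascii` is a hypothetical helper name; Python 3.7+ also provides the built-in `bytes.isascii()`, which does the same test.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte lies in the ASCII range 0x00..0x7F."""
    return all(b <= 0x7F for b in data)

print(is_ascii(b"price: 10 EUR"))   # True
print(is_ascii(b"price: 10 \x80"))  # False: 0x80 is outside ASCII
```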

Two misconceptions are common here: (1) that locale == encoding, and (2) that the latin1 encoding maps "\x80" to the euro character.

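In fact, every ISO-8859-x encoding maps "\x80" to U+0080, which is one of the C1 control characters, not the euro sign. Only three of those encodings (x in (7, 15, 16)) provide the euro sign at all, and they place it at "\xA4".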
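These mappings can be checked with Python 3's standard codecs. The sketch below tests three ISO-8859 codecs for 0x80 (not the whole family) and the three euro-bearing ones for 0xA4.

```python
# ISO-8859 codecs map byte 0x80 to U+0080, a C1 control character.
# Three of them are checked here.
for codec in ("latin-1", "iso8859_2", "iso8859_15"):
    print(codec, b"\x80".decode(codec) == "\u0080")  # True for each

# Only ISO-8859-7, -15 and -16 contain the euro sign, at byte 0xA4.
for codec in ("iso8859_7", "iso8859_15", "iso8859_16"):
    print(codec, b"\xa4".decode(codec) == "\u20ac")  # True for each

# In latin-1 (ISO-8859-1) the same byte is the generic currency sign.
sign = b"\xa4".decode("latin-1")
print(f"U+{ord(sign):04X}")  # U+00A4
```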

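You need to find out what encoding your data is actually in. Which machine was it created on? How was it created? The locale it was created in (not necessarily yours) may give a clue.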

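Be wary of simply declaring "my data is latin1". Your data is more likely encoded in one of the cp125x code pages used on Windows. All of them except cp1251 (Windows Cyrillic) map "\x80" to the euro sign: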

>>> ['\x80'.decode('cp125' + str(x), 'replace') for x in range(9)]
[u'\u20ac', u'\u0402', u'\u20ac', u'\u20ac', u'\u20ac', u'\u20ac', u'\u20ac', u'\u20ac', u'\u20ac']

Update in response to the OP's comment:

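"I'm reading this data from a file, e.g. open(fname).read(). It contains strings with \x80 in them that represent the euro character. It's just a plain text file, generated by another program, but I don't know how that program generates the text. What would be a good solution? I'm thinking I can assume that it outputs "\x80" for a euro character, which means I can assume it's encoded with a cp125x that has that char as the euro."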
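A note for anyone doing this in Python 3: Python 2's `open(fname).read()` returned raw bytes, but Python 3's text mode decodes with the locale's preferred encoding, which is exactly the locale == encoding trap. A minimal sketch that reads the bytes untouched, so the codec can be chosen deliberately afterwards; `read_raw` and the file name in the usage note are hypothetical.

```python
def read_raw(path: str) -> bytes:
    """Read a file as raw bytes, with no implicit decoding."""
    # Binary mode ("rb") returns exactly the bytes the other program wrote;
    # text mode would silently decode them with the locale's encoding.
    with open(path, "rb") as f:
        return f.read()
```

Usage: `raw = read_raw("data.txt")` followed by `raw.count(b"\x80")` tells you how many candidate euro bytes the file contains before any decoding has happened.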

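This is confusing. First you say:

It contains strings with \x80 in them that represent the euro character

But later you say:

I'm thinking I can assume that it outputs "\x80" for a euro character

Please clarify which it is.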

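Choosing the right cp125x encoding: Where (geographically) was the file created? In what language(s) is the text written? Are there any characters other than the presumed euro with values above "\x7f"? If so, which ones, and in what context do they appear?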
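One way to gather answers to the last two questions is to list every byte above 0x7F with a little surrounding context. A sketch in Python 3; `non_ascii_report` is a hypothetical helper, and showing only the first occurrence of each byte is a simplifying choice.

```python
from collections import Counter

def non_ascii_report(data: bytes, context: int = 10):
    """For each byte value > 0x7F: (hex value, count, snippet around first use)."""
    counts = Counter(b for b in data if b > 0x7F)
    report = []
    for byte_value, count in sorted(counts.items()):
        pos = data.index(bytes([byte_value]))  # first occurrence only
        snippet = data[max(0, pos - context): pos + context + 1]
        report.append((f"0x{byte_value:02X}", count, snippet))
    return report

for row in non_ascii_report(b"Total: 10\x80, caf\xe9 au lait"):
    print(row)
```

Seeing, for example, 0xE9 inside "caf?" suggests a Western European code page such as cp1252, whereas the same byte would read as a Cyrillic letter under cp1251.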

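Update 2: If you don't know how the program works, neither you nor we can tell whether it always uses "\x80" for the euro sign. Doing anything else would be strange, but it can't be ruled out.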

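If the text is in English, and/or was produced in the USA, and/or on Windows, then cp1252 is reasonably likely to be the right choice ... until you find evidence to the contrary, in which case you will have to guess the encoding yourself, based on (a) which non-ASCII characters occur and (b) which language(s) the text is written in.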
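Assuming cp1252 is the right guess, the conversion to UTF-8 is a decode followed by an encode. A Python 3 sketch; `cp1252_to_utf8` is a hypothetical name. Keeping the default strict error handling means that the five byte values cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) raise an error instead of being silently replaced, which is one cheap source of evidence to the contrary.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Reinterpret cp1252 bytes as UTF-8 bytes.

    The default errors="strict" raises UnicodeDecodeError on bytes that
    cp1252 does not define, flagging data that is probably not cp1252.
    """
    return raw.decode("cp1252").encode("utf-8")

print(cp1252_to_utf8(b"10\x80"))  # b'10\xe2\x82\xac'
```

With psycopg2 it is usually simpler to stop after the decode step and pass the resulting text (`str` in Python 3, `unicode` in Python 2) as a query parameter, letting the driver encode it for the connection's client_encoding.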
