UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)

I want to parse my XML documents, so I stored them using the model below:

    class XMLdocs(db.Expando):
        id = db.IntegerProperty()
        name = db.StringProperty()
        content = db.BlobProperty()

Here is my code:

    parser = make_parser()
    curHandler = BasketBallHandler()
    parser.setContentHandler(curHandler)
    for q in XMLdocs.all():
        parser.parse(StringIO.StringIO(q.content))

I get the error below:

    'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
    Traceback (most recent call last):
      File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 517, in __call__
        handler.post(*groups)
      File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/base_handler.py", line 59, in post
        self.handle()
      File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 168, in handle
        scan_aborted = not self.process_entity(entity, ctx)
      File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 233, in process_entity
        handler(entity)
      File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 71, in process
        parser.parse(StringIO.StringIO(q.content))
      File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
        self.feed(buffer)
      File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 207, in feed
        self._parser.Parse(data, isFinal)
      File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 136, in characters
        print ch
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
+73
python google-app-engine xml-parsing
Feb 28 '11 at 11:48
7 answers

It seems you are hitting a UTF-8 byte order mark (BOM). Try using this Unicode string with the BOM stripped:

    import codecs

    content = unicode(q.content.strip(codecs.BOM_UTF8), 'utf-8')
    parser.parse(StringIO.StringIO(content))

I used strip instead of lstrip because in your case you had several occurrences of the BOM, possibly due to concatenated file contents.
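To make the BOM handling concrete, here is a minimal sketch, written so that it should also run on Python 3 (where `bytes` and `str` take the place of Python 2's `str` and `unicode`). The sample byte strings are made up for illustration. Note that `strip` treats its argument as a set of bytes rather than a sequence, so it could in principle also remove content bytes at either end that happen to be 0xEF, 0xBB or 0xBF; the `utf-8-sig` codec is an alternative that removes only a single leading BOM.

```python
import codecs

# Hypothetical blob: two BOMs in front of a small UTF-8 encoded document.
raw = codecs.BOM_UTF8 + codecs.BOM_UTF8 + b"<root>caf\xc3\xa9</root>"

# strip() removes any run of the BOM's bytes from both ends of the data.
stripped = raw.strip(codecs.BOM_UTF8)
text = stripped.decode("utf-8")

# Alternative for a single leading BOM: let the codec drop it while decoding.
single_bom = codecs.BOM_UTF8 + b"<a/>"
text_sig = single_bom.decode("utf-8-sig")
```

If the stored blobs can contain more than one BOM, as suspected here, the `utf-8-sig` variant alone would not be enough, since it only handles the first one.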

+30
Feb 28 '11 at 11:59

The actual best answer for this problem depends on your environment, in particular on what encoding your terminal expects.

The quickest one-line solution is to encode everything you print as ASCII, which your terminal is almost certain to accept, while discarding the characters that cannot be printed:

    print ch  # fails
    print ch.encode('ascii', 'ignore')

The better solution is to change your terminal's encoding to utf-8 and encode everything as utf-8 before printing. You should get into the habit of thinking about the encoding EVERY time you print or read a string.
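As a rough illustration of the difference between the two approaches, the sketch below encodes the character from the traceback in three ways. It uses only the built-in `encode` method and is written to run on Python 3 as well; the variable names are made up for the example.

```python
ch = u"\xef"  # the character from the traceback

# Strict ASCII encoding is what fails in the question.
try:
    ch.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'ignore' silently drops anything ASCII cannot represent.
ascii_ignored = ch.encode("ascii", "ignore")

# UTF-8 keeps the character, as a two-byte sequence.
utf8_bytes = ch.encode("utf-8")
```

The `ignore` variant never raises, but it loses data, which is why it is best kept for quick debugging output.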

+112
Feb 28 '11 at 19:59

Just placing .encode('utf-8') at the end of the object will do the job in recent versions of Python.

+56
Aug 16 '13 at 8:23

This worked for me:

    from django.utils.encoding import smart_str
    content = smart_str(content)
+30
Oct 17 '11 at

According to your traceback, the problem is the print statement on line 136 of parseXML.py. Unfortunately you did not post that part of your code, but I'm going to guess it is only there for debugging. If you change it to:

 print repr(ch) 

then you should at least see what you are trying to print.
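A related way to inspect the text without printing it from inside the handler is to collect it and look at an escaped form afterwards. The sketch below is only an assumption about how such a handler could look: `TextCollector`, `collect_text` and the sample document are hypothetical, and `unicode_escape` is used in place of `repr` so the result is the same on Python 2 and 3.

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that stores character data instead of printing it."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so keep them all.
        self.chunks.append(content)


def collect_text(xml_bytes):
    """Parse UTF-8 encoded XML bytes and return all character data joined."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return u"".join(handler.chunks)


sample = b"<player>Na\xc3\xafm</player>"  # made-up document containing u'\xef'
text = collect_text(sample)
escaped = text.encode("unicode_escape")  # ASCII-only bytes, safe on any terminal
```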

+7
Feb 28 '11 at 12:23

The problem is that you are trying to print a Unicode character to a possibly non-Unicode terminal. You need to encode it with the 'replace' option before printing it, e.g. print ch.encode(sys.stdout.encoding, 'replace').
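One caveat with this approach is that a stream's `encoding` attribute can be `None`, for example in Python 2 when output is not attached to a terminal. A minimal sketch of a helper that covers this case follows; `encode_for_stream` and `FakeStream` are hypothetical names, and the fallback to ASCII is an assumption rather than a rule.

```python
import sys


def encode_for_stream(text, stream=None):
    """Encode text for a stream, replacing characters it cannot represent."""
    if stream is None:
        stream = sys.stdout
    # The attribute may be missing or None; fall back to ASCII (an assumption).
    encoding = getattr(stream, "encoding", None) or "ascii"
    return text.encode(encoding, "replace")


class FakeStream(object):
    """Hypothetical stand-in for a terminal with a fixed encoding."""

    def __init__(self, encoding):
        self.encoding = encoding


ascii_out = encode_for_stream(u"\xef", FakeStream("ascii"))
latin1_out = encode_for_stream(u"\xef", FakeStream("latin-1"))
no_encoding_out = encode_for_stream(u"\xef", FakeStream(None))
```

With 'replace', unencodable characters show up as `?` instead of raising or vanishing, so it stays visible that something was there.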

+7
Feb 28 2018-11-28T00:

A simple solution to this problem is to set the default encoding to utf8, as in this example:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')
-1
Feb 09 '17 at 6:56