'utf-8' codec cannot decode byte 0x80

Question

I am trying to load a model prepared by BVLC and I am stuck with this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte

I think this is due to the following function (full code):

  # Closure-d function for checking SHA1.
  def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
      with open(filename, 'r') as f:
          return hashlib.sha1(f.read()).hexdigest() == sha1

Any idea how to fix this?

+4

python utf-8 caffe

Ehab albadawy Apr 24 '16 at 16:40

3 answers

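The file is opened in text mode, so f.read() tries to decode its contents as UTF-8, which fails because the data is binary. Opening the file in binary mode skips the decoding step, and hashlib wants bytes anyway: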

>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
  File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
  File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte

>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325

+4

DSM Apr 24 '16 at 17:01

With TensorFlow the same fix applies: add the b char to the file mode (tf version: 1.1.0):

image_data = tf.gfile.FastGFile(filename, 'rb').read()

For more information check: gfile

+1

4F2E4A2E May 13, '17 at 10:14

Martijn Pieters · Accepted Answer · 2016-04-24T17:02:08+0000

You are opening a file that is not UTF-8 encoded in text mode, and the default encoding for your system is UTF-8.

Since you are computing a SHA1 hash, you should read the data as binary instead. The hashlib functions require you to pass in bytes:

with open(filename, 'rb') as f:
    return hashlib.sha1(f.read()).hexdigest() == sha1

Note the addition of b to the file mode.
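
To see the difference between the two modes without the original model file, here is a minimal sketch. The payload bytes and the temporary file are hypothetical stand-ins, and UTF-8 is passed explicitly so the failure does not depend on the system locale:

```python
import hashlib
import os
import tempfile

# Hypothetical stand-in for the model file: bytes that are not valid UTF-8.
payload = b"\x80\x03binary model data"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Text mode decodes the bytes; 0x80 is not a valid UTF-8 start byte.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode returns the raw bytes, which is what hashlib expects.
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
finally:
    os.remove(path)

print(text_mode_failed, digest == hashlib.sha1(payload).hexdigest())
```

The text-mode read fails on the first byte, while the binary read yields the same digest as hashing the payload directly.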

See the open() documentation:

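mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. [...] In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)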

and from the hashlib module documentation:

You can now feed this object with bytes-like objects (normally bytes) using the update() method.
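
Because a hash object accepts data incrementally through update(), a large model file does not have to be read into memory in one call. The following is a sketch only: sha1_of_file and chunk_size are illustrative names, not part of the original script, and the temporary file stands in for a real model:

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA1 hex digest of a file, reading it in binary chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage on a throwaway file containing non-UTF-8 bytes.
data = b"\x80\xb8" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    chunked = sha1_of_file(tmp.name, chunk_size=256)
finally:
    os.remove(tmp.name)

print(chunked == hashlib.sha1(data).hexdigest())
```

Feeding the chunks through update() produces the same digest as hashing the whole byte string at once, so the comparison against the expected sha1 is unchanged.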
