Unicode error while extracting zipfile

I have a little script that extracts a .zip file. This works well, but only for .zip files that do not contain files with letters like "ä", "ö", "ü" (etc.) in the file names. Otherwise, I get this error:

    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "threading.pyc", line 552, in __bootstrap_inner
      File "install.py", line 92, in run
      File "zipfile.pyc", line 962, in extractall
      File "zipfile.pyc", line 950, in extract
      File "zipfile.pyc", line 979, in _extract_member
      File "ntpath.pyc", line 108, in join
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 32: ordinal not in range(128)

Here is the extracting part of my script:

    zip = zipfile.ZipFile(path1)
    zip.extractall(path2)

How can I solve this?

+4
3 answers

First suggestion:

I get an error when I do this:

    >>> c = chr(129)
    >>> c + u'2'
    Traceback (most recent call last):
      File "<pyshell#21>", line 1, in <module>
        c + u'2'
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 0: ordinal not in range(128)

Somewhere a unicode string is being passed to join, so Python 2 implicitly decodes the other, byte-string component as ASCII and fails on the first non-ASCII byte.
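The implicit conversion behind that error can be spelled out by hand. This is only a minimal sketch, and the helper name `ascii_decode_error` is made up for illustration: when Python 2 mixes a byte string with a unicode string, it first decodes the bytes with the ASCII codec, and doing that same decode explicitly produces the same message.

```python
def ascii_decode_error(raw):
    """Return the message raised when decoding raw bytes as ASCII, or None."""
    try:
        raw.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Byte 0x81 is what chr(129) produces in Python 2.
print(ascii_decode_error(b'\x81'))
# Pure ASCII bytes decode without any error.
print(ascii_decode_error(b'abc'))
```

Any byte at or above 0x80 triggers the failure, which is why only file names with letters such as umlauts are affected.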

Could the path to the zip file be a unicode string? What happens if you do this:

    zip = zipfile.ZipFile(str(path1))
    zip.extractall(str(path2))

or this:

    zip = zipfile.ZipFile(unicode(path1))
    zip.extractall(unicode(path2))
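Another way to make both paths the same string type is to decode byte strings with an explicitly chosen codec instead of relying on the implicit ASCII one. The helper below is a hypothetical sketch, not part of `zipfile`; the name `to_text` and the fallback to the filesystem encoding are assumptions.

```python
import sys


def to_text(path, encoding=None):
    """Return path as text, decoding byte strings with an explicit codec."""
    if isinstance(path, bytes):
        # Assumption: fall back to the filesystem encoding (or UTF-8)
        # when the caller does not name a codec.
        codec = encoding or sys.getfilesystemencoding() or 'utf-8'
        return path.decode(codec)
    return path  # already text, leave it untouched


print(repr(to_text(b'C:\\temp')))
print(repr(to_text(b'\x94', 'cp437')))
```

Passing `to_text(path1)` and `to_text(path2)` to `ZipFile` and `extractall` would then keep the two arguments consistent. Whether that removes the error depends on how the member names themselves are stored in the archive.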

This is line 128 in ntpath:

    def join(a, *p):            # line 63
        for b in p:             # line 68
            path += "\\" + b    # line 128

Second suggestion:

    from ntpath import *

    def join(a, *p):
        """Join two or more pathname components, inserting "\\" as needed.
        If any component is an absolute path, all previous path components
        will be discarded."""
        path = a
        for b in p:
            b_wins = 0  # set to 1 iff b makes path irrelevant
            if path == "":
                b_wins = 1

            elif isabs(b):
                # This probably wipes out path so far.  However, it's more
                # complicated if path begins with a drive letter:
                #     1. join('c:', '/a') == 'c:/a'
                #     2. join('c:/', '/a') == 'c:/a'
                # But
                #     3. join('c:/a', '/b') == '/b'
                #     4. join('c:', 'd:/') = 'd:/'
                #     5. join('c:/', 'd:/') = 'd:/'
                if path[1:2] != ":" or b[1:2] == ":":
                    # Path doesn't start with a drive letter, or cases 4 and 5.
                    b_wins = 1

                # Else path has a drive letter, and b doesn't but is absolute.
                elif len(path) > 3 or (len(path) == 3 and
                                       path[-1] not in "/\\"):
                    # case 3
                    b_wins = 1

            if b_wins:
                path = b
            else:
                # Join, and ensure there's a separator.
                assert len(path) > 0
                if path[-1] in "/\\":
                    if b and b[0] in "/\\":
                        path += b[1:]
                    else:
                        path += b
                elif path[-1] == ":":
                    path += b
                elif b:
                    if b[0] in "/\\":
                        path += b
                    else:
                        # !!! modify the next line so it works !!!
                        path += "\\" + b
                else:
                    # path is not empty and does not end with a backslash,
                    # but b is empty; since, e.g., split('a/') produces
                    # ('a', ''), it's best if join() adds a backslash in
                    # this case.
                    path += '\\'

        return path

    import ntpath
    ntpath.join = join
+4

For portability, for example if you archive files on Windows and extract them on Linux, you can store every file path in the zip file as unicode. When extracting, do not use ZipFile.extractall: it writes the files straight to disk and does not handle unicode paths inside the archive. Try the following instead:

    import errno
    import os
    import sys
    import zipfile

    zf = zipfile.ZipFile(sys.argv[1], 'r')
    for m in zf.infolist():
        # convert the unicode file path to UTF-8 for the disk
        # (assumes the names in the archive were stored as unicode)
        disk_file_name = m.filename.encode('utf8')
        dir_name = os.path.dirname(disk_file_name)
        if dir_name:  # top-level members have no directory to create
            try:
                os.makedirs(dir_name)
            except OSError as e:
                if e.errno != errno.EEXIST:
                    raise
        if m.filename.endswith('/'):
            continue  # directory entry, nothing to write
        data = zf.read(m)  # extract zipped data into memory
        with open(disk_file_name, 'wb') as fd:
            fd.write(data)
    zf.close()
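To try the read-into-memory approach without touching the disk, the following sketch builds a tiny archive in memory and reads each member back by name. The function name `read_members` and the sample file name are made up for illustration, and the sketch assumes the archive stores its names as unicode.

```python
import io
import zipfile


def read_members(archive):
    """Return {member name: contents} without writing anything to disk."""
    contents = {}
    with zipfile.ZipFile(archive, 'r') as zf:
        for info in zf.infolist():
            if info.filename.endswith('/'):
                continue  # directory entry, carries no data
            contents[info.filename] = zf.read(info)
    return contents


# Build a small archive in memory with a non-ASCII member name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr(u'd\xf6ner/\xe4.txt', b'hello')
buf.seek(0)

members = read_members(buf)
for name, data in members.items():
    print(repr(name), repr(data))
```

Because the names come back as text, the caller decides how to encode them for the target file system instead of leaving that to `extractall`.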
+1

It is crystal clear: the message says that the ASCII codec cannot decode bytes outside the ASCII range. You must choose a different character encoding.
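Which encoding to choose depends on how the archive was written. As a hedged illustration, the sketch below decodes the byte from the traceback, 0x94, with three candidate codecs. The candidate list is an assumption; cp437 is included because it is the zip format's historical default for file names.

```python
raw = b'\x94'  # the byte reported in the traceback

# Decode the same byte with each candidate codec (the list is an assumption).
candidates = {codec: raw.decode(codec)
              for codec in ('cp437', 'cp1252', 'latin-1')}

for codec in sorted(candidates):
    print(codec, repr(candidates[codec]))
```

Only cp437 turns 0x94 into "ö", which matches the kind of file names described in the question. The other two codecs yield a closing quotation mark and a control character.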

-1