Decompress a .bz2 file in Python

So this is a seemingly simple question, but I'm apparently being very dense. I have a little script that downloads all the .bz2 files from a webpage, but for some reason decompressing them is giving me a MAJOR headache.

I'm quite new to Python, so the answer is probably obvious. Please help me.

In this bit of the script I already have the file, and I just want to read it into a variable and then decompress that. Is that right? I've tried all kinds of ways to do this, and I usually get the error "ValueError: couldn't find end of stream" on the last line of this snippet. I've tried opening the compressed file and writing it to a string in many different ways. This is the latest:

openZip = open(zipFile, "r")
s = ''
while True:
    newLine = openZip.readline()
    if(len(newLine)==0):
       break
    s+=newLine
    print s                   
    uncompressedData = bz2.decompress(s)

Hi Alex, I should have listed all the other methods I've tried, since I have already tried read().

METHOD A:

print 'decompressing ' + filename

fileHandle = open(zipFile)
uncompressedData = ''

while True:
    s = fileHandle.read(1024)
    if not s:
        break
    print('RAW "%s"', s)
    uncompressedData += bz2.decompress(s)

uncompressedData += bz2.flush()

newFile = open(steamTF2mapdir + filename.split(".bz2")[0],"w")
newFile.write(uncompressedData)
newFile.close()

Error:

uncompressedData += bz2.decompress(s)
ValueError: couldn't find end of stream

METHOD B:

zipFile = steamTF2mapdir + filename
print 'decompressing ' + filename
fileHandle = open(zipFile)

s = fileHandle.read()
uncompressedData = bz2.decompress(s)

Error:

uncompressedData = bz2.decompress(s)
ValueError: couldn't find end of stream

. , , .bz2.

7zip, , , - , .


You're opening and reading the compressed file as if it were a text file made up of lines. DON'T! It's NOT.

uncompressedData = bz2.BZ2File(zipFile).read()

seems to be much closer to what you're after.
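For readers on Python 3, here is a minimal sketch of the same one-liner. The file name `map.bsp.bz2` and its contents are invented for the demo (they deliberately include CR/LF and 0x1A bytes, which text-mode reading can mangle); a real script would point `zip_file` at the downloaded file instead.

```python
import bz2
import os
import tempfile

# Hypothetical stand-in for a downloaded .bz2 file: compress some bytes and save them.
original = b"line one\nline two\r\nline three\x1a\n" * 50
zip_file = os.path.join(tempfile.mkdtemp(), "map.bsp.bz2")
with open(zip_file, "wb") as f:
    f.write(bz2.compress(original))

# BZ2File opens the file in binary mode itself and decompresses the whole stream.
with bz2.BZ2File(zip_file) as f:
    uncompressed_data = f.read()

print(uncompressed_data == original)  # True
```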

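Edit: the OP has shown a few more things he has tried (though I don't see any note about having tried the best method, the one-liner I recommend above!), but they all seem to share one error, so I'll repeat the key bits from above: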

...opening and reading the compressed file as if it were a text file... It's NOT.

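open(filename), and even the more explicit open(filename, 'r'), open a text file for reading. A compressed file is a binary file, so to read it correctly you must open it with open(filename, 'rb'). ((bz2.BZ2File, which I recommend, knows it is dealing with a compressed file, of course, so there's no need to tell it anything more.))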
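A small Python 3 sketch of the 'rb' route: read the entire file in binary mode, then hand the complete stream to bz2.decompress in one call. The temporary file and its payload are made up for the demo.

```python
import bz2
import os
import tempfile

# Made-up payload written as a .bz2 file (binary mode on the way out, too).
data = bytes(range(256)) * 40
path = os.path.join(tempfile.mkdtemp(), "sample.bz2")
with open(path, "wb") as f:
    f.write(bz2.compress(data))

# 'rb' returns the file's exact bytes: no newline translation, no text decoding.
with open(path, "rb") as f:
    compressed = f.read()

# bz2.decompress needs the whole stream, including its end-of-stream marker.
restored = bz2.decompress(compressed)
```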

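In Python 2.*, on Unix-y systems (i.e., every system except Windows), you could get away with a sloppy use of open (in Python 3.* you can't, since text is Unicode while binary data is bytes: two different types).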
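To illustrate the Python 3 point, the sketch below reads the same compressed file in binary and in text mode and shows that only the bytes object is accepted by bz2.decompress. The latin-1 encoding and `newline=""` are choices made only for this demo, so that text-mode reading itself does not fail or alter line endings.

```python
import bz2
import os
import tempfile

# Throwaway .bz2 file (hypothetical stand-in for a downloaded file).
payload = b"some map data\n" * 100
path = os.path.join(tempfile.mkdtemp(), "example.bz2")
with open(path, "wb") as f:
    f.write(bz2.compress(payload))

# Binary mode: read() returns bytes, which bz2.decompress accepts.
with open(path, "rb") as f:
    raw = f.read()
print(type(raw))  # <class 'bytes'>

# Text mode: read() returns str. latin-1 can decode any byte value;
# newline="" turns off newline translation for this demo.
with open(path, "r", encoding="latin-1", newline="") as f:
    text = f.read()
print(type(text))  # <class 'str'>

# In Python 3, bz2.decompress rejects str with a TypeError.
try:
    bz2.decompress(text)
    text_mode_failed = False
except TypeError:
    text_mode_failed = True
```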

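In Windows (and DOS before it) it has always been indispensable to make the distinction, because Windows text files are peculiar for historical reasons (they use two bytes rather than one to end lines and, at least in some cases, treat a byte with value '\x1A' as a logical end of file), so the low-level reading and writing code must compensate.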

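So I suspect the OP is using Windows and is paying the price for not carefully passing the 'rb' ("read binary") option to the open built-in. (bz2.BZ2File is still simpler, whatever platform you're using!)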
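The same binary-mode rule applies to the output: the question's Method A writes the decompressed data with "w", which on Windows under Python 2 would turn every "\n" into "\r\n". Below is a minimal Python 3 sketch that decompresses a .bz2 file next to itself with both sides in binary mode; `decompress_file` is a hypothetical helper name, not part of any library, and the file names are invented for the demo.

```python
import bz2
import os
import shutil
import tempfile


def decompress_file(src_path, dest_dir):
    """Hypothetical helper: decompress src_path (ending in .bz2) into dest_dir.

    Input and output are both opened in binary mode, and shutil.copyfileobj
    copies in chunks, so the whole file is never held in memory at once.
    """
    name = os.path.basename(src_path)
    if name.endswith(".bz2"):
        name = name[: -len(".bz2")]
    dest_path = os.path.join(dest_dir, name)
    with bz2.open(src_path, "rb") as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest_path


# Usage with a throwaway file standing in for a downloaded map.
work_dir = tempfile.mkdtemp()
map_bytes = b"\x00\x01binary map data\r\n" * 1000
src = os.path.join(work_dir, "example_map.bsp.bz2")
with open(src, "wb") as f:
    f.write(bz2.compress(map_bytes))

out_path = decompress_file(src, work_dir)
```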


openZip = open(zipFile, "r")

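If you're running on Windows, you may want to say openZip = open(zipFile, "rb") here, since the file is likely to contain CR/LF combinations and you don't want them to be translated.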
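A small illustration of the translation problem, using Python 3 in-memory streams instead of a real file. The byte string is made up (it is not a real bzip2 stream); it simply contains CR/LF pairs the way a compressed file can. Note that Python 3 text mode applies universal-newline translation on every platform, whereas the Python 2 behavior described above came from the Windows C runtime.

```python
import io

# Made-up bytes; inside a compressed file, CR/LF pairs are just data.
raw = b"BZh9\r\n\x00\xff\r\n"

# Binary view: the bytes come back untouched.
binary_view = io.BytesIO(raw).read()

# Text view: TextIOWrapper's default newline handling (universal newlines)
# rewrites "\r\n" as "\n" while reading, so two bytes silently disappear.
text_view = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1").read()

print(len(binary_view), len(text_view))  # 10 8
```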

newLine = openZip.readline()

As Alex pointed out, this is very wrong, because the concept of "lines" is foreign to a compressed stream.

s = fileHandle.read(1024)
[...]
uncompressedData += bz2.decompress(s)

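This is wrong for the same reason: a 1024-byte chunk is only a fragment of the compressed stream, and bz2.decompress expects a complete stream, so it cannot find the end.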
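A Python 3 sketch contrasting the two approaches: bz2.decompress on one 1024-byte chunk fails (in Python 3 the message is "Compressed data ended before the end-of-stream marker was reached", the counterpart of Python 2's "couldn't find end of stream"), while a single bz2.BZ2Decompressor object fed chunk by chunk keeps its state and succeeds. The pseudo-random payload and the in-memory `io.BytesIO` stream are assumptions for the demo; a real script would read from a file opened with "rb".

```python
import bz2
import io
import random

# Hypothetical payload: pseudo-random bytes barely compress, so the compressed
# stream is far longer than 1024 bytes.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(20000))
compressed = bz2.compress(payload)

# One chunk on its own is just a prefix of the stream: no end-of-stream marker.
try:
    bz2.decompress(compressed[:1024])
    chunk_failed = False
except ValueError:
    chunk_failed = True

# Incremental decompression: the same BZ2Decompressor carries state across chunks.
decompressor = bz2.BZ2Decompressor()
parts = []
stream = io.BytesIO(compressed)  # stands in for open(zip_file, "rb")
while True:
    chunk = stream.read(1024)
    if not chunk:
        break
    parts.append(decompressor.decompress(chunk))
result = b"".join(parts)
```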

s = fileHandle.read()
uncompressedData = bz2.decompress(s)

If that doesn't work, I'd say it's the newline-translation problem I mentioned above.


This was very helpful. 44 of 2300 files gave an end-of-file-missing error when opened on Windows. Adding the b (binary) flag to open fixed the problem.

for line in bz2.BZ2File(filename, 'rb', 10000000):

works well (10M is a buffer size that works well with large files).
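On Python 3 the call above no longer works as written: BZ2File's buffering argument was ignored since Python 3.0 and removed in Python 3.9. One way to iterate over lines there is bz2.open in text mode, sketched below. The file name, its contents, and the UTF-8 encoding are assumptions for the demo.

```python
import bz2
import os
import tempfile

# Hypothetical compressed text file with a few lines.
path = os.path.join(tempfile.mkdtemp(), "log.txt.bz2")
with bz2.open(path, "wt", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

# Mode "rt" decompresses and decodes; iterating yields one line at a time,
# so a large file is not loaded into memory in one piece.
with bz2.open(path, "rt", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]
```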

Thanks!
