tarfile in Python: can I untar more efficiently by extracting only some of the data?

I am ordering a large batch of Landsat scenes from the USGS, which arrive as tar.gz archives. I am writing a simple Python script to unpack them. Each archive contains 15 TIFF images of 60-120 MB each, totalling just over 2 GB. I can easily extract an entire archive with the following code:

import tarfile
fileName = "LT50250232011160-SC20140922132408.tar.gz"
tfile = tarfile.open(fileName, 'r:gz')
tfile.extractall("newfolder/")

I actually need only 6 of the 15 TIFFs, the ones identified as "bands" in their file names. These are some of the larger files, so together they account for about half of the data. I therefore tried to speed up the process by changing the code as follows:

import tarfile
fileName = "LT50250232011160-SC20140922132408.tar.gz"
tfile = tarfile.open(fileName, 'r:gz')
membersList = tfile.getmembers()
namesList = tfile.getnames()
# Keep only the members whose name contains "band"
bandsList = [x for x, y in zip(membersList, namesList) if "band" in y]
print("extracting...")
tfile.extractall("newfolder/", members=bandsList)

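However, timing both versions shows no significant gain from the second script. Extraction itself is somewhat faster, but that gain seems to be offset by the time it takes to work out which files to extract in the first place.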

So, is there a way to extract only the files I need more efficiently? I am using the python tarfile module.

Thanks!


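The problem is that a tar file has no central file list; it stores the files sequentially, each preceded by a header. The tar file is then compressed with gzip to give the tar.gz. In a plain tar, if you do not want to extract a file, you simply skip the next header->size bytes and read the next header. When the archive is compressed, you still have to skip that many bytes, but in the decompressed data stream rather than in the archive file itself. For some compression formats that is possible; for others it requires decompressing everything in between.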

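gzip belongs to the latter kind. So although you save some time by not writing the unwanted files to disk, your code still decompresses them. You might be able to work around this by overriding the _Stream class for non-gzip archives, but for your gz files there is nothing you can do.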
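As an illustration of the sequential layout of a tar file, here is a minimal sketch (not part of the original answer) that builds a small uncompressed tar in memory and lists where each header and each data block sits. The member names and sizes are made up, and GNU format is chosen only so that no extra pax headers appear in the layout.

```python
import io
import tarfile


def build_tar(files):
    """Build an uncompressed tar in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def layout(buf):
    """Return (name, header offset, data offset, size) for each member."""
    with tarfile.open(fileobj=buf, mode="r:") as tf:
        return [(m.name, m.offset, m.offset_data, m.size) for m in tf.getmembers()]


# Hypothetical members: 700 bytes and 10 bytes.
demo = build_tar({"a.bin": b"x" * 700, "b.bin": b"y" * 10})
for row in layout(demo):
    print(row)
```

Each header takes 512 bytes and the data is padded to a multiple of 512, so a reader finds the next header by skipping the size recorded in the current one. In a plain tar that skip is a cheap seek; inside a gzip stream the skipped bytes still have to be decompressed.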


You can do this more efficiently by opening the tarfile as a stream. (https://docs.python.org/2/library/tarfile.html#tarfile.open)

mkdir tartest
cd tartest/
dd if=/dev/urandom of=file1 count=100 bs=1M
dd if=/dev/urandom of=file2 count=100 bs=1M
dd if=/dev/urandom of=file3 count=100 bs=1M
dd if=/dev/urandom of=file4 count=100 bs=1M
dd if=/dev/urandom of=file5 count=100 bs=1M
cd ..
tar czvf test.tgz tartest

Then, as test.py:

import tarfile
fileName = "test.tgz"
tfile = tarfile.open(fileName, 'r|gz')
for t in tfile:
    if "file3" in t.name: 
        f = tfile.extractfile(t)
        if f:
            print(len(f.read()))

Note the | in the mode passed to open. Only file3 is read.

$ time python test.py

104857600

real    0m1.201s
user    0m0.820s
sys     0m0.377s

If I change r|gz back to r:gz, I get:

$ time python test.py 
104857600

real    0m7.033s
user    0m6.293s
sys     0m0.730s

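That is roughly 5 times slower (there are 5 files of equal size). The standard mode allows seeking backwards; in a compressed tarfile that can only be done by decompressing again (I do not know the exact reason for this). If you open the archive as a stream, you can no longer seek randomly, but reading sequentially, which is possible in your case, is much faster. You also cannot call getnames in advance, but that is not necessary here.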
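Applied to the archives from the question, the stream approach might look like the sketch below. This is an illustration under assumptions, not tested on real USGS data: the function name extract_matching, the default keyword "band" and the choice to flatten members to their base names are all mine.

```python
import os
import shutil
import tarfile


def extract_matching(archive_path, dest_dir, keyword="band"):
    """Stream through a .tar.gz once and write out only matching regular files."""
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    # 'r|gz' reads the archive strictly front to back, with no seeking.
    with tarfile.open(archive_path, "r|gz") as tfile:
        for member in tfile:
            if not member.isfile() or keyword not in member.name:
                continue  # skipped members are still decompressed, just not written
            src = tfile.extractfile(member)
            # Assumption: flatten to the base name so no directories are created.
            target = os.path.join(dest_dir, os.path.basename(member.name))
            with open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return extracted
```

A call such as extract_matching("LT50250232011160-SC20140922132408.tar.gz", "newfolder/") would then make a single pass over the archive. Each member has to be read before the loop moves on to the next one, because a stream cannot go back.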

