How to find the position of the central directory in a zip file?

Question

I am trying to find the position of the first header of a Central Directory file in a Zip file.

I read these: http://en.wikipedia.org/wiki/Zip_(file_format) and http://www.pkware.com/documents/casestudies/APPNOTE.TXT

As I see it, I can only walk through the Zip data, identify from each signature which section I am in, and keep going until I hit the Central Directory signature. I would obviously read the file headers along the way and use the "compressed size" field to skip the actual data, rather than looping through every byte in the file.

If I do this, then I already know practically all the files and folders inside the Zip file, and in that case I no longer need the Central Directory.

As far as I understand, the purpose of the Central Directory is to list the file metadata and the position of the actual data in the Zip file, so that you don't need to scan the entire file?

After reading about the End of Central Directory record, Wikipedia says:

This ordering allows a zip file to be created in one pass, but it is usually decompressed by first reading the central directory at the end.

How can I easily find the End of Central Directory record? Keep in mind that it may carry a comment of arbitrary size, so I may not know how many bytes from the end of the data stream it is located. Do I just scan for it?

PS I am writing a Zip file reader.

+8

format zip

Tower Dec 21 '11 at 17:33


3 answers

Start at the end of the file and scan toward the beginning, looking for the End of Central Directory signature and counting the number of bytes you have scanned. When you find a candidate, read the comment length (L) from byte offset 20 of the record. Check that L + 22 (the fixed 22-byte record plus the comment) matches the distance from the candidate to the end of the file. Then verify that the start of the central directory (given by the 4-byte field at byte offset 16; the field at offset 12 is the directory's size) carries the appropriate signature.

If you assume the bits are fairly random whenever the signature check is applied to a wild guess (for example, a candidate that landed in a data segment), the probability of all the signature bits matching is rather low. You could refine this by working out the probability of landing in a data segment and the probability of hitting a legitimate header (as a function of the number of such headers), but it already sounds like a low probability to me. You can increase your confidence further by also verifying the signature of the first file entry, but be sure to handle the boundary case of an empty zip file.
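The checks described in this answer can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the whole archive is already in memory as `bytes`, nothing is prepended to the archive (so stored offsets are absolute), and ZIP64 and multi-disk archives are ignored. The name `find_central_directory` is made up for this example. The field positions (directory size at byte 12, directory offset at byte 16, comment length at byte 20 of the record) follow the APPNOTE layout.

```python
import struct
from typing import Tuple

EOCD_SIG = b"PK\x05\x06"   # 0x06054b50 as stored on disk (little-endian)
CDFH_SIG = b"PK\x01\x02"   # signature of a central directory file header
EOCD_FIXED = 22            # size of the record without its comment
MAX_COMMENT = 0xFFFF       # the comment length is a 16-bit field


def find_central_directory(data: bytes) -> Tuple[int, int]:
    """Return (offset, entry_count) of the central directory (hypothetical helper)."""
    lowest = max(0, len(data) - EOCD_FIXED - MAX_COMMENT)
    # Scan from the last position where a record could start toward the beginning.
    for pos in range(len(data) - EOCD_FIXED, lowest - 1, -1):
        if data[pos:pos + 4] != EOCD_SIG:
            continue
        (_sig, _disk, _cd_disk, _disk_entries, entries,
         cd_size, cd_offset, comment_len) = struct.unpack_from("<4s4H2LH", data, pos)
        # The record plus its comment must end exactly at the end of the file.
        if pos + EOCD_FIXED + comment_len != len(data):
            continue
        # The central directory must lie entirely before the record.
        if cd_offset + cd_size > pos:
            continue
        # Boundary case: an empty archive has no central header to verify.
        if entries == 0:
            return cd_offset, 0
        if data[cd_offset:cd_offset + 4] == CDFH_SIG:
            return cd_offset, entries
    raise ValueError("no consistent End of Central Directory record found")
```

Calling `find_central_directory(open(path, "rb").read())` on a small archive would return the directory offset and the number of entries. A candidate that fails any check is simply skipped and the scan continues toward the beginning.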

+8

Derek e Jan 9 '13 at 15:53


Just cross your fingers and hope that no entry has a CRC, timestamp or datestamp equal to 06054B50, and that no other four-byte sequence in the file happens to be 06054B50.
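To illustrate the concern, here is a small Python sketch that builds an archive whose comment contains the signature bytes, so a naive backward search hits a false match first. The helper names (`eocd_candidates`, `length_is_consistent`, `build_tricky_archive`) are invented for this example, and the standard `zipfile` module is used only to write the test archive. The consistency check assumes the standard 22-byte record with the comment length stored at byte offset 20.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # 0x06054b50 as stored on disk (little-endian)


def eocd_candidates(data):
    """Yield the offset of every signature match, last match first."""
    pos = data.rfind(EOCD_SIG)
    while pos != -1:
        yield pos
        # Search again, only accepting matches that start before pos.
        pos = data.rfind(EOCD_SIG, 0, pos + 3)


def length_is_consistent(data, pos):
    """True if a record at pos, plus its comment, ends exactly at end of file."""
    if pos + 22 > len(data):
        return False
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    return pos + 22 + comment_len == len(data)


def build_tricky_archive():
    """Build an in-memory archive whose comment embeds the signature bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("a.txt", (2011, 12, 21, 17, 33, 0))
        zf.writestr(info, b"hello")
        zf.comment = b"note " + EOCD_SIG + b" tail"
    return buf.getvalue()


data = build_tricky_archive()
hits = list(eocd_candidates(data))
valid = [p for p in hits if length_is_consistent(data, p)]
print("signature matches:", hits)
print("consistent records:", valid)
```

The first match from the end sits inside the comment and is rejected by the length check, so only the genuine record remains. A match inside compressed data could still pass by coincidence, which is why checking more fields lowers the risk further.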

+1

user2624417 Jan 11 '14 at 17:57


Tower · Accepted Answer · 2011-12-22T18:23:37+0000

I ended up looping through the bytes, starting at the end. The loop stops if it finds a matching sequence of bytes, if the index drops below zero, or if it has already gone through 64k bytes.
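A loop of this kind might look like the following Python sketch. It is not the original poster's code. The function name `find_eocd_offset` is hypothetical, and it assumes a seekable binary file object. Starting 22 bytes before the end (the size of the record without a comment) and trying 64k positions covers every possible comment length, since that length is a 16-bit field.

```python
import io
import os

EOCD_SIG = b"PK\x05\x06"  # 0x06054b50 as stored on disk (little-endian)
EOCD_FIXED = 22           # size of the record without its comment
MAX_SCAN = 64 * 1024      # give up after this many positions


def find_eocd_offset(f) -> int:
    """Return the absolute offset of the last signature match, or -1."""
    f.seek(0, os.SEEK_END)
    size = f.tell()
    # Read only the tail that could possibly contain the record.
    tail_len = min(size, EOCD_FIXED + MAX_SCAN - 1)
    base = size - tail_len
    f.seek(base)
    tail = f.read(tail_len)

    index = len(tail) - EOCD_FIXED
    scanned = 0
    # Stop on a match, when the index drops below zero, or after 64k positions.
    while index >= 0 and scanned < MAX_SCAN:
        if tail[index:index + 4] == EOCD_SIG:
            return base + index
        index -= 1
        scanned += 1
    return -1
```

For example, `find_eocd_offset(open("archive.zip", "rb"))` would return the offset of the record, or -1 if no signature is found within range. Note that this returns the first match from the end without validating it, so it shares the false-match risk raised in the other answers.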
