From: "Jay Freeman (saurik)" <saurik%saurik.com@localhost>
To: libzip-discuss%nih.at@localhost
Subject: libzip does not use EOCD nentry field in PK-WARE compatible way
Date: Fri, 12 Aug 2016 20:54:49 +0000 (UTC)

Hello! I want to use libzip on a zip file that contains more than 65k 
entries. The files in question are widely distributed and generated by Apple: 
they are OTA updates for the iPhone. There is no issue unzipping them using 
Apple's tooling, and in particular there is no issue working with them using 
PK-WARE's unzip utility. Here is a link to download an example file (note that 
it is 1.4GB).

http://appldnld.apple.com/iOS7.1/031-5044.20140630.FhwIU/com_apple_MobileAsset_SoftwareUpdate/38e8a0a16a35012113e75081634542b51e8e61db.zip

The issue is that the field in the End of Central Directory record which stores 
the number of entries in the directory is only 16 bits wide, so it cannot hold 
this value, and it doesn't: it simply stores the value cut off. Instead of 
69323, the field stores 3787 (69323 mod 65536). Now, you'd think this would 
make the zip file corrupt/broken, and in fact many libraries that implement the 
zip format treat it that way, but it is not.

If you look at the code for PK-WARE's unzip, this file doesn't just work by 
accident: it works by deliberate support in the codebase; the number of entries 
in the central directory is considered consistent if it is equal, at this 
lowered precision, to the number of central directory entries that are actually 
found in the central directory (which is determined by simply walking through 
all of the headers).

unzpriv.h
    #define MASK_ZUCN16             ((zucn_t)0xFFFF)

extract.c
    if ((members_processed
         & (G.ecrec.have_ecr64 ? MASK_ZUCN64 : MASK_ZUCN16))
        == G.ecrec.total_entries_central_dir) {

Again, this check doesn't pass "by accident" because the comparison happens to 
be done at lowered precision: it deliberately masks off the higher bits of the 
number of central directory entries that have been processed when the file uses 
the 16-bit version of this field. From the perspective of PK-WARE it is 
apparently entirely acceptable for a zip file to store this field simply cut 
off.
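
To make the comparison concrete, here is a minimal standalone sketch of the same idea. The names MASK_16, MASK_64 and entry_count_consistent are made up for illustration; only the 0xFFFF mask value comes from the excerpt above, and I am assuming the 64-bit mask simply keeps every bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the 0xFFFF value is taken from the excerpt. */
#define MASK_16 UINT64_C(0xFFFF)
#define MASK_64 UINT64_MAX /* assumption: the 64-bit mask keeps every bit */

/* Compile-time check of the numbers from the example file in this thread. */
static_assert((UINT64_C(69323) & MASK_16) == 3787,
              "69323 cut off to 16 bits is 3787");

/* Returns true when the number of headers actually walked agrees with the
 * count recorded in the end-of-central-directory record, compared at the
 * precision of the field that stored it. */
bool entry_count_consistent(uint64_t walked, uint64_t recorded,
                            bool have_zip64)
{
    uint64_t mask = have_zip64 ? MASK_64 : MASK_16;
    return (walked & mask) == recorded;
}
```

With a 16-bit field, entry_count_consistent(69323, 3787, false) holds, while the same pair is rejected when a 64-bit record is present, since nothing is masked off in that case.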

Is it possible to have libzip support reading these files? I'm willing to do 
the work and provide a patch (and in fact by default will start working on this 
over the weekend), but I figure I should ask first/concurrently as maybe you 
have some particular way you'd want to see this done, or maybe you find the 
idea of doing it yourself important or exciting ;P. The way I'm currently 
intending to make this happen is to modify _zip_cdir_new to only set 
nentry_alloc (leaving nentry at 0, turning it into a vector), modify struct zip 
to embed a struct zip_cdir (removing some duplicate fields and enabling more 
code reuse), break zip_add_entry into two parts (leaving a zip_add_dirent that 
can be used separately), and then make _zip_read_cdir walk headers and check 
integrity in a manner compatible with PK-WARE. It sounds like a lot, but I 
actually think the patch would be shockingly minimal and I am confident this 
should have no measurable performance impact for files that don't have this 
field cut off.
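
As a rough illustration of the "walk headers" step, here is a sketch that counts central directory headers in a buffer. This is not libzip code and not the proposed patch: count_cdir_entries and the read_le* helpers are hypothetical, and it assumes the buffer holds exactly the central directory bytes. The signature (0x02014b50), the 46-byte fixed header size, and the offsets of the three length fields are from the zip format itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CDIR_SIGNATURE  0x02014b50u /* "PK\1\2" read as little-endian */
#define CDIR_FIXED_SIZE 46u         /* fixed part of a central header */

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: counts the central directory headers in
 * buf[0..len). Returns -1 if a header has the wrong signature or runs
 * past the end of the buffer. */
int64_t count_cdir_entries(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    int64_t count = 0;

    assert(buf != NULL || len == 0);

    while (pos < len) {
        if (len - pos < CDIR_FIXED_SIZE ||
            read_le32(buf + pos) != CDIR_SIGNATURE)
            return -1;

        /* File name, extra field and comment lengths follow the fixed
         * fields at offsets 28, 30 and 32. */
        size_t variable = (size_t)read_le16(buf + pos + 28) +
                          read_le16(buf + pos + 30) +
                          read_le16(buf + pos + 32);
        if (len - pos - CDIR_FIXED_SIZE < variable)
            return -1;

        pos += CDIR_FIXED_SIZE + variable;
        count++;
    }
    return count;
}
```

The count this returns is what would then be compared, masked to 16 bits, against the stored field. For a file whose field is not cut off, the walk visits the same headers a reader has to parse anyway.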
