From: Thomas Klausner <tk%giga.or.at@localhost>
To: Frederik Ramm <frederik%remote.org@localhost>
Subject: Re: How to find compressed size of file after adding
Date: Thu, 5 Sep 2019 23:55:00 +0200

On Thu, Sep 05, 2019 at 07:49:26PM +0200, Frederik Ramm wrote:
> I have a program that writes out a zip file with a very large number of
> files, and potentially several 100 GB big.
> 
> For users who cannot easily handle files that big, I would like to be
> able to split my output into several files of the same size. (I know I
> can do this after, with zipsplit, but would love to be able to do it
> while producing the file.)
> 
> I have therefore modified my program to call zip_stat_index with the
> index I got back from zip_add immediately after zip_add, and keep track
> of the sum of comp_size values, starting a new output file whenever that
> reaches a set limit.
> 
> However, even though the PNG files I am adding do compress with ~10% on
> average, the comp_size I get back from zip_stat_index is always
> identical to the size, i.e. it does not correctly report the compressed
> size. (Used 1.1 before and thought maybe it's a bug that has been fixed
> in 1.5 but 1.5 shows the same behaviour.)
> 
> I guess my assumption that the compressed file size is available
> immediately after zip_add is wrong.

If you take compressed data from another zip archive, the compressed
file size should be correct after zip_add; if the data has to be
compressed, this will only happen when the archive is closed.

> Is there another way to achieve what I want? I am not after precision
> here; I don't need to know exactly when my limit is reached, an
> approximation would be fine. I thought of maybe closing the zip file
> every 1000 files and re-opening it so that I could check the size on
> disk, but that is certainly a very dumb idea in terms of performance...?

Since you already know your expected compression, why not handle it on
a higher level? You could always go with the worst case assumption of
zero percent compression, and split on uncompressed size.

Or let whatever your user interface is pass an estimated compression
factor.
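
A minimal sketch of that higher-level bookkeeping, assuming the caller
knows each file's uncompressed size before adding it. Nothing here is
part of libzip: split_tracker, split_tracker_init and split_tracker_add
are hypothetical names, and the per-entry overhead is only a rough
allowance for the zip headers, to be chosen by the caller. A ratio of
1.0 corresponds to the worst case of zero percent compression.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a libzip API): keeps a running estimate of
 * how large the current output archive will be once it is closed. */
typedef struct {
    uint64_t limit;     /* target size per output archive, in bytes */
    uint64_t estimated; /* estimated size of the current archive so far */
    double   ratio;     /* assumed compressed/uncompressed ratio; 1.0 = no compression */
    uint64_t overhead;  /* assumed allowance for headers per entry, in bytes */
} split_tracker;

static void split_tracker_init(split_tracker *t, uint64_t limit,
                               double ratio, uint64_t overhead)
{
    t->limit = limit;
    t->estimated = 0;
    t->ratio = ratio;
    t->overhead = overhead;
}

/* Call before adding a file of the given uncompressed size.
 * Returns true if the caller should close the current archive and open
 * a new one first; the new file is then counted against the new archive.
 * An empty archive always accepts a file, even one larger than the limit. */
static bool split_tracker_add(split_tracker *t, uint64_t uncompressed_size)
{
    uint64_t cost = (uint64_t)((double)uncompressed_size * t->ratio) + t->overhead;
    bool start_new = t->estimated > 0 && t->estimated + cost > t->limit;

    if (start_new)
        t->estimated = 0;
    t->estimated += cost;
    return start_new;
}
```

The caller would invoke split_tracker_add just before each zip_add and,
whenever it returns true, close the current archive and open the next
output file. Since this is only an estimate, the resulting archives will
come out near the limit rather than exactly at it.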

There is no support in libzip for what you want to achieve, and I
don't see an easy way to add it either. Sorry.
 Thomas
