Changes between Version 16 and Version 17 of FileCompression


Timestamp:
Jan 11, 2012, 8:10:08 AM (12 years ago)
Author:
romw
Comment:

--

  • FileCompression

    v16 v17  
    11[[PageOutline]]
     2
    23= File compression =
    3 
    44== BOINC-supplied compression ==
    5 
    65=== Compression of input files === #compress-input
    7 
    8 Starting with version 5.4,
    9 the BOINC client is able to handle HTTP `Content-Encoding` types
    10 'deflate' (zlib algorithm) and 'gzip' (gzip algorithm).
    11 The client decompresses these files 'on the fly'
    12 and stores them on disk in uncompressed form.
    13 This can be used in the following two ways.
    14 
    15 Both methods store files uncompressed on the client.
    16 If you need compression on the client,
    17 you must do it at the application level (see below).
     6Starting with version 5.4, the BOINC client is able to handle HTTP `Content-Encoding` types 'deflate' (zlib algorithm) and 'gzip' (gzip algorithm). The client decompresses these files 'on the fly' and stores them on disk in uncompressed form. This can be used in the following two ways.
     7
     8Both methods store files uncompressed on the client. If you need compression on the client, you must do it at the application level (see below).
    189
    1910==== gzip encoding ====
    20 
    21 To use this method, gzip your downloadable files,
    22 giving them a filename suffix such as '.gz'.
    23 (The name used in your `<file_info>` elements,
    24 however, is the original filename without '.gz').
    25 
    26 Include the following line in `httpd.conf`:
    27 {{{
    28 AddEncoding x-gzip .gz
    29 }}}
    30 and restart apache.
    31 
    32 This method has the advantage of reducing server disk usage and server CPU load,
    33 but it will only work with 5.4+ clients.
    34 BOINC clients older than 5.4 won't be able to download files.
    35 Use the 'min_core_client_version' entry in config.xml to enforce this.
     11To use this method, gzip your downloadable files, giving them a filename suffix such as '.gz'. (The name used in your `<file_info>` elements, however, is the original filename without '.gz').
     12
     13This method has the advantage of reducing server disk usage and server CPU load, but it will only work with 5.4+ clients. BOINC clients older than 5.4 won't be able to download files. Use the 'min_core_client_version' entry in config.xml to enforce this.
    3614
    3715==== Apache mod_deflate ====
    38 
    39 You can use the Apache 2.0 mod_deflate module to automatically compress files on the fly.
    40 See http://httpd.apache.org/docs/2.0/mod/mod_deflate.html.
    41 This method will work with all BOINC clients,
    42 but it will do compression only for 5.4+ clients.
    43 
    44 You can use this in conjunction with gzip encoding because the mod_deflate module
    45 allows you to exempt certain filetypes from on-the-fly compression.
    46 
    47 This method increases CPU load on the web server,
    48 but this is typically not significant.
     16You can use the Apache 2.0 mod_deflate module to automatically compress files on the fly. See http://httpd.apache.org/docs/2.0/mod/mod_deflate.html. This method will work with all BOINC clients, but it will do compression only for 5.4+ clients.
     17
     18You can use this in conjunction with gzip encoding because the mod_deflate module allows you to exempt certain filetypes from on-the-fly compression.
     19
     20This method increases CPU load on the web server, but this is typically not significant.
     21
     22==== Configuration File ====
    4923
    5024You'll need to modify your `httpd.conf` file; example:
     25
    5126{{{
    5227# Enable module
     
    6540DeflateCompressionLevel 2
    6641
     42# Add encoding type
     43AddEncoding x-gzip .gz
     44
    6745Alias /boinc/download /path/to/files/download
    6846
    6947<Directory /path/to/files/download>
     48Options Indexes FollowSymlinks MultiViews
     49AllowOverride AuthConfig
     50Order allow,deny
     51Allow from all
     52
     53RewriteEngine on
     54RewriteCond %{HTTP:Accept-Encoding} gzip.*deflate|deflate.*gzip
     55RewriteCond %{REQUEST_FILENAME} "\.(vmdk|exe|dll|pdb)$"
     56RewriteCond %{REQUEST_FILENAME}.gz -f
     57RewriteRule ^.*$ %{REQUEST_URI}.gz [L]
     58
     59<FilesMatch ".*\.(vmdk|exe|dll|pdb)\.gz$">
     60ForceType application/octet-stream
     61Header set Content-Encoding gzip
     62</FilesMatch>
     63
    7064SetOutputFilter DEFLATE
    7165SetEnvIfNoCase Request_URI \.(?:gz|gif|jpg|jpeg|png)$ no-gzip dont-vary
    7266</Directory>
    7367}}}
    74 
    75 This configuration tells Apache to compress all files served from
    76 the download directory except for files that end with `gz`, `gif`, `jpg`, `jpeg` and `png`.
     68This configuration tells Apache to rewrite requests for files with the extensions `vmdk`, `exe`, `dll` and `pdb` to the statically compressed `.gz` copy, provided that copy exists and the client accepts both gzip and deflate encoding. All other files served from the download directory are compressed on the fly, except for files that end with `gz`, `gif`, `jpg`, `jpeg` and `png`.
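
The decision the configuration makes for each request can be modelled in a few lines of C++. This is only an illustrative sketch of the rules as read from the directives above, not how Apache implements them. The names `Served` and `classify` are hypothetical. It also assumes that on-the-fly compression is applied only when the client accepts gzip.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// How a request under the download directory ends up being served in this model.
enum class Served { StaticGz, DeflateOnTheFly, AsIs };

// Extension after the last '.' of the file name, or "" if there is none.
static std::string extension_of(const std::string& path) {
    const std::string::size_type slash = path.find_last_of('/');
    const std::string::size_type dot = path.find_last_of('.');
    if (dot == std::string::npos) return "";
    if (slash != std::string::npos && dot < slash) return "";
    return path.substr(dot + 1);
}

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical model of the rules in the example configuration.
Served classify(const std::string& path,
                bool accepts_gzip,
                bool accepts_deflate,
                bool gz_sibling_exists) {
    // Extensions named in the RewriteCond (matched case-sensitively there).
    static const std::set<std::string> pre_compressed = {"vmdk", "exe", "dll", "pdb"};
    // Extensions named in SetEnvIfNoCase (matched case-insensitively there).
    static const std::set<std::string> never_deflate = {"gz", "gif", "jpg", "jpeg", "png"};

    const std::string ext = extension_of(path);
    // The rewrite needs both encodings in Accept-Encoding and an existing .gz copy.
    if (accepts_gzip && accepts_deflate && gz_sibling_exists && pre_compressed.count(ext))
        return Served::StaticGz;
    if (never_deflate.count(to_lower(ext)))
        return Served::AsIs;
    // Assumption: on-the-fly compression happens only for clients that accept gzip.
    return accepts_gzip ? Served::DeflateOnTheFly : Served::AsIs;
}
```

For example, `classify("/boinc/download/app.exe", true, true, true)` yields `Served::StaticGz`, while the same request without an `app.exe.gz` on disk falls through to on-the-fly compression.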
     69
    7770An alternate way to specify the files is the following:
    7871{{{
     
    8376</Directory>
    8477}}}
    85 This configuration tells Apache to compress only the file types
    86 `.faa` and `.mask` served from the download directory.
     78This configuration tells Apache to compress only the file types `.faa` and `.mask` served from the download directory.
    8779
    8880=== Compression of output files === #compress-output
    89 
    90 If you include the `<gzip_when_done>` tag in an [XmlFormat#Files output file description],
    91 the file will be gzip-compressed after it has been generated.
    92 
    93 The gzip_when_done is only supported in client version 5.8+.
    94 If you receive files from clients that do not support the gzip_when_done flag,
    95 then you should open the files with a function similar
    96 to this to your validator/assimilator:
     81If you include the `<gzip_when_done>` tag in an [wiki:XmlFormat#Files output file description], the file will be gzip-compressed after it has been generated.
     82
     83`gzip_when_done` is only supported in client version 5.8+. If you receive files from clients that do not support the `gzip_when_done` flag, add a function similar to this to your validator/assimilator and use it to open the files:
     84
    9785{{{
    9886#!cpp
     
    120108}
    121109}}}
    122 
    123 This will uncompress the file if it is compressed or will read it
    124 without modification if it is not compressed.
     110This will uncompress the file if it is compressed or will read it without modification if it is not compressed.
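
If a validator needs to know up front whether an uploaded file arrived compressed, one option is to look at the first two bytes: gzip streams start with the magic number `0x1f 0x8b`. The sketch below uses only the standard library and is not the function from the listing above. The names `has_gzip_magic` and `file_looks_gzipped` are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// True if the buffer starts with the two-byte gzip magic number (0x1f 0x8b).
bool has_gzip_magic(const unsigned char* buf, std::size_t len) {
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

// True if the file at `path` can be opened and starts with the gzip magic number.
// Returns false for unreadable files and for files shorter than two bytes.
bool file_looks_gzipped(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return false;
    unsigned char head[2] = {0, 0};
    const std::size_t n = std::fread(head, 1, sizeof head, f);
    std::fclose(f);
    return has_gzip_magic(head, n);
}
```

An explicit check like this is often unnecessary when reading through zlib, because `gzopen`/`gzread` pass data through unchanged when the input is not in gzip format.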
    125111
    126112== Application-level compression ==
    127 
    128113=== Using boinc_zip === #boinc-zip
    129 
    130 You can also do compression in your application.
    131 To assist this, BOINC provides a library
    132 boinc_zip, based on the [http://www.info-zip.org Info-Zip] libraries,
    133 but combines both zip & unzip
    134 functionality in one library.
    135 Any questions/comments please email Carl Christensen  (carlgt1 at yahoo dot com)
     114You can also do compression in your application. To assist with this, BOINC provides a library, boinc_zip, which is based on the [http://www.info-zip.org Info-Zip] libraries but combines zip and unzip functionality in one library. Please email any questions or comments to Carl Christensen (carlgt1 at yahoo dot com).
    136115
    137116This library can "co-exist" with zlib (libz) in case you need that too.
    138117
    139 Basically, it will allow you to build a library that you can link
    140 against to provide basic zip/unzip compression functionality.  It
    141 should only add a few hundred KB to your app (basically like
    142 distributing `zip` & `unzip` executable binaries for different platforms).
     118It allows you to build a library that you can link against to provide basic zip/unzip compression functionality. It should add only a few hundred KB to your app (roughly like distributing `zip` & `unzip` executable binaries for different platforms).
    143119
    144120==== Limitations ==== #boinc-zip-limitations
    145 The "unzip" functionality is there, that is you can unzip
    146 a file and it will create all directories & files in the zip file. 
    147 The "zip" functionality has some limitations due to the cross-platform
    148 nature:  mainly it doesn't provide zipping recursively (i.e.
    149 subdirectories); and wildcard handling is done using the "boinc_filelist"
    150 function which will be explained below.
     121The "unzip" functionality is complete: you can unzip a file and it will create all directories & files in the zip file. The "zip" functionality has some limitations due to its cross-platform nature: mainly, it doesn't zip recursively (i.e. subdirectories), and wildcard handling is done using the `boinc_filelist` function, which is explained below.
    151122
    152123==== Building ==== #boinc-zip-building
    153 
    154 For Windows, you can just add the project "boinc_zip" to your
    155 Visual Studio "Solution" or "Workspace."  Basically just "Insert Existing
    156 Project" from the Visual Studio IDE, navigate over to the boinc/zip
    157 directory, and it should load the appropriate files.  You can then build
    158 "Debug" and "Release" versions of the library.  Then just add the
    159 appropriate reference to "boinc_zip.lib" (Release build) or "boinc_zipd.lib"
    160 (Debug build) in your app.
    161 
    162 For Linux & Mac, you should be able to run "./configure" and then do a "make"
    163 to build the "libboinc_zip.a" lib that you will link against.  In extreme
    164 cases, you may need to do an "aclocal && autoconf && automake" first,
    165 to build properly for your platform.
    166 
    167 Also, please note that boinc_zip relies on some BOINC functions that you will need
    168 (and will most likely be in your app already since they are handy) --
    169 namely `boinc/lib/filesys.C` and `boinc/lib/util.C`.
     124For Windows, you can just add the project "boinc_zip" to your  Visual Studio "Solution" or "Workspace."  Basically just "Insert Existing  Project" from the Visual Studio IDE, navigate over to the boinc/zip  directory, and it should load the appropriate files.  You can then build  "Debug" and "Release" versions of the library.  Then just add the  appropriate reference to "boinc_zip.lib" (Release build) or "boinc_zipd.lib" (Debug build) in your app.
     125
     126For Linux & Mac, you should be able to run "./configure" and then do a "make" to build the "libboinc_zip.a" lib that you will link against.  In extreme cases, you may need to do an "aclocal && autoconf && automake" first,  to build properly for your platform.
     127
     128Also, please note that boinc_zip relies on some BOINC functions that you will need (and will most likely be in your app already since they are handy) -- namely `boinc/lib/filesys.C` and `boinc/lib/util.C`.
    170129
    171130==== Using ==== #boinc-zip-using
    172 Basically, you will need to `#include "boinc_zip.h"` in your app (of course
    173 your compiler will need to know where it is, i.e. -I../boinc/zip).
    174 
    175 Then you can just call the function `boinc_zip` with the appropriate arguments
    176 to zip or unzip.  There are three overloaded boinc_zip's provided:
     131You will need to `#include "boinc_zip.h"` in your app (your compiler will need to know where it is, e.g. `-I../boinc/zip`).
     132
     133Then you can call the function `boinc_zip` with the appropriate arguments to zip or unzip. Three overloads of `boinc_zip` are provided:
     134
    177135{{{
    178136int boinc_zip(int bZipType, const std::string szFileZip,
     
    184142`bZipType` is `ZIP_IT` or `UNZIP_IT` (self-explanatory)
    185143
    186 `szFileZip` is the name of the zip file to create or extract
    187 (I assume the user will provide it with the .zip extension)
    188 
    189 The main differences are in the file parameter.  The zip library used was
    190 exhibiting odd behavior when "coexisting" with unzip, particularly in the
    191 wildcard handling.  So a function was made that creates a `ZipFileList` class,
    192 which is basically a vector of filenames.  If you are just compressing a
    193 single file, you can use either the `std::string` or `const char* szFileIn` overrides. 
     144`szFileZip` is the name of the zip file to create or extract (I assume the user will provide it with the .zip extension)
     145
     146The main differences are in the file parameter. The zip library used was exhibiting odd behavior when "coexisting" with unzip, particularly in the wildcard handling. So a function was made that creates a `ZipFileList` class, which is basically a vector of filenames. If you are just compressing a single file, you can use either the `std::string` or the `const char* szFileIn` overload.
    194147
    195148You can also just pass in a `*` or a `*.*` to zip up all files in a directory.
    196149
    197 To zip multiple files in a "mix & match" fashion, you can use the `boinc_filelist`
    198 function provided.  Basically, it's a crude pattern matching of files in a
    199 directory, but it has been useful for us on the CPDN project.  Just create a
    200 `ZipFileList` instance, and then pass this into `boinc_filelist` as follows:
     150To zip multiple files in a "mix & match" fashion, you can use the `boinc_filelist` function provided.  Basically, it's a crude pattern matching of files in a directory, but it has been useful for us on the CPDN project.  Just create a  `ZipFileList` instance, and then pass this into `boinc_filelist` as follows:
     151
    201152{{{
    202153bool boinc_filelist(
     
    208159);
    209160}}}
    210 if you want to zip up all text (.txt) files in a directory, just pass in:
    211 the directory as a `std::string`, the pattern, i.e. ".txt", `&yourZipList`
    212 
    213 The last two flags are the sort order of the file list (CPDN files need to be
    214 in a certain order -- descending filenames, which is why that's the default).
    215 The default is to "clear" your list, you can set that to `false` to keep adding
    216 files to your `ZipFileList`.
    217 
    218 When you have created your `ZipFileList` just pass that pointer to `boinc_zip`.
    219 You will be able to add files in other directories this way.
    220 
    221 There is a `ziptest` Project for Windows provided to experiment, which can
    222 also be run (the "ziptest.cpp") on Unix & Mac to experiment
    223 with how `boinc_zip` works (just g++ with the `boinc/lib/filesys.C` & `util.C` as
    224 described above).
     161If you want to zip up all text (.txt) files in a directory, just pass in the directory as a `std::string`, the pattern (e.g. ".txt"), and `&yourZipList`.
     162
     163The last two arguments are the sort order of the file list (CPDN files need to be in a certain order, descending filenames, which is why that's the default) and whether to clear the list first. The default is to "clear" your list; you can set that to `false` to keep adding files to your `ZipFileList`.
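
The behaviour described for the file list (crude pattern matching, descending order by default, optional clearing) can be sketched with the standard library alone. The code below is a hypothetical stand-in, not the `boinc_filelist` implementation. It assumes the pattern is a plain substring, replaces the sort flags with a single `bool`, and works on a list of names that has already been read from the directory. `FileList` and `collect_matching` are invented for the illustration.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for ZipFileList, described in the text as
// "basically a vector of filenames".
typedef std::vector<std::string> FileList;

// Appends every entry of `dir_entries` that contains `pattern` to `list`.
// `descending` and `clear` mirror the two trailing arguments described above;
// the whole list (including entries kept from earlier calls) is re-sorted.
bool collect_matching(const std::vector<std::string>& dir_entries,
                      const std::string& pattern,
                      FileList* list,
                      bool descending = true,
                      bool clear = true) {
    if (!list) return false;
    if (clear) list->clear();
    for (const std::string& name : dir_entries) {
        // Assumption: "crude pattern matching" is modelled as a substring test.
        if (name.find(pattern) != std::string::npos) list->push_back(name);
    }
    if (descending) {
        std::sort(list->begin(), list->end(), std::greater<std::string>());
    } else {
        std::sort(list->begin(), list->end());
    }
    return true;
}
```

Calling it once with ".txt" and again with ".dat" and `clear` set to `false` accumulates both sets of names in one list, which is the "mix & match" use described above.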
     164
     165When you have created your `ZipFileList` just pass that pointer to `boinc_zip`. You will be able to add files in other directories this way.
     166
     167A `ziptest` project for Windows is provided for experimenting; `ziptest.cpp` can also be built and run on Unix & Mac to see how `boinc_zip` works (just g++ it with `boinc/lib/filesys.C` & `util.C` as described above).
    225168
    226169==== Getting boinc_zip ==== #boinc-zip-getting
    227 
    228 boinc_zip is no longer in the main boinc subversion "trunk"
    229 but resides in this "depends" branch:
     170boinc_zip is no longer in the main boinc subversion "trunk" but resides in this "depends" branch:
    230171
    231172svn co http://boinc.berkeley.edu/svn/trunk/depends_projects/zip
    232173
    233 Note for Linux/Mac:  To build along with the other boinc libraries,
    234 you will need to add the following lines to the bottom of the '''configure.ac''' file
    235 (where the various Makefiles are listed):
     174Note for Linux/Mac:  To build along with the other boinc libraries, you will need to add the following lines to the bottom of the '''configure.ac''' file (where the various Makefiles are listed):
    236175
    237176{{{
     
    240179     zip/unzip/Makefile
    241180}}}
    242 
    243 
    244 Similarly for the '''Makefile.am''' file -- add zip, zip/zip and zip/unzip
    245 to the library subdirs:
     181Similarly for the '''Makefile.am''' file -- add zip, zip/zip and zip/unzip to the library subdirs:
    246182
    247183{{{
     
    250186endif
    251187}}}
    252 
    253 
    254188=== Using gzip (zlib) === #gzip
    255 
    256 These basic routines may be useful if you want to compress/decompress a file
    257 using the zlib library (usually called "libz.a" and available for most platforms).
    258 Include the header file below (qcn_gzip.h) in your program, and link against libz,
    259 and you will gain two simple to use functions for gzip'ing or gunzip'ing a file.
    260 This is for simple single file or file-by-file compression or decompression
    261 (i.e. one file that is to be compressed into a .gz or decompressed back to
    262 its original uncompressed state).
    263 You can check for boinc client status if you want the ability to quit
    264 inside an operation etc.
     189These basic routines may be useful if you want to compress/decompress a file using the zlib library (usually called "libz.a" and available for most platforms). Include the header file below (qcn_gzip.h) in your program and link against libz, and you will gain two simple-to-use functions for gzip'ing or gunzip'ing a file. This is for simple single-file or file-by-file compression or decompression (i.e. one file that is to be compressed into a .gz or decompressed back to its original uncompressed state). You can check for BOINC client status if you want the ability to quit inside an operation, etc.
    265190
    266191qcn_gzip.h:
     
    277202#endif
    278203}}}
    279 
    280 
    281204qcn_gzip.cpp:
    282205
     
    347270
    348271}}}
    349