File compression
BOINC-supplied compression
Compression of input files
Starting with version 5.4, the BOINC client is able to handle HTTP Content-Encoding types 'deflate' (zlib algorithm) and 'gzip' (gzip algorithm). The client decompresses these files 'on the fly' and stores them on disk in uncompressed form. This can be used in the following two ways.
Both methods store files uncompressed on the client. If you need compression on the client, you must do it at the application level (see below).
gzip encoding
To use this method, gzip your downloadable files, giving them a filename suffix such as '.gz'. (The name used in your <file_info> elements, however, is the original filename without '.gz'.)
This method has the advantage of reducing server disk usage and server CPU load, but it will only work with 5.4+ clients. BOINC clients older than 5.4 won't be able to download files. Use the 'min_core_client_version' entry in config.xml to enforce this.
Apache mod_deflate
You can use the Apache 2.0 mod_deflate module to automatically compress files on the fly. See http://httpd.apache.org/docs/2.0/mod/mod_deflate.html. This method will work with all BOINC clients, but it will do compression only for 5.4+ clients.
You can use this in conjunction with gzip encoding because the mod_deflate module allows you to exempt certain filetypes from on-the-fly compression.
This method increases CPU load on the web server, but this is typically not significant.
Configuration File
You'll need to modify your httpd.conf file; example:
    # Enable module
    LoadModule deflate_module modules/mod_deflate.so

    # Log file compression
    DeflateFilterNote Input instream
    DeflateFilterNote Output outstream
    DeflateFilterNote Ratio ratio
    LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
    CustomLog logs/deflate_log deflate

    # Use low settings for compression to make sure impact on server is low
    DeflateMemLevel 2
    DeflateCompressionLevel 2

    # Add encoding type
    AddEncoding x-gzip .gz

    Alias /boinc/download /path/to/files/download
    <Directory /path/to/files/download>
        Options Indexes FollowSymlinks MultiViews
        AllowOverride AuthConfig
        Order allow,deny
        Allow from all

        RewriteEngine on
        RewriteCond %{HTTP:Accept-Encoding} gzip.*deflate|deflate.*gzip
        RewriteCond %{REQUEST_FILENAME} "\.(vmdk|exe|dll|pdb)$"
        RewriteCond %{REQUEST_FILENAME}.gz -f
        RewriteRule ^.*$ %{REQUEST_URI}.gz [L]

        <FilesMatch ".*\.(vmdk|exe|dll|pdb)\.gz$">
            ForceType application/octet-stream
            Header set Content-Encoding gzip
        </FilesMatch>

        SetOutputFilter DEFLATE
        SetEnvIfNoCase Request_URI \.(?:gz|gif|jpg|jpeg|png)$ no-gzip dont-vary
    </Directory>
This configuration tells Apache to serve the statically compressed copy of a file if its extension is vmdk, exe, dll, or pdb, the client accepts both gzip and deflate, and a matching .gz file exists. All other files in the download directory are compressed on the fly, except for files that end with gz, gif, jpg, jpeg, or png.
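The decision this configuration makes for each request can be sketched in C++ as follows. This is only an illustration of the rules as written in the example httpd.conf, not code that Apache or BOINC runs; the names ServeMode and choose_serve_mode are hypothetical, and the sketch ignores the fact that mod_deflate itself only compresses when the client advertises support.

```cpp
#include <regex>
#include <string>

// Hypothetical names, used only to illustrate the example httpd.conf rules.
enum class ServeMode { StaticGz, OnTheFly, Uncompressed };

// uri:             the requested path
// accept_encoding: value of the client's Accept-Encoding header
// gz_file_exists:  whether "<file>.gz" exists on disk (the "-f" RewriteCond)
ServeMode choose_serve_mode(const std::string& uri,
                            const std::string& accept_encoding,
                            bool gz_file_exists) {
    // Same patterns as the RewriteCond and SetEnvIfNoCase lines.
    static const std::regex both("gzip.*deflate|deflate.*gzip");
    static const std::regex precompressed("\\.(vmdk|exe|dll|pdb)$");
    static const std::regex exempt("\\.(?:gz|gif|jpg|jpeg|png)$",
                                   std::regex::icase);

    // All three RewriteCond lines must hold to rewrite to the .gz file.
    if (std::regex_search(accept_encoding, both) &&
        std::regex_search(uri, precompressed) && gz_file_exists) {
        return ServeMode::StaticGz;
    }
    // Exempt extensions get the no-gzip environment variable.
    if (std::regex_search(uri, exempt)) {
        return ServeMode::Uncompressed;
    }
    // Everything else passes through the DEFLATE output filter.
    return ServeMode::OnTheFly;
}
```

Note that once a request has been rewritten to the .gz file, its URI ends in gz, so the exemption keeps it from being compressed a second time.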
An alternate way to specify the files is the following:
    Alias /boinc/download /path/to/files/download
    <Directory /path/to/files/download>
        AddOutputFilter DEFLATE .faa .mask
    </Directory>
This configuration tells Apache to compress only the file types .faa and .mask served from the download directory.
Compression of output files
If you include the <gzip_when_done> tag in an output file description, the file will be gzip-compressed after it has been generated.
The gzip_when_done tag is only supported in client version 5.8+. Clients that do not support it upload the file uncompressed, so if you may receive files from such clients, your validator/assimilator should open output files with a function similar to this:
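    #include <cstdio>
    #include <string>
    #include "error_numbers.h"   // BOINC: defines ERR_FOPEN

    using std::string;

    // Read a file that may or may not be gzip-compressed into *result.
    int read_gzip_file_string(string file, string* result) {
        FILE* infile;
        char cmd[512];
        char buf[4096];

        result->erase();
        // snprintf avoids overflowing cmd if the filename is long
        snprintf(cmd, sizeof(cmd), "gzip -dcf %s", file.c_str());
        infile = popen(cmd, "r");
        if (infile == NULL) {
            return ERR_FOPEN;
        }
        while (fgets(buf, sizeof(buf), infile) != NULL) {
            result->append(buf);
        }
        if (pclose(infile) != 0) {
            fprintf(stderr, "%s: pclose failed\n", file.c_str());
            return 1;
        }
        return 0;
    }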
This will uncompress the file if it is compressed or will read it without modification if it is not compressed.
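If you would rather know in advance whether an uploaded file is compressed (for example, to log which clients honor gzip_when_done), you can check for the two gzip magic bytes, 0x1f 0x8b, at the start of the file. The following is a minimal sketch; looks_like_gzip is a hypothetical helper name and is not part of BOINC.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns true if the file begins with the gzip
// magic bytes 0x1f 0x8b. Returns false if the file cannot be opened
// or is shorter than two bytes.
bool looks_like_gzip(const std::string& path) {
    FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return false;
    unsigned char magic[2] = {0, 0};
    size_t n = std::fread(magic, 1, sizeof(magic), f);
    std::fclose(f);
    return n == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

This only inspects the header; it does not verify that the rest of the file is a valid gzip stream.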
Application-level compression
Using boinc_zip
You can also do compression in your application. To assist with this, BOINC provides a library, boinc_zip, which is based on the Info-Zip libraries but combines both zip and unzip functionality in one library. If you have questions or comments, please email Carl Christensen (carlgt1 at yahoo dot com).
This library can "co-exist" with zlib (libz) in case you need that too.
Basically, it will allow you to build a library that you can link against to provide basic zip/unzip compression functionality. It should only add a few hundred KB to your app (basically like distributing zip & unzip executable binaries for different platforms).
Limitations
The "unzip" functionality is complete: you can unzip a file and it will create all directories and files in the zip file. The "zip" functionality has some limitations due to the cross-platform nature of the library. Mainly, it doesn't zip recursively (i.e. subdirectories), and wildcard handling is done using the "boinc_filelist" function, which is explained below.
Building
For Windows, you can just add the project "boinc_zip" to your Visual Studio "Solution" or "Workspace." Basically just "Insert Existing Project" from the Visual Studio IDE, navigate over to the boinc/zip directory, and it should load the appropriate files. You can then build "Debug" and "Release" versions of the library. Then just add the appropriate reference to "boinc_zip.lib" (Release build) or "boinc_zipd.lib" (Debug build) in your app.
For Linux & Mac, you should be able to run "./configure" and then do a "make" to build the "libboinc_zip.a" lib that you will link against. In extreme cases, you may need to do an "aclocal && autoconf && automake" first, to build properly for your platform.
Also, please note that boinc_zip relies on some BOINC functions that you will need (and will most likely be in your app already since they are handy) -- namely boinc/lib/filesys.C and boinc/lib/util.C.
Using
Basically, you will need to #include "boinc_zip.h" in your app (of course your compiler will need to know where it is, i.e. -I../boinc/zip).
Then you can just call the function boinc_zip with the appropriate arguments to zip or unzip. Three overloads of boinc_zip are provided:
    int boinc_zip(int bZipType, const std::string szFileZip, const ZipFileList* pvectszFileIn);
    int boinc_zip(int bZipType, const std::string szFileZip, const std::string szFileIn);
    int boinc_zip(int bZipType, const char* szFileZip, const char* szFileIn);
bZipType is ZIP_IT or UNZIP_IT (self-explanatory).
szFileZip is the name of the zip file to create or extract (I assume the user will provide it with the .zip extension).
The main differences are in the file parameter. The zip library used was exhibiting odd behavior when "coexisting" with unzip, particularly in the wildcard handling. So a function was made that fills a ZipFileList, a class which is basically a vector of filenames. If you are just compressing a single file, you can use either the std::string or the const char* szFileIn overload.
You can also just pass in a * or a *.* to zip up all files in a directory.
To zip multiple files in a "mix & match" fashion, you can use the boinc_filelist function provided. Basically, it's a crude pattern matching of files in a directory, but it has been useful for us on the CPDN project. Just create a ZipFileList instance, and then pass this into boinc_filelist as follows:
    bool boinc_filelist(
        const std::string directory,
        const std::string pattern,
        ZipFileList* pList,
        const unsigned char ucSort = SORT_NAME | SORT_DESCENDING,
        const bool bClear = true
    );
If you want to zip up all text (.txt) files in a directory, just pass in the directory as a std::string, the pattern (e.g. ".txt"), and &yourZipList.
The last two parameters are the sort order of the file list (CPDN files need to be in a certain order, descending filenames, which is why that's the default) and whether to clear the list first. By default your list is cleared; set bClear to false to keep adding files to your ZipFileList.
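To make the selection and ordering concrete, here is a self-contained sketch of the kind of filtering described above. It is not the boinc_filelist implementation: select_files is a hypothetical stand-in that works on an in-memory list of names, and it assumes the pattern is matched as a plain substring of the filename, which the source does not spell out.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the selection step of boinc_filelist.
// Assumption: "pattern" is matched as a plain substring of each name.
// The default ordering mirrors SORT_NAME | SORT_DESCENDING.
std::vector<std::string> select_files(const std::vector<std::string>& names,
                                      const std::string& pattern,
                                      bool descending = true) {
    std::vector<std::string> out;
    for (const std::string& n : names) {
        if (n.find(pattern) != std::string::npos) {
            out.push_back(n);
        }
    }
    // Sort by name ascending, then reverse if descending was requested.
    std::sort(out.begin(), out.end());
    if (descending) {
        std::reverse(out.begin(), out.end());
    }
    return out;
}
```

With the names a.txt, b.dat and c.txt and the pattern ".txt", this returns c.txt followed by a.txt.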
When you have created your ZipFileList, just pass that pointer to boinc_zip. You will be able to add files in other directories this way.
A ziptest project for Windows is provided for experimenting. Its source ("ziptest.cpp") can also be built on Unix and Mac to see how boinc_zip works (just compile it with g++ together with boinc/lib/filesys.C and util.C, as described above).
Getting boinc_zip
boinc_zip is no longer in the main boinc subversion "trunk" but resides in this "depends" branch:
svn co http://boinc.berkeley.edu/svn/trunk/depends_projects/zip
Note for Linux/Mac: To build along with the other boinc libraries, you will need to add the following lines to the bottom of the configure.ac file (where the various Makefiles are listed):
    zip/Makefile
    zip/zip/Makefile
    zip/unzip/Makefile
Similarly for the Makefile.am file -- add zip, zip/zip and zip/unzip to the library subdirs:
    if ENABLE_LIBRARIES
    API_SUBDIRS = api lib zip zip/zip zip/unzip
    endif
Using gzip (zlib)
These basic routines may be useful if you want to compress or decompress a file using the zlib library (usually called "libz.a" and available for most platforms). Include the header file below (qcn_gzip.h) in your program and link against libz, and you will gain two simple-to-use functions for gzipping or gunzipping a file. They are meant for simple single-file or file-by-file compression or decompression (i.e. one file that is to be compressed into a .gz or decompressed back to its original uncompressed state). You can check the BOINC client status inside the loops if you want the ability to quit in the middle of an operation.
qcn_gzip.h:
    #ifndef _QCN_GZIP_H_
    #define _QCN_GZIP_H_

    #include <zlib.h>

    int do_gzip(const char* strGZ, const char* strInput);
    int do_gunzip(const char* strGZ, const char* strInput, bool bKeep = false);

    #endif
qcn_gzip.cpp:
    #include <stdio.h>
    #include <zlib.h>
    #include "filesys.h"    // BOINC: boinc_fopen(), boinc_delete_file()
    #include "qcn_gzip.h"

    // Compress the file strInput into the gzip file strGZ,
    // then delete strInput. Returns 0 on success, 1 on error.
    int do_gzip(const char* strGZ, const char* strInput)
    {
        FILE* fIn = boinc_fopen(strInput, "rb");
        if (!fIn) return 1; // error
        gzFile fOut = gzopen(strGZ, "wb");
        if (!fOut) {
            fclose(fIn);
            return 1; // error
        }

        unsigned char buf[1024];
        bool bOK = true;
        size_t nRead;
        // read 1KB at a time until end of file
        while ((nRead = fread(buf, 1, sizeof(buf), fIn)) > 0) {
            if (gzwrite(fOut, buf, (unsigned) nRead) != (int) nRead) {
                bOK = false; // bytes written != bytes read
                break;
            }
        }
        if (ferror(fIn)) bOK = false;
        if (gzclose(fOut) != Z_OK) bOK = false;
        fclose(fIn);
        if (!bOK) return 1;

        // if we made it here, it compressed OK, can erase strInput and leave
        boinc_delete_file(strInput);
        return 0;
    }

    // CMC - commented out status calls, are they too paranoid?
    // if needed use sm->statusBOINC instead (for quit_request etc)

    // Decompress the gzip file strGZ into the file strInput.
    // Unless bKeep is true, delete strGZ afterwards.
    // Returns 0 on success, 1 on error.
    int do_gunzip(const char* strGZ, const char* strInput, bool bKeep)
    {
        //s.quit_request = 0;
        //checkBOINCStatus();
        FILE* fOut = boinc_fopen(strInput, "wb");
        if (!fOut) return 1; // error
        gzFile fIn = gzopen(strGZ, "rb");
        if (!fIn) {
            fclose(fOut);
            return 1; // error
        }

        unsigned char buf[1024];
        bool bOK = true;
        int nRead;
        // read 1KB at a time until end of file
        while ((nRead = gzread(fIn, buf, sizeof(buf))) > 0) {
            // write only the bytes actually read, not the whole buffer
            if (fwrite(buf, 1, (size_t) nRead, fOut) != (size_t) nRead) {
                bOK = false;
                break;
            }
            //boinc_get_status(&s);
            //if (s.quit_request || s.abort_request || s.no_heartbeat) break;
        }
        if (nRead < 0) bOK = false; // gzread returns -1 on error
        gzclose(fIn);
        fclose(fOut);
        //checkBOINCStatus();
        if (!bOK) return 1;

        // if we made it here, it decompressed OK, can erase strGZ and leave
        if (!bKeep) boinc_delete_file(strGZ);
        return 0;
    }