3 | | == Compression of output files == #compress-output |
4 | | |
5 | | If you include the `<gzip_when_done>` tag in an [XmlFormat#Files output file description], the file will be gzip-compressed after it has been generated. |
6 | | |
7 | | The `<gzip_when_done>` tag is only supported in client version 5.8 (version # needs to be confirmed) and later. If you will receive files from clients that do not support it, then you should open the files in your validator/assimilator with a function similar to this: |
| 3 | |
| 4 | == BOINC-supplied compression == |
| 5 | |
| 6 | === Compression of input files === #compress-input |
| 7 | |
| 8 | Starting with version 5.4, |
| 9 | the BOINC client is able to handle HTTP `Content-Encoding` types |
| 10 | 'deflate' (zlib algorithm) and 'gzip' (gzip algorithm). |
| 11 | The client decompresses these files 'on the fly' |
| 12 | and stores them on disk in uncompressed form. |
| 13 | This can be used in the following two ways. |
| 14 | |
| 15 | Both methods store files uncompressed on the client. |
| 16 | If you need compression on the client, |
| 17 | you must do it at the application level (see below). |
| 18 | |
| 19 | ==== gzip encoding ==== |
| 20 | |
| 21 | To use this method, gzip your downloadable files, |
| 22 | giving them a filename suffix such as '.gz'. |
| 23 | (The name used in your `<file_info>` elements, |
| 24 | however, is the original filename without '.gz'). |
| 25 | |
| 26 | Include the following line in `httpd.conf`: |
| 27 | {{{ |
| 28 | AddEncoding x-gzip .gz |
| 29 | }}} |
| 30 | and restart Apache. |
| 31 | |
| 32 | This method has the advantage of reducing server disk usage and server CPU load, |
| 33 | but it will only work with 5.4+ clients. |
| 34 | BOINC clients older than 5.4 won't be able to download files. |
| 35 | Use the 'min_core_client_version' entry in config.xml to enforce this. |
| 36 | |
| 37 | ==== Apache mod_deflate ==== |
| 38 | |
| 39 | You can use the Apache 2.0 mod_deflate module to automatically compress files on the fly. |
| 40 | See http://httpd.apache.org/docs/2.0/mod/mod_deflate.html. |
| 41 | This method will work with all BOINC clients, |
| 42 | but it will do compression only for 5.4+ clients. |
| 43 | |
| 44 | You can use this in conjunction with gzip encoding because the mod_deflate module |
| 45 | allows you to exempt certain filetypes from on-the-fly compression. |
| 46 | |
| 47 | This method increases CPU load on the web server, |
| 48 | but this is typically not significant. |
| 49 | |
| 50 | You'll need to modify your `httpd.conf` file; example: |
| 51 | {{{ |
| 52 | # Enable module |
| 53 | LoadModule deflate_module modules/mod_deflate.so |
| 54 | |
| 55 | # Log file compression |
| 56 | DeflateFilterNote Input instream |
| 57 | DeflateFilterNote Output outstream |
| 58 | DeflateFilterNote Ratio ratio |
| 59 | |
| 60 | LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate |
| 61 | CustomLog logs/deflate_log deflate |
| 62 | |
| 63 | # Use low settings for compression to make sure impact on server is low |
| 64 | DeflateMemLevel 2 |
| 65 | DeflateCompressionLevel 2 |
| 66 | |
| 67 | Alias /boinc/download /path/to/files/download |
| 68 | |
| 69 | <Directory /path/to/files/download> |
| 70 | SetOutputFilter DEFLATE |
| 71 | SetEnvIfNoCase Request_URI \.(?:gz|gif|jpg|jpeg|png)$ no-gzip dont-vary |
| 72 | </Directory> |
| 73 | }}} |
| 74 | |
| 75 | This configuration tells Apache to compress all files served from |
| 76 | the download directory except for files that end with `gz`, `gif`, `jpg`, `jpeg` or `png`. |
| 77 | An alternate way to specify the files is the following: |
| 78 | {{{ |
| 79 | Alias /boinc/download /path/to/files/download |
| 80 | |
| 81 | <Directory /path/to/files/download> |
| 82 | AddOutputFilter DEFLATE .faa .mask |
| 83 | </Directory> |
| 84 | }}} |
| 85 | This configuration tells Apache to compress only the file types |
| 86 | `.faa` and `.mask` served from the download directory. |
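
To check which file names the `SetEnvIfNoCase` line in the first configuration would exempt from compression, the same pattern can be tried outside Apache. This is only a sketch: `is_exempt_from_deflate` is a hypothetical helper (not part of BOINC or Apache), and it uses C++ `std::regex` rather than Apache's own regex engine, on the assumption that the two agree for a pattern this simple.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of BOINC or Apache: returns true if the
// request URI ends in one of the suffixes that the example configuration
// exempts from on-the-fly compression.
bool is_exempt_from_deflate(const std::string& request_uri) {
    // Same pattern as in the SetEnvIfNoCase line; icase mirrors "NoCase".
    static const std::regex exempt(R"(\.(?:gz|gif|jpg|jpeg|png)$)",
                                   std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(request_uri, exempt);
}
```

Because the pattern is anchored with `$`, only the final suffix matters: a name such as `archive.gz.txt` would still be compressed on the fly.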
| 87 | |
| 88 | === Compression of output files === #compress-output |
| 89 | |
| 90 | If you include the `<gzip_when_done>` tag in an [XmlFormat#Files output file description], |
| 91 | the file will be gzip-compressed after it has been generated. |
| 92 | |
| 93 | The `<gzip_when_done>` tag is only supported in client versions 5.8 and later. |
| 94 | If you receive files from clients that do not support it, |
| 95 | then you should open the files in your validator/assimilator |
| 96 | with a function similar to this: |
38 | | This will automatically uncompress the file if it is compressed or it will open it without modification if it is not compressed. |
39 | | |
40 | | == Compression of input files == #compress-input |
41 | | |
42 | | Starting with version 5.4, the BOINC client is able to handle HTTP `Content-Encoding` types 'deflate' (zlib algorithm) and 'gzip' (gzip algorithm). The client decompresses these files 'on the fly' and stores them on disk in uncompressed form. |
43 | | |
44 | | You can use this in two ways: |
45 | | |
46 | | * Use the Apache 2.0 mod_deflate module to automatically compress files on the fly. This method will work with all BOINC clients, but it will do compression only for 5.4+ clients. See [#mod_deflate Using mod_deflate]. |
47 | | * Compress files and give them a filename suffix such as '.gz'. The name used in your `<file_info>` elements, however, is the original filename without '.gz'. BOINC clients older than 5.4 won't be able to download files. |
48 | | |
49 | | Include the following line in `httpd.conf`: |
50 | | |
51 | | {{{ |
52 | | AddEncoding x-gzip .gz |
53 | | }}} |
54 | | |
55 | | and restart Apache. |
56 | | This will add the content encoding to the header so that the client will decompress the file automatically. |
57 | | This method has the advantage of reducing server disk usage and server CPU load, |
58 | | but it will only work with 5.4+ clients. |
59 | | Use the 'min_core_version' field of the app_version table to enforce this. |
60 | | You can use this in conjunction because the mod_deflate module |
61 | | allows you to exempt certain filetypes from on-the-fly compression. |
62 | | |
63 | | Both methods store files uncompressed on the client. If you need compression on the client, you must do it at the application level. The BOINC source distribution includes a version of the zip library designed for use by BOINC applications on any platform (see below). |
64 | | |
65 | | |
66 | | === Using mod_deflate === #mod_deflate |
67 | | |
68 | | Apache 2.0 includes a module called mod_deflate. |
69 | | You can read about it here: |
70 | | http://httpd.apache.org/docs/2.0/mod/mod_deflate.html |
71 | | |
72 | | This module allows you to specify that certain files will be |
73 | | compressed dynamically when they are sent to clients that specify |
74 | | that they can handle it. |
75 | | The BOINC client 5.4 and higher includes the ability to |
76 | | decompress compressed files as they are downloaded. |
77 | | If a BOINC client 5.2 or earlier requests work, |
78 | | then the server will simply not compress the file so that |
79 | | the client can handle the file. |
80 | | We were expecting to only compress a few key files due to |
81 | | the expected load on the server. |
82 | | However, it turns out that the load on the server |
83 | | is actually quite small so we are compressing most of the files |
84 | | downloaded from our servers. |
85 | | Adding the compression on the fly only added about 5% |
86 | | to the system CPU utilization (obviously it will vary |
87 | | based on the power of your servers). |
88 | | |
89 | | You need to read the Apache 2.0 documentation about this |
90 | | module to make sure you understand it. |
91 | | However, our `httpd.conf` file for these changes includes the following: |
92 | | {{{ |
93 | | # Enable module |
94 | | LoadModule deflate_module modules/mod_deflate.so |
95 | | |
96 | | # Log file compression |
97 | | DeflateFilterNote Input instream |
98 | | DeflateFilterNote Output outstream |
99 | | DeflateFilterNote Ratio ratio |
100 | | |
101 | | LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate |
102 | | CustomLog logs/deflate_log deflate |
103 | | |
104 | | # Use low settings for compression to make sure impact on server is low |
105 | | DeflateMemLevel 2 |
106 | | DeflateCompressionLevel 2 |
107 | | |
108 | | Alias /boinc/download /path/to/files/download |
109 | | |
110 | | <Directory /path/to/files/download> |
111 | | SetOutputFilter DEFLATE |
112 | | SetEnvIfNoCase Request_URI \.(?:gz|gif|jpg|jpeg|png)$ no-gzip dont-vary |
113 | | </Directory> |
114 | | }}} |
115 | | |
116 | | This configuration tells Apache to compress all files served from |
117 | | the download directory except for files that end with `gz`, `gif`, `jpg`, `jpeg` or `png`. |
118 | | An alternate way to specify the files is the following: |
119 | | {{{ |
120 | | Alias /boinc/download /path/to/files/download |
121 | | |
122 | | <Directory /path/to/files/download> |
123 | | AddOutputFilter DEFLATE .faa .mask |
124 | | </Directory> |
125 | | }}} |
126 | | This configuration tells Apache to compress only the file types |
127 | | `.faa` and `.mask` served from the download directory. |
128 | | |
129 | | == Using boinc_zip == #boinc-zip |
| 123 | This will uncompress the file if it is compressed or will read it |
| 124 | without modification if it is not compressed. |
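
As an illustration of how compressed and uncompressed uploads can be told apart, the sketch below checks for the two-byte gzip magic number (0x1f 0x8b). The name `looks_gzipped` is hypothetical and this is not necessarily how the function referred to above works; zlib's `gzopen`/`gzread` already read non-gzip input unchanged, so an explicit check like this is only needed if you want to branch on the file type yourself.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file starts with the gzip magic
// bytes 0x1f 0x8b; returns false if the file is shorter than two bytes,
// starts with anything else, or cannot be opened.
bool looks_gzipped(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char magic[2] = {0, 0};
    in.read(reinterpret_cast<char*>(magic), 2);
    return in.gcount() == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

A validator could call this on each result file and choose a gzip-aware or a plain reader accordingly.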
| 125 | |
| 126 | == Application-level compression == |
| 127 | |
| 128 | === Using boinc_zip === #boinc-zip |
246 | | == Client and Server Compression and Decompression using gzip (zlib) == #gzip |
247 | | |
248 | | These basic routines may be useful if you want to compress/decompress a file using the zlib library (usually called "libz.a" and available for most platforms). Include the header file below (qcn_gzip.h) in your program, and link against libz, and you will gain two simple-to-use functions for gzip'ing or gunzip'ing a file. This is for simple single file or file-by-file compression or decompression (i.e. one file that is to be compressed into a .gz or decompressed back to its original uncompressed state). You can check the BOINC client status if you want the ability to quit in the middle of an operation. |
| 254 | === Using gzip (zlib) === #gzip |
| 255 | |
| 256 | These basic routines may be useful if you want to compress/decompress a file |
| 257 | using the zlib library (usually called "libz.a" and available for most platforms). |
| 258 | Include the header file below (qcn_gzip.h) in your program, and link against libz, |
| 259 | and you will gain two simple-to-use functions for gzip'ing or gunzip'ing a file. |
| 260 | This is for simple single file or file-by-file compression or decompression |
| 261 | (i.e. one file that is to be compressed into a .gz or decompressed back to |
| 262 | its original uncompressed state). |
| 263 | You can check the BOINC client status if you want the ability to quit |
| 264 | in the middle of an operation. |
272 | | int do_gzip(const char* strGZ, const char* strInput) |
273 | | { |
274 | | // take an input file (strInput) and turn it into a compressed file (strGZ) |
275 | | // get rid of the input file after |
276 | | FILE* fIn = boinc_fopen(strInput, "rb"); |
277 | | if (!fIn) return 1; //error |
278 | | gzFile fOut = gzopen(strGZ, "wb"); |
279 | | if (!fOut) { fclose(fIn); return 1; } //error -- close the input file before leaving |
280 | | fseek(fIn, 0, SEEK_SET); // go to the top of the files |
281 | | gzseek(fOut, 0, SEEK_SET); |
282 | | unsigned char buf[1024]; |
283 | | long lRead = 0, lWrite = 0; |
284 | | while (!feof(fIn)) { // read 1KB at a time until end of file |
285 | | memset(buf, 0x00, 1024); |
286 | | lRead = 0; |
287 | | lRead = (long) fread(buf, 1, 1024, fIn); |
288 | | lWrite = (long) gzwrite(fOut, buf, lRead); |
289 | | if (lRead != lWrite) break; |
290 | | } |
291 | | gzclose(fOut); |
292 | | fclose(fIn); |
293 | | if (lRead != lWrite) return 1; //error -- read bytes != written bytes |
294 | | // if we made it here, it compressed OK, can erase strInput and leave |
295 | | boinc_delete_file(strInput); |
296 | | return 0; |
| 288 | int do_gzip(const char* strGZ, const char* strInput) { |
| 289 | // take an input file (strInput) and turn it into a compressed file (strGZ) |
| 290 | // get rid of the input file after |
| 291 | FILE* fIn = boinc_fopen(strInput, "rb"); |
| 292 | if (!fIn) return 1; //error |
| 293 | gzFile fOut = gzopen(strGZ, "wb"); |
| 294 | if (!fOut) { fclose(fIn); return 1; } //error -- close the input file before leaving |
| 295 | fseek(fIn, 0, SEEK_SET); // go to the top of the files |
| 296 | gzseek(fOut, 0, SEEK_SET); |
| 297 | unsigned char buf[1024]; |
| 298 | long lRead = 0, lWrite = 0; |
| 299 | while (!feof(fIn)) { // read 1KB at a time until end of file |
| 300 | memset(buf, 0x00, 1024); |
| 301 | lRead = 0; |
| 302 | lRead = (long) fread(buf, 1, 1024, fIn); |
| 303 | lWrite = (long) gzwrite(fOut, buf, lRead); |
| 304 | if (lRead != lWrite) break; |
| 305 | } |
| 306 | gzclose(fOut); |
| 307 | fclose(fIn); |
| 308 | if (lRead != lWrite) return 1; //error -- read bytes != written bytes |
| 309 | // if we made it here, it compressed OK, can erase strInput and leave |
| 310 | boinc_delete_file(strInput); |
| 311 | return 0; |
300 | | // if needed use sm->statusBOINC instead (for quit_request etc) |
301 | | |
302 | | int do_gunzip(const char* strGZ, const char* strInput, bool bKeep) |
303 | | { |
304 | | // take a gzip file (strGZ) and decompress it into an output file (strInput) |
305 | | // get rid of the gzip file after, unless bKeep is set |
306 | | //s.quit_request = 0; |
307 | | //checkBOINCStatus(); |
308 | | FILE* fIn = boinc_fopen(strInput, "wb"); |
309 | | if (!fIn) return 1; //error |
310 | | gzFile fOut = gzopen(strGZ, "rb"); |
311 | | if (!fOut) { fclose(fIn); return 1; } //error -- close the output file before leaving |
312 | | fseek(fIn, 0, SEEK_SET); // go to the top of the files |
313 | | gzseek(fOut, 0, SEEK_SET); |
314 | | unsigned char buf[1024]; |
315 | | long lRead = 0, lWrite = 0; |
316 | | while (!gzeof(fOut)) { // read 1KB at a time until end of file |
317 | | memset(buf, 0x00, 1024); |
318 | | lRead = (long) gzread(fOut,buf,1024); |
319 | | if (lRead < 0) break; // gzread error; lRead != lWrite, so 1 is returned below |
320 | | lWrite = (long) fwrite(buf, 1, lRead, fIn); // write only the bytes actually read |
321 | | if (lRead != lWrite) break; |
322 | | //boinc_get_status(&s); |
323 | | //if (s.quit_request || s.abort_request || s.no_heartbeat) break; |
324 | | } |
325 | | gzclose(fOut); |
326 | | fclose(fIn); |
327 | | //checkBOINCStatus(); |
328 | | if (lRead != lWrite) return 1; //error -- read bytes != written bytes |
329 | | // if we made it here, it decompressed OK, can erase strGZ (unless bKeep) and leave |
330 | | if (!bKeep) boinc_delete_file(strGZ); |
331 | | return 0; |
| 315 | // if needed use sm->statusBOINC instead (for quit_request etc) |
| 316 | |
| 317 | int do_gunzip(const char* strGZ, const char* strInput, bool bKeep) { |
| 318 | // take a gzip file (strGZ) and decompress it into an output file (strInput) |
| 319 | // get rid of the gzip file after, unless bKeep is set |
| 320 | //s.quit_request = 0; |
| 321 | //checkBOINCStatus(); |
| 322 | FILE* fIn = boinc_fopen(strInput, "wb"); |
| 323 | if (!fIn) return 1; //error |
| 324 | gzFile fOut = gzopen(strGZ, "rb"); |
| 325 | if (!fOut) { fclose(fIn); return 1; } //error -- close the output file before leaving |
| 326 | fseek(fIn, 0, SEEK_SET); // go to the top of the files |
| 327 | gzseek(fOut, 0, SEEK_SET); |
| 328 | unsigned char buf[1024]; |
| 329 | long lRead = 0, lWrite = 0; |
| 330 | while (!gzeof(fOut)) { // read 1KB at a time until end of file |
| 331 | memset(buf, 0x00, 1024); |
| 332 | lRead = (long) gzread(fOut,buf,1024); |
| 333 | if (lRead < 0) break; // gzread error; lRead != lWrite, so 1 is returned below |
| 334 | lWrite = (long) fwrite(buf, 1, lRead, fIn); // write only the bytes actually read |
| 335 | if (lRead != lWrite) break; |
| 336 | //boinc_get_status(&s); |
| 337 | //if (s.quit_request || s.abort_request || s.no_heartbeat) break; |
| 338 | } |
| 339 | gzclose(fOut); |
| 340 | fclose(fIn); |
| 341 | //checkBOINCStatus(); |
| 342 | if (lRead != lWrite) return 1; //error -- read bytes != written bytes |
| 343 | // if we made it here, it decompressed OK, can erase strGZ (unless bKeep) and leave |
| 344 | if (!bKeep) boinc_delete_file(strGZ); |
| 345 | return 0; |