= Database purging utility = As a BOINC project operates, the size of its workunit and result tables increases. Eventually they become so large that adding a field or building an index may take hours or days. To address this problem, BOINC provides a utility '''db_purge''' that writes result and WU records to XML-format archive files, then deletes them from the database. Workunits are purged only when their input files have been deleted. Because of BOINC's file-deletion policy, this implies that all results are completed. So when a workunit is purged, all its results are purged too. db_purge creates an archive/ directory and store archive files there. db_purge is normally run as a daemon, specified in the [ProjectConfigFile config.xml] file. It has the following command-line options: '''-min_age_days N ''':: Purge only WUs with mod_time at least N days in the past. Recommended value: 7 or so. This lets users examine their recent results. '''-max N''':: Purge at most N WUs, then exit '''-max_wu_per_file N''':: Write at most N WUs to each archive file. Recommended value: 10,000 or so. '''-zip''':: Compress archive files using zip '''-gzip''':: Compress archive files using gzip '''-d N''':: Set logging verbosity to N (1,2,3) '''-no_archive''':: Don't archive workunits or results == Recording credited jobs == You may want to keep a record of which user returned a valid result for each workunit (e.g. so that you can give them credit for specific discoveries). If you use db_purge, this information is removed from the database; it can be extracted from the XML archives, but it's slow. Alternatively, you can store the user/workunit association in a separate table called '''credited_job'''. This records in this table are small - they store only a user ID and a 64-bit integer which is taken from the workunit's 'opaque' field (typically an index in a separate database). By default, the table has indices that allow you to efficiently enumerate workunits for a given user and users for a given workunit. To use the credited_job table, pass the -update_credited_job flag to your validator. == Archive file format == The archive files have names of the form `wu_archive_TIME` and `result_archive_TIME` where `TIME` is the Unix time the file was created. In addition, db_purge generates index files 'wu_index' and 'result_index' associating each WU and result ID with the timestamp of its archive file. The format of both type of index files is a number of rows each containing: {{{ ID TIME }}} The ID field of the WU or result, 5 spaces, and the timestamp part of the archive filename where the record with that ID can be found. The format of a record in the result archive file is: {{{ %d %d %d %d %d %d %d %d %d %d %d %s %.15e %s %s %s %d %d %d %.15e %.15e %f %d %d %d %d %d %d %s }}} The format of a record in the WU archive file is: {{{ %d %d %d %s %s %d %.15e %.15e %.15e %.15e %d %d %.15e %d %d %d %d %d %d %f %d %d %d %d %d %s %d %s }}}