= Database purging utility =
As a BOINC project operates, the size of its workunit and result tables increases.
Eventually they become so large that adding a field or building an index may take hours or days.
To keep this from happening, BOINC provides a utility '''db_purge'''
that writes result and WU records to XML-format archive files,
then deletes them from the database.
Workunits are purged only when their input files have been deleted.
Because of BOINC's file-deletion policy, this implies that all results are completed.
So when a workunit is purged, all its results are purged too.
db_purge creates an archive/ directory and stores archive files there.
db_purge is normally run as a daemon, specified in the [ProjectConfigFile config.xml] file.
It has the following command-line options:
'''-min_age_days N '''::
Purge only WUs with mod_time at least N days in the past. Recommended value: 7 or so. This lets users examine their recent results.
'''-max N'''::
Purge at most N WUs, then exit
'''-max_wu_per_file N'''::
Write at most N WUs to each archive file. Recommended value: 10,000 or so.
'''-zip'''::
Compress archive files using zip
'''-gzip'''::
Compress archive files using gzip
'''-d N'''::
Set logging verbosity to N (1,2,3)
'''-no_archive'''::
Don't archive workunits or results
== Recording credited jobs ==
You may want to keep a record of which user returned a valid result for each workunit
(e.g. so that you can give them credit for specific discoveries).
If you use db_purge, this information is removed from the database;
it can be extracted from the XML archives, but it's slow.
Alternatively, you can store the user/workunit association in
a separate table called '''credited_job'''.
The records in this table are small - they store only a user ID
and a 64-bit integer which is taken from the workunit's 'opaque' field
(typically an index in a separate database).
By default, the table has indices that allow you to efficiently
enumerate workunits for a given user and users for a given workunit.
To use the credited_job table, pass the -update_credited_job flag
to your validator.
== Archive file format ==
The archive files have names of the form `wu_archive_TIME` and `result_archive_TIME` where `TIME` is the Unix time the file was created. In addition, db_purge generates index files 'wu_index' and 'result_index' associating each WU and result ID with the timestamp of its archive file.
The format of both type of index files is a number of rows each containing:
{{{
ID TIME
}}}
The ID field of the WU or result, 5 spaces, and the timestamp part of the archive filename where the record with that ID can be found.
The format of a record in the result archive file is:
{{{
%d
%d
%d
%d
%d
%d
%d
%d
%d
%d
%d
%s
%.15e
%s
%s
%s
%d
%d
%d
%.15e
%.15e
%f
%d
%d
%d
%d
%d
%d
%s
}}}
The format of a record in the WU archive file is:
{{{
%d
%d
%d
%s
%s
%d
%.15e
%.15e
%.15e
%.15e
%d
%d
%.15e
%d
%d
%d
%d
%d
%d
%f
%d
%d
%d
%d
%d
%s
%d
%s
}}}