Job-based input file management
Input files of BOINC jobs must be available on a public web server. For projects that use remote job submission, job submitters don't have login access to the BOINC server, so they can't store files there directly.
There are several options:
- Files are served from a publicly-accessible server, possibly other than the BOINC server. They must be managed, and file immutability enforced, by a mechanism outside BOINC.
- Per-user file sandbox: job submitters explicitly maintain, via a web interface, a set of files on the server.
- Job-based file management: files are automatically transferred from the submission machine to the BOINC server via Web RPCs.
This document describes the last of these mechanisms, job-based file management.
In this system, you must supply physical names of files that are globally unique. The easiest way to do this is to include a hash of the file contents in the name.
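As a minimal sketch of the hash-in-the-name approach, the following Python helper derives a physical name from a file's MD5 digest. The function name `boinc_name_for`, the `prefix` parameter, and the `<prefix>_<md5><ext>` layout are assumptions made for illustration; the system only requires that names be globally unique.

```python
import hashlib
import os


def boinc_name_for(path, prefix="job"):
    """Return a physical name that embeds the MD5 of the file's contents.

    The "<prefix>_<md5><ext>" layout is an illustrative assumption.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MB chunks so large input files are not held in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    ext = os.path.splitext(path)[1]
    return "%s_%s%s" % (prefix, md5.hexdigest(), ext)
```

Because the name depends only on the contents (plus the prefix and extension), two files with identical contents map to the same physical name, so an unchanged file is found on the server rather than uploaded again.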
File cleanup is based on file/batch associations. You must create a batch (with create_batch()) before querying or uploading files. Each file can be associated with one or more batches. Files that are no longer associated with an active batch are automatically deleted from the server.
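The cleanup rule can be pictured with a small in-memory model. This is only a sketch of the rule as stated above, not the server's implementation; the function `files_to_delete` and the dictionary layout are hypothetical.

```python
def files_to_delete(associations, active_batches):
    """Return the names of files with no association to an active batch.

    associations: dict mapping file name -> set of batch IDs
    active_batches: set of batch IDs that are still active
    """
    return sorted(
        name for name, batches in associations.items()
        if not (batches & active_batches)
    )


associations = {
    "in_a": {1, 2},   # shared by batches 1 and 2
    "in_b": {1},      # used only by batch 1
    "in_c": set(),    # not associated with any batch
}
# Batch 1 is no longer active; only batch 2 remains.
print(files_to_delete(associations, active_batches={2}))  # ['in_b', 'in_c']
```

In this model a file shared by several batches survives until the last of those batches is no longer active.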
The system uses two Web RPCs. These are implemented as XML sent via HTTP POST; the RPC handler is html/user/job_files.php.
C++ interface
The following C++ functions are provided (in lib/remote_submit.cpp). They are to be called on the job submission host; the files must exist on that host.
    extern int query_files(
        const char* project_url,
        const char* authenticator,
        std::vector<std::string> &boinc_names,   // must be unique, e.g. by including content hash
        int batch_id,
        std::vector<int> &absent_files,          // output
        std::string& error_message
    );
Inputs:
- project_url: the project's master URL
- authenticator: the job submitter's authenticator.
- boinc_names: a duplicate-free list of the BOINC physical names of the files. These typically include a hash (e.g. MD5) of the file contents.
- batch_id: the ID of a batch whose jobs will reference the files (these jobs need not exist yet). The operation will fail if the user is not authorized to submit jobs to the batch's application.
Action: for each file, see if it exists on the server. If it does, create an association to the given batch.
Output:
- return value: nonzero on error
- absent_files: a list of files not present on the server (represented as indices into the boinc_names vector).
- error_message: if error, an explanatory string.
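Since absent_files holds indices rather than names, a caller typically maps them back to local paths before uploading. The sketch below shows that bookkeeping in Python; `select_absent` is a hypothetical helper, not part of the BOINC API, and it assumes the caller keeps a list of local paths parallel to boinc_names.

```python
def select_absent(paths, boinc_names, absent_files):
    """Pick the local paths and BOINC names that still need uploading.

    absent_files holds indices into boinc_names, as returned by the query.
    """
    if len(paths) != len(boinc_names):
        raise ValueError("paths and boinc_names must have the same length")
    if len(set(boinc_names)) != len(boinc_names):
        raise ValueError("boinc_names must be duplicate-free")
    upload_paths = [paths[i] for i in absent_files]
    upload_names = [boinc_names[i] for i in absent_files]
    return upload_paths, upload_names


paths = ["/data/a.dat", "/data/b.dat", "/data/c.dat"]
names = ["job_aaa.dat", "job_bbb.dat", "job_ccc.dat"]
# Suppose the query reported that only the second file is missing.
print(select_absent(paths, names, [1]))
```

Only the selected subset then needs to be passed to the upload step; files already on the server have been associated with the batch by the query itself.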
    extern int upload_files(
        const char* project_url,
        const char* authenticator,
        std::vector<std::string> &paths,
        std::vector<std::string> &boinc_names,
        int batch_id,
        std::string& error_message
    );
Inputs:
- project_url, authenticator: as above.
- paths: a list of paths of files to be uploaded
- boinc_names: a list of BOINC names of these files (see above).
- batch_id: the ID of a batch with which the files are associated. The operation will fail if the user is not authorized to submit jobs to the batch's application.
Action: Upload the files, and create associations to the given batch.
Output:
- return value: nonzero on error
- error_message: if error, an explanatory string.
If you use this system, periodically run the script html/ops/delete_job_files. This will delete files that are no longer associated with an active batch.
Python interface
The Python interface does both RPCs in one function (in lib/submit_api.py):
    from submit_api import *

    # project_url and get_auth() are assumed to be defined by the caller
    req = UPLOAD_FILES_REQ()
    req.project = project_url
    req.authenticator = get_auth()
    req.batch_id = 271
    req.local_names = ('updater.cpp', 'kill_wu.cpp')
    req.boinc_names = ('xxx_updater.cpp', 'xxx_kill_wu.cpp')
    r = upload_files(req)
    if r[0].tag == 'error':
        print('error: ' + r[0].find('error_msg').text)
    else:
        print('success')
File size limits
Note: This mechanism uploads files via a PHP script. PHP's default maximum file upload size is 2 MB. To increase it, edit /etc/php.ini and change, for example:
    upload_max_filesize = 64M
    post_max_size = 64M
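Before uploading, a submitter may want to check file sizes against the configured limit. The following is a rough sketch in Python: it converts PHP's shorthand notation (K, M and G suffixes, powers of 1024) to bytes. The helper names `php_size_to_bytes` and `fits_upload_limit` are hypothetical, and the limit has to be supplied by hand because the submission host cannot read the server's php.ini.

```python
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value):
    """Convert a PHP shorthand size such as '64M' to a byte count."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)  # no suffix: the value is already in bytes


def fits_upload_limit(file_size, upload_max_filesize="64M"):
    """Return True if a file of file_size bytes is within the limit."""
    return file_size <= php_size_to_bytes(upload_max_filesize)


print(php_size_to_bytes("64M"))                    # 67108864
print(fits_upload_limit(3 * 1024 * 1024, "2M"))    # False
```

Note that post_max_size bounds the entire POST body, so a request carrying several files at once can exceed it even when each individual file is under upload_max_filesize.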