Changes between Version 2 and Version 3 of RemoteInputFiles


Timestamp:
Feb 4, 2013, 11:53:11 PM (11 years ago)
Author:
davea
Comment:

--

  • RemoteInputFiles

    v2 v3  
    11= Remote management of input files =
     2
     3For a file to be used as an input file of a BOINC job,
     4it must be available to BOINC clients via HTTP.
     5The standard way to do this is to put the file
     6in the project's "download directory" on the project server.
     7
     8For projects that use [RemoteJobs remote job submission],
     9job submitters don't have login access to the server,
     10so they can't store files there directly.
     11Instead, BOINC provides two mechanisms that allow
     12job submitters to place files on the BOINC server.
     13
     14Each of these mechanisms deals with two issues:
     15
     16 * '''File immutability''': BOINC requires that a file
     17  of a given name can never be changed.
     18  Job submitters can't be expected to obey this rule:
     19  they must be able to submit one job with an input file
     20  of a given name, and a second job with an input file of
     21  the same name but different contents.
     22 * '''File cleanup''': There must be some way to clean up
     23  files on the server when they are no longer needed.
    224
    325== Content-based file management ==
    426
     27This system is used by the [CondorBoinc Condor/BOINC interface].
     28It may be useful for other systems as well.
     29In this system, the name of a file on the BOINC server
     30is based on its MD5 hash; thus file immutability is automatic.
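
A minimal sketch of why content-based naming makes immutability automatic: the server-side name is derived only from the MD5, so two files with the same local name but different contents get different server names. The `md5_` prefix and the function names `is_md5_hex` and `server_file_name` are assumptions made for illustration; the actual naming scheme used by BOINC is not given here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true if s looks like an MD5 digest: exactly 32 hex digits.
bool is_md5_hex(const std::string& s) {
    if (s.size() != 32) return false;
    for (unsigned char c : s) {
        if (!std::isxdigit(c)) return false;
    }
    return true;
}

// Hypothetical naming rule: a fixed prefix plus the lowercased MD5.
// The "md5_" prefix is an assumption, not the scheme BOINC uses.
std::string server_file_name(const std::string& md5_hex) {
    assert(is_md5_hex(md5_hex));
    std::string name = "md5_";
    for (unsigned char c : md5_hex) {
        name += static_cast<char>(std::tolower(c));
    }
    return name;
}
```

Because the local path plays no part in the name, a submitter can reuse a local file name such as `input.dat` across jobs without ever overwriting a file on the server.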
     31
     32File cleanup is based on file/batch associations.
     33Each file can be associated with one or more batches.
     34Files that are no longer associated with an active batch are
     35automatically deleted from the server.
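
The cleanup rule above can be sketched as an in-memory table of file/batch associations. This is only an illustration of the rule: the `FileRegistry` type and its members are hypothetical, and the real server presumably keeps these associations in its database rather than in memory.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory model of file/batch associations.
struct FileRegistry {
    // server file name -> IDs of the batches that reference it
    std::map<std::string, std::set<int>> batches_of;

    // Record that the given batch references the given file.
    void associate(const std::string& file, int batch_id) {
        assert(!file.empty());
        batches_of[file].insert(batch_id);
    }

    // Drop every file that is not associated with any active batch
    // and return the names of the dropped files.
    std::vector<std::string> cleanup(const std::set<int>& active_batches) {
        std::vector<std::string> deleted;
        for (auto it = batches_of.begin(); it != batches_of.end();) {
            bool in_use = false;
            for (int b : it->second) {
                if (active_batches.count(b)) { in_use = true; break; }
            }
            if (in_use) {
                ++it;
            } else {
                deleted.push_back(it->first);
                it = batches_of.erase(it);
            }
        }
        return deleted;
    }
};
```

A file shared by several batches survives until the last of those batches is no longer active.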
     36
     37The system uses two Web RPCs.
     38These are implemented as XML sent via HTTP POST;
     39the RPC handler is html/user/job_files.php.
     40
     41The following C++ interfaces are provided
     42(in samples/condor/job_rpc.cpp).
     43These are to be called on the job submission host;
     44the files must exist on that host,
     45and their MD5s must have already been computed.
    546{{{
    647extern int query_files(
     
    1051    vector<string> &md5s,
    1152    vector<string> &paths,
    12     vector<int> &absent_files
     53    vector<int> &absent_files           // output
    1354);
     55}}}
    1456
     57Inputs:
     58 * '''project_url''': the project's master URL
     59 * '''authenticator''': the job submitter's authenticator
     60 * '''paths''': a list of file paths on the calling host.
     61 * '''md5s''': a list of the MD5s of the files.
     62 * '''batch_id''': the ID of a batch whose jobs will reference the files
     63  (these jobs need not exist yet).
     64
     65Action: for each file, see if it exists on the server.
     66If it does, create an association to the given batch.
     67
     68Output:
     69 * return value: nonzero on error
     70 * '''absent_files''': a list of files not present on the server
     71  (represented as indices into the '''paths''' and '''md5s''' vectors).
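
Since '''absent_files''' holds indices rather than names, a caller typically has to map them back to paths and MD5s before uploading. The helper below is a sketch of that step only; `UploadList` and `select_absent` are hypothetical names, not part of job_rpc.cpp, and the vectors it returns are what one would then pass to upload_files.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the subset of files that still need uploading.
struct UploadList {
    std::vector<std::string> paths;
    std::vector<std::string> md5s;
};

// Pick the entries of paths/md5s named by the indices in absent_files.
// paths and md5s are parallel vectors: md5s[i] is the MD5 of paths[i].
UploadList select_absent(
    const std::vector<std::string>& paths,
    const std::vector<std::string>& md5s,
    const std::vector<int>& absent_files
) {
    assert(paths.size() == md5s.size());
    UploadList out;
    for (int i : absent_files) {
        assert(i >= 0 && static_cast<std::size_t>(i) < paths.size());
        out.paths.push_back(paths[i]);
        out.md5s.push_back(md5s[i]);
    }
    return out;
}
```

Files that are already on the server are skipped, so only new content crosses the network.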
     72
     73{{{
    1574extern int upload_files (
    1675    const char* project_url,
    1776    const char* authenticator,
    18     int batch_id,
     77    vector<string> &paths,
    1978    vector<string> &md5s,
    20     vector<string> &paths
     79    int batch_id
    2180);
    2281}}}
    2382
     83Inputs:
     84 * '''project_url, authenticator''': as above.
     85 * '''paths''': a list of paths of files to be uploaded
     86 * '''md5s''': a list of MD5 hashes of these files
     87 * '''batch_id''': the ID of a batch with which the files are associated
     88
     89Action: Upload the files, and create associations to the given batch.
     90
     91Output:
     92 * return value: nonzero on error
     93
     94If you use this system, periodically run the script
     95'''html/ops/delete_job_files'''.
     96This will delete files that are no longer associated
     97with an active batch.
     98
    2499== Per-user file sandbox ==
     100