Version 38 (modified by davea, 11 years ago)

Web RPCs for remote job submission

This document describes APIs for remotely submitting, monitoring, and controlling jobs on a BOINC server. The APIs support the submission of batches of jobs; a batch may contain a single job or many thousands. Currently, the APIs have two restrictions:

  • All jobs in a batch must use the same application.
  • There can be no dependencies between jobs.

At the bottom level, the interface uses XML over HTTP. At a higher level, BOINC provides client-side bindings in PHP and C++; the two bindings differ somewhat in the functions they offer.

As jobs are completed, their output files are available to the submitting user via HTTP. When a batch is complete, a zipped archive of all its output files is available via HTTP. Details are here.

PHP interface

The following functions are provided in the PHP file submit.inc, which is independent of other BOINC PHP code.

boinc_submit_batch()

Submits a batch.

Arguments: a "request object" whose fields include

  • project: the project URL
  • authenticator: the user's authenticator
  • app_name: the name of the application for which jobs are being submitted
  • batch_name: a symbolic name for the batch. Need not be unique.
  • jobs: an array of job descriptors, each of which contains
    • rsc_fpops_est: an estimate of the FLOPs used by the job
    • command_line: command-line arguments to the application
    • input_files: an array of input file descriptors, each of which contains
      • mode: "local", "semilocal", "local_staged", or "inline" (see below).
      • source: meaning depends on mode:
        • local: path on the BOINC server
        • semilocal: the file's URL
        • local_staged: physical name
        • inline: the file's contents

Result: a 2-element array containing

  • The batch ID
  • An error message (null if success)

Input files can be supplied in any of the following ways:

  • local: the file is on the BOINC server and is not staged. It's specified by its full path.
  • local_staged: the file has been staged on the BOINC server. It's specified by its physical name.
  • semilocal: the file is on a data server that's accessible to the BOINC server but not necessarily to the outside world. The file is specified by its URL. It will be downloaded by the BOINC server during job submission, and served to clients from the BOINC server.
  • inline: the file is included in the job submission request XML message. It will be served to clients from the BOINC server.

The following modes have been proposed but are not implemented yet:

  • remote: the file is on a data server other than the BOINC server, and will be served to clients from that data server. It's specified by the URL, the file size, and the file MD5.
  • sandbox: the file is in the user's sandbox, and is specified by its name in the sandbox.
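The mode lists above can be summarized in code. The following is a minimal sketch, assuming nothing beyond the mode names and "source" meanings given above; the enum and the function names are hypothetical and are not part of BOINC.

```cpp
#include <string>

// Input file modes named in the text. The identifiers are made up for
// this sketch; only the mode strings come from the document.
enum class InputMode { Local, LocalStaged, Semilocal, Inline, Remote, Sandbox, Unknown };

// Map a mode string from a file descriptor to the enum.
InputMode parse_input_mode(const std::string& s) {
    if (s == "local") return InputMode::Local;
    if (s == "local_staged") return InputMode::LocalStaged;
    if (s == "semilocal") return InputMode::Semilocal;
    if (s == "inline") return InputMode::Inline;
    if (s == "remote") return InputMode::Remote;
    if (s == "sandbox") return InputMode::Sandbox;
    return InputMode::Unknown;
}

// True for the four modes described as available; "remote" and "sandbox"
// are listed as proposed but not implemented.
bool is_implemented(InputMode m) {
    switch (m) {
    case InputMode::Local:
    case InputMode::LocalStaged:
    case InputMode::Semilocal:
    case InputMode::Inline:
        return true;
    default:
        return false;
    }
}

// What the "source" field holds for each implemented mode.
std::string source_meaning(InputMode m) {
    switch (m) {
    case InputMode::Local:       return "path on the BOINC server";
    case InputMode::LocalStaged: return "physical name";
    case InputMode::Semilocal:   return "the file's URL";
    case InputMode::Inline:      return "the file's contents";
    default:                     return "";
    }
}
```

A client could use a check like is_implemented() to reject a descriptor before sending the request, rather than waiting for the server to report an error.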

The following example submits a 10-job batch:

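require_once("submit.inc");     // adjust the path as needed

$req = new stdClass;
$req->project = "http://foo.bar.edu/test/";
$req->authenticator = "xxx";
$req->app_name = "uppercase";
$req->jobs = array();

// one staged input file, shared by all jobs
$f = new stdClass;
$f->mode = "local_staged";
$f->source = "filename.dat";

for ($i=10; $i<20; $i++) {
    // create a new object for each job; PHP assigns objects by handle,
    // so reusing one object would make all 10 entries the same job
    $job = new stdClass;
    $job->input_files = array($f);
    $job->rsc_fpops_est = $i*1e9;
    $job->command_line = "--t $i";
    $req->jobs[] = $job;
}

list($batch_id, $errmsg) = boinc_submit_batch($req);
if ($errmsg) {
    echo "Error: $errmsg\n";
} else {
    echo "Batch ID: $batch_id\n";
}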

boinc_estimate_batch()

Returns an estimate of the elapsed time required to complete a batch.

Arguments: same as boinc_submit_batch() (only relevant fields need to be populated).

Return value: a 2-element array containing

  • The elapsed time estimate, in seconds
  • An error message (null if success)

boinc_query_batches()

Returns a list of this user's batches, both in progress and complete.

Argument: a request object with elements

  • project and authenticator: as above.

Result: a 2-element array. The second element is an error message (null if success). The first element is an array of batch descriptor objects, each with the following fields:

  • id: batch ID
  • state: values are
    • 1: in progress
    • 2: completed (all jobs either succeeded or had fatal errors)
    • 3: aborted
    • 4: retired
  • name: the batch name
  • app_name: the application name
  • create_time: when the batch was submitted
  • est_completion_time: current estimate of completion time
  • njobs: the number of jobs in the batch
  • fraction_done: the fraction of the batch that has been completed (0..1)
  • nerror_jobs: the number of jobs that had fatal errors
  • completion_time: when the batch was completed
  • credit_estimate: BOINC's initial estimate of the credit that would be granted to complete the batch, including replication
  • credit_canonical: the actual credit granted to canonical instances
  • credit_total: the actual credit granted to all instances
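The numeric state codes above are easier to handle through named constants. The following is a minimal sketch: the numeric values are taken from the list above, while the enumerator and function names are hypothetical and are not claimed to match any BOINC header.

```cpp
#include <string>

// Values from the "state" list above; names are made up for this sketch.
enum BatchState {
    BATCH_IN_PROGRESS = 1,
    BATCH_COMPLETED   = 2,   // all jobs either succeeded or had fatal errors
    BATCH_ABORTED     = 3,
    BATCH_RETIRED     = 4
};

// Return a human-readable label for a batch descriptor's state field.
std::string batch_state_name(int state) {
    switch (state) {
    case BATCH_IN_PROGRESS: return "in progress";
    case BATCH_COMPLETED:   return "completed";
    case BATCH_ABORTED:     return "aborted";
    case BATCH_RETIRED:     return "retired";
    default:                return "unknown";
    }
}

// True while the batch may still produce results.
bool batch_in_progress(int state) {
    return state == BATCH_IN_PROGRESS;
}
```

Returning "unknown" for unlisted values keeps a client from failing if the server reports a state that this list does not cover.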

boinc_query_batch()

Gets batch details.

Argument: a request object with elements

  • project and authenticator: as above
  • batch_id: specifies a batch.

Result: a 2-element array. The second element is an error message (null if success). The first element is a batch descriptor object as described above, with one additional field:

  • jobs: an array of job descriptor objects, each one containing
    • id: the database ID of the job's workunit record
    • canonical_instance_id: if the job has a canonical result, its database ID

boinc_query_job()

Gets job details.

Argument: a request object with elements:

  • project and authenticator: as above
  • job_id: specifies a job.

Result: a 2-element array. The second element is an error message (null if success). The first element is a job descriptor object with the following fields:

  • instances: an array of job instance descriptors, each containing:
    • name: the instance's name
    • id: the ID of the corresponding result record
    • state: a string describing the instance's state (unsent, in progress, complete, etc.)
    • outfile: if the instance has finished, a list of output file descriptors, each containing
      • size: file size in bytes

boinc_abort_batch()

Aborts a batch.

Argument: a request object with elements

  • project and authenticator: as above
  • batch_id: specifies a batch.

Result: an error message, null if successful

boinc_retire_batch()

Deletes the server storage (files, DB records) associated with a batch.

Argument: a request object with elements

  • project and authenticator: as above
  • batch_id: specifies a batch.

Result: an error message, null if successful

C++ interface

The following C++ functions are implemented in lib/remote_submit.cpp and declared in lib/remote_submit.h.

All functions return zero on success, else an error code as defined in lib/error_numbers.h.

create_batch()

Create a batch: a set of jobs, initially empty.

int create_batch(
    const char* project_url,
    const char* authenticator,
    const char* batch_name,
    const char* app_name,
    double expire_time,
    int &batch_id,
    string& error_msg
);

  • project_url: the project URL
  • authenticator: the authenticator of the submitting user
  • batch_name: a name for the batch. Must be unique over all batches.
  • app_name: the name of an application on the BOINC server
  • expire_time: if nonzero, the Unix time when the batch should be aborted and removed from the server, whether or not it's completed.
  • batch_id: (out) the batch's database ID
  • error_msg: (out) an error message if the operation failed
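Because expire_time is an absolute Unix time rather than a duration, callers usually have to compute it from the current time. The following is a minimal sketch of that arithmetic; expire_time_after_days() is a hypothetical helper, not part of BOINC.

```cpp
#include <ctime>

// Hypothetical helper (not part of BOINC): convert a lifetime in days into
// the absolute Unix time that create_batch() takes as expire_time.
// `now` defaults to the current time and can be overridden for testing.
double expire_time_after_days(
    double days,
    double now = static_cast<double>(std::time(nullptr))
) {
    return now + days * 86400.0;    // 86400 seconds per day
}
```

For example, passing expire_time_after_days(7) as the expire_time argument would request a batch that expires one week after it is created. The same value could be passed to set_expire_time() later to extend the deadline.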

submit_jobs()

Submit a set of jobs; place them in an existing batch, and make them runnable.

int submit_jobs(
    const char* project_url,
    const char* authenticator,
    char app_name[256],
    int batch_id,
    vector<JOB> jobs,
    string& error_msg
);

struct JOB {
    char job_name[256];
    string cmdline_args;
    vector<INFILE> infiles;
};

struct INFILE {
    char physical_name[256]; 
};

  • batch_id: ID of a previously created batch

For each job:

  • job_name: must be unique over all jobs
  • cmdline_args: command-line arguments
  • infiles: list of input files

For each input file:

  • physical_name: BOINC's physical name for the file. The file must already be staged.
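Filling in the fixed-size name fields safely takes a little care. The following is a minimal sketch of building the jobs vector for submit_jobs(); the two structs are copied from the declarations above so that the sketch compiles on its own, and make_jobs() and its naming scheme are hypothetical.

```cpp
#include <cstdio>
#include <string>
#include <vector>
using std::string;
using std::vector;

// Copied from the declarations above; real code would include
// lib/remote_submit.h instead.
struct INFILE {
    char physical_name[256];
};

struct JOB {
    char job_name[256];
    string cmdline_args;
    vector<INFILE> infiles;
};

// Hypothetical helper (not part of BOINC): build n jobs that share one
// already-staged input file. Names have the form "<prefix>_<i>", which
// keeps them distinct within this call only; uniqueness over all jobs
// remains the caller's responsibility (for example, via the prefix).
vector<JOB> make_jobs(const string& prefix, int n, const string& staged_file) {
    vector<JOB> jobs;
    for (int i = 0; i < n; i++) {
        JOB job;
        // snprintf truncates over-long names and always NUL-terminates.
        std::snprintf(job.job_name, sizeof(job.job_name), "%s_%d", prefix.c_str(), i);
        job.cmdline_args = "--t " + std::to_string(i);

        INFILE f;
        std::snprintf(f.physical_name, sizeof(f.physical_name), "%s", staged_file.c_str());
        job.infiles.push_back(f);

        jobs.push_back(job);
    }
    return jobs;
}
```

The resulting vector can be passed as the jobs argument of submit_jobs() together with the batch_id returned by create_batch().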

query_batches()

Query the status of a set of batches.

extern int query_batches(
    const char* project_url,
    const char* authenticator,
    vector<string> &batch_names,
    QUERY_BATCH_REPLY& reply,
    string& error_msg
);

struct QUERY_BATCH_JOB {
    string job_name;
    string status;      // DONE, ERROR, or IN_PROGRESS
    QUERY_BATCH_JOB(){}
};

struct QUERY_BATCH_REPLY {
    vector<int> batch_sizes;    // how many jobs in each of the queried batches
    vector<QUERY_BATCH_JOB> jobs;   // the jobs, sequentially
};
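Since the reply holds all jobs in one flat list, a caller that queried several batches has to slice that list using batch_sizes. The following is a minimal sketch, assuming the jobs appear batch by batch in the same order as batch_sizes; the structs are copied from the declarations above, and split_by_batch() is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>
using std::string;
using std::vector;

// Copied from the declarations above; real code would include
// lib/remote_submit.h instead.
struct QUERY_BATCH_JOB {
    string job_name;
    string status;      // DONE, ERROR, or IN_PROGRESS
    QUERY_BATCH_JOB(){}
};

struct QUERY_BATCH_REPLY {
    vector<int> batch_sizes;
    vector<QUERY_BATCH_JOB> jobs;
};

// Hypothetical helper (not part of BOINC): regroup the flat job list into
// one vector per queried batch, using batch_sizes as the slice lengths.
// If the sizes claim more jobs than the reply holds, the later groups
// are simply shorter (or empty) rather than reading out of bounds.
vector<vector<QUERY_BATCH_JOB>> split_by_batch(const QUERY_BATCH_REPLY& reply) {
    vector<vector<QUERY_BATCH_JOB>> groups;
    std::size_t next = 0;
    for (int n : reply.batch_sizes) {
        vector<QUERY_BATCH_JOB> group;
        for (int i = 0; i < n && next < reply.jobs.size(); i++) {
            group.push_back(reply.jobs[next++]);
        }
        groups.push_back(group);
    }
    return groups;
}
```

Entry i of the result then corresponds to entry i of the batch_names vector passed to query_batches().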

abort_jobs()

Abort a set of jobs.

extern int abort_jobs(
    const char* project_url,
    const char* authenticator,
    vector<string> &job_names,
    string& error_msg
);

query_completed_job()

Query a completed job.

extern int query_completed_job(
    const char* project_url,
    const char* authenticator,
    const char* job_name,
    COMPLETED_JOB_DESC&,
    string& error_msg
);

struct COMPLETED_JOB_DESC {
    int canonical_resultid;
    int error_mask;
    int error_resultid;
    int exit_status;
    double elapsed_time;
    double cpu_time;
    string stderr_out;
};

  • canonical_resultid: database ID of the "canonical" instance of the job.
  • error_mask: a bitmask of error conditions (see db/boinc_db_types.h)
  • error_resultid: the database ID of a failed instance, if one exists
  • exit_status: exit status of failed instance
  • elapsed_time: run time of canonical instance
  • cpu_time: CPU time of canonical instance
  • stderr_out: stderr output of canonical or failed instance

retire_batch()

"Retire" a batch. The server is then allowed to delete the batch's input and output files, and its database records.

extern int retire_batch(
    const char* project_url,
    const char* authenticator,
    const char* batch_name,
    string& error_msg
);

set_expire_time()

Change the expiration time of a batch.

extern int set_expire_time(
    const char* project_url,
    const char* authenticator,
    const char* batch_name,
    double expire_time,
    string& error_msg
);

ping_server()

Ping the project's server; return zero if the server is up.

extern int ping_server(
    const char* project_url,
    string& error_msg
);

HTTP/XML interface

At a lower level, the APIs are accessed by sending a POST request, using HTTP or HTTPS, to PROJECT_URL/submit_rpc_handler.php. The inputs and outputs of each function are XML documents. The format of the request and reply XML documents can be inferred from inc/submit.inc and user/submit_rpc_handler.php.

Example web interface

An example of a web interface for job submission and control, based on this API, can be found here: http://boinc.berkeley.edu/trac/browser/trunk/boinc/html/user/submit_example.php

This example is functional and it shows how to use the API. However, you will have to modify it heavily for your particular applications and web site.