Changes between Version 14 and Version 15 of JobIn


Timestamp:
Apr 1, 2014, 9:52:34 AM
Author:
davea
Comment:

--

  • JobIn

    v14 v15  
    55
    66 * A '''workunit''' describing the computation to be performed.
    7  * One or more '''results''', each of which describes an instance of a computation, either unstarted, in progress, or completed. The BOINC client software refers to results as "tasks".
     7 * One or more '''results''', each of which describes an instance of a computation, either unstarted, in progress, or completed.
     8   The BOINC client software refers to results as "tasks".
    89
    910These entities are stored in the 'workunit' and 'result' database tables respectively.
     
    1718
    1819 '''name'''::
    19         A text string, unique across all workunits in the project.  You can guarantee uniqueness by embedding the PID of the creating process, a sequence number, and a Unix timestamp in the name.
     20   A text string, unique across all workunits in the project.
     21   You can guarantee uniqueness by embedding the PID of the creating process,
     22   a sequence number, and a Unix timestamp in the name.
    2023 '''application'''::
    21         Which [AppVersion application] will perform the computation. A workunit is associated with an application, not with a particular version or range of versions. If the input data format changes in a way that is incompatible with older versions, you must either a) release new versions for all supported platforms, or b) create a new application. Such incompatibilities can often be avoided by using XML data format.
     24   Which [AppVersion application] will perform the computation.
     25   A workunit is associated with an application, not with a particular version or range of versions.
     26   If the input data format changes in a way that is incompatible with older versions,
     27   you must either a) release new versions for all supported platforms, or b) create a new application.
     28   Such incompatibilities can often be avoided by using XML data format.
    2229 '''input files'''::
    23         A list of the input files: their names, and the names by which the application refers to them. Typically these file are downloaded from a data server. However, if the `<generate_locally/>` element is present, the file is generated on the client (typically by an earlier instance of the same application). Applications should use file locking to prevent two jobs from generating the file at the same time.
     30   A list of the input files: their names, and the names by which the application refers to them.
     31   Typically these files are downloaded from a data server.
     32   However, if the `<generate_locally/>` element is present, the file is generated on the client
     33   (typically by an earlier instance of the same application).
     34   Applications should use file locking to prevent two jobs from generating the file at the same time.
    2435 '''batch'''::
    25         (optional) An integer; can be used to operate (cancel, change priority etc.) on groups of workunits.
     36   (optional) An integer; can be used to operate (cancel, change priority etc.) on groups of workunits.
    2637
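The naming advice under '''name''' can be sketched in Python as follows. This is a minimal illustration, not BOINC code: `make_workunit_name` and the `prefix` argument are hypothetical, and it assumes all workunits are created on a single host (PIDs are only unique per machine).

```python
import itertools
import os
import time

# Per-process sequence counter; hypothetical helper, not part of BOINC.
_sequence = itertools.count()


def make_workunit_name(prefix="wu"):
    """Build a name from a Unix timestamp, the creating PID, and a sequence number."""
    timestamp = int(time.time())   # Unix timestamp in seconds
    pid = os.getpid()              # PID of the creating process
    seq = next(_sequence)          # distinguishes names created in the same second
    return f"{prefix}_{timestamp}_{pid}_{seq}"


names = [make_workunit_name() for _ in range(1000)]
```

The sequence number is what keeps names distinct when one process creates many workunits within the same second; the PID separates concurrent creator processes.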
    2738=== Resource estimates and bounds === #resources
     
    3142
    3243 '''rsc_fpops_est'''::
    33         An estimate of the number of floating-point operations required to complete the job, used to estimate how long the job will take on a given host.
     44   An estimate of the number of floating-point operations required to complete the job,
     45   used to estimate how long the job will take on a given host.
    3446 '''rsc_fpops_bound'''::
    35         An upper bound on the number of floating-point operations required to complete the job. If this bound is exceeded, the job will be aborted.
     47   An upper bound on the number of floating-point operations required to complete the job.
     48   If this bound is exceeded, the job will be aborted.
    3649 '''rsc_memory_bound'''::
    37         An estimate of job's largest working set size. The job will only be sent to hosts with at least this much available RAM.  If this bound is exceeded, the job will be aborted.
    38  '''rsc_disk_bound'''::
    39         A bound on the maximum disk space used by the job, including all input, temporary, and output files. The job will only be sent to hosts with at least this much available disk space. If this bound is exceeded, the job will be aborted.
    40  '''rsc_bandwidth_bound'''::
    41   If nonzero, this job will be sent only to hosts with at least this much download bandwidth.  Use for jobs with very large input files.
     50   An estimate of the job's largest working set size.
     51   The job will only be sent to hosts with at least this much available RAM.
     52 '''rsc_disk_bound''':: A bound on the maximum disk space used by the job, including all input, temporary, and output files.
     53   The job will only be sent to hosts with at least this much available disk space. If this bound is exceeded, the job will be aborted.
     54 '''rsc_bandwidth_bound''':: If nonzero, this job will be sent only to hosts with at least this much download bandwidth.
     55   Use for jobs with very large input files.
    4256
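As a rough illustration of how the resource fields above might be used, the Python sketch below estimates runtime from `rsc_fpops_est` and checks whether a host qualifies for a job. The `Job` and `Host` classes, their non-`rsc_` field names, and the units (bytes, bytes per second, floating-point operations per second) are assumptions made for this example; the real scheduler logic is more involved.

```python
from dataclasses import dataclass


@dataclass
class Job:
    rsc_fpops_est: float
    rsc_fpops_bound: float
    rsc_memory_bound: float            # assumed to be in bytes
    rsc_disk_bound: float              # assumed to be in bytes
    rsc_bandwidth_bound: float = 0.0   # assumed bytes/sec; zero means "no requirement"


@dataclass
class Host:                            # hypothetical host description
    flops: float                       # floating-point operations per second
    available_ram: float
    available_disk: float
    download_bandwidth: float


def estimated_runtime(job, host):
    """Estimated seconds to finish the job on this host."""
    return job.rsc_fpops_est / host.flops


def host_is_eligible(job, host):
    """True if the host meets the job's memory, disk, and bandwidth bounds."""
    if host.available_ram < job.rsc_memory_bound:
        return False
    if host.available_disk < job.rsc_disk_bound:
        return False
    if job.rsc_bandwidth_bound and host.download_bandwidth < job.rsc_bandwidth_bound:
        return False
    return True


def exceeds_fpops_bound(job, fpops_done):
    """True once the job has used more operations than rsc_fpops_bound allows."""
    return fpops_done > job.rsc_fpops_bound
```

For example, a job with `rsc_fpops_est` of 1e12 on a host sustaining 1e9 operations per second would be estimated at 1000 seconds.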
    4357=== Redundancy and scheduling attributes === #scheduling
    4458
    4559 '''delay_bound'''::
    46         An upper bound on the time (in seconds) between sending a result to a client and receiving a reply. The scheduler won't issue a result if the estimated completion time exceeds this. If the client doesn't respond within this interval, the server 'gives up' on the result and generates a new result, to be assigned to another client. Set this to several times the average execution time of a workunit on a typical PC. If you set it too low, BOINC may not be able to send some results, and the corresponding workunit will be flagged with an error. If you set it too high, there may a corresponding delay in getting results back.
     60   An upper bound on the time (in seconds) between sending a result to a client and receiving a reply.
     61   The scheduler won't issue a result if the estimated completion time exceeds this.
     62   If the client doesn't respond within this interval, the server 'gives up' on the result and generates a new result,
     63   to be assigned to another client. Set this to several times the average execution time of a workunit on a typical PC.
     64   If you set it too low, BOINC may not be able to send some results, and the corresponding workunit will be flagged with an error.
     65   If you set it too high, there may be a corresponding delay in getting results back.
    4766 '''min_quorum'''::
    48         The minimum size of a 'quorum'. The validator is run when there are this many successful results. If a strict majority agree, they are considered correct. Set this to two or more if you want redundant computing.
     67   The minimum size of a 'quorum'. The validator is run when there are this many successful results.
     68   If a strict majority agree, they are considered correct.
     69   Set this to two or more if you want redundant computing.
    4970 '''target_nresults'''::
    50         How many results to create initially. This must be at least '''min_quorum'''. It may be more, to reflect the ratio of result loss, or to get a quorum more quickly.
     71   How many results to create initially.
     72   This must be at least '''min_quorum'''.
     73   It may be more, to reflect the ratio of result loss, or to get a quorum more quickly.
    5174 '''max_error_results'''::
    52         If the number of client error results exceeds this, the work unit is declared to have an error; no further results are issued, and the assimilator is triggered. This safeguards against workunits that cause the application to crash.
     75   If the number of client error results exceeds this, the work unit is declared to have an error;
     76   no further results are issued, and the assimilator is triggered.
     77   This safeguards against workunits that cause the application to crash.
    5378 '''max_total_results'''::
    54         If the total number of results for this workunit would exceed this, the workunit is declared to be in error. This safeguards against workunits that are never reported (e.g. because they crash the core client).
     79   If the total number of results for this workunit would exceed this, the workunit is declared to be in error.
     80   This safeguards against workunits that are never reported (e.g. because they crash the core client).
    5581 '''max_success_results'''::
    56         If the number of success results for this workunit exceeds this, and a consensus has not been reached, the workunit is declared to be in error. This safeguards against workunits that produce nondeterministic results.
     82   If the number of success results for this workunit exceeds this, and a consensus has not been reached,
     83   the workunit is declared to be in error.
     84   This safeguards against workunits that produce nondeterministic results.
    5785 '''priority'''::
    58         (optional) Higher-priority work is dispatched first
     86   (optional) Higher-priority work is dispatched first.
    5987
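The quorum rule described under '''min_quorum''' can be sketched as below. This is only an illustration under simplifying assumptions: `find_canonical` is a hypothetical name, and results are compared by plain equality, whereas a real validator typically applies an application-specific comparison.

```python
from collections import Counter


def find_canonical(outputs, min_quorum):
    """Return the agreed output, or None if there is no quorum or no strict majority.

    `outputs` holds the outputs of the successful results received so far.
    """
    if len(outputs) < min_quorum:
        return None                       # not enough successful results yet
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 > len(outputs):          # strict majority: more than half agree
        return value
    return None
```

With `min_quorum` set to 2, two matching outputs yield a canonical result, while two differing outputs yield none and a further result would be needed.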
    6088A workunit can experience any of several error conditions:
    6189
    6290 '''WU_ERROR_COULDNT_SEND_RESULT'''::
    63         The BOINC scheduler was unable to send the workunit to a large number (~100) of hosts, probably because its resource requirements (disk, memory, CPU) were too large for the hosts, or because no application version was available for the hosts' platforms. In this case BOINC 'gives up' on the workunit.
     91   The BOINC scheduler was unable to send the workunit to a large number (~100) of hosts,
     92   probably because its resource requirements (disk, memory, CPU) were too large for the hosts,
     93   or because no application version was available for the hosts' platforms.
     94   In this case BOINC 'gives up' on the workunit.
    6495 '''WU_ERROR_TOO_MANY_ERROR_RESULTS'''::
    65         Too many results with error conditions (upload/download problem, client crashes) have been returned for this work unit.
     96   Too many results with error conditions (upload/download problem, client crashes) have been returned for this work unit.
    6697 '''WU_ERROR_TOO_MANY_SUCCESS_RESULTS'''::
    67         Too many successful results have been returned without consensus. This indicates that the application may be nondeterministic.
     98   Too many successful results have been returned without consensus. This indicates that the application may be nondeterministic.
    6899 '''WU_ERROR_TOO_MANY_TOTAL_RESULTS'''::
    69         Too many total results have been sent for this workunit.
     100   Too many total results have been sent for this workunit.
    70101
    71102If any of these conditions holds, BOINC doesn't dispatch more instances of the workunit.
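
The relationship between the three limits and the error conditions listed above can be sketched as follows. The function `workunit_errors` and its arguments are hypothetical, and the error names are returned as plain strings for illustration; this is a reading of the descriptions on this page, not the server's actual transition logic.

```python
def workunit_errors(n_error, n_success, n_total, have_consensus,
                    max_error_results, max_success_results, max_total_results):
    """Return the error conditions that apply, given the current result counts."""
    errors = []
    if n_error > max_error_results:
        errors.append("WU_ERROR_TOO_MANY_ERROR_RESULTS")
    if n_success > max_success_results and not have_consensus:
        errors.append("WU_ERROR_TOO_MANY_SUCCESS_RESULTS")
    # "would exceed": checked before creating one more result (an assumption).
    if n_total + 1 > max_total_results:
        errors.append("WU_ERROR_TOO_MANY_TOTAL_RESULTS")
    return errors


def may_dispatch_more(errors):
    """No further instances are dispatched once any error condition holds."""
    return not errors
```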