= Jobs and data =

BOINC is designed for high throughput:
millions of volunteer hosts, millions of jobs per day.
To maximize your project's performance, it's important to
understand the life cycle of a job:

* The job and its associated input files are generated, typically by a '''work generator''' program.
* One or more instances of the job are created.
* The instances are dispatched to different hosts.
* Each host downloads the input files.
* After a queueing delay caused by other jobs in progress, the host executes the job, then uploads its output files.
* The host reports the completed job, possibly after an additional delay (whose purpose is to reduce the rate of scheduler requests).
* A '''validator''' program checks the output files, perhaps comparing replicas (the outputs of different instances).
* When a valid instance is found, an '''assimilator''' program handles the results (e.g., by inserting them into a separate database).
* When all instances have been completed, a '''file deleter''' deletes the input and output files.
* A '''DB purge''' program deletes the database entries for the job and job instances.
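The server-side part of this life cycle can be sketched as a toy in-memory model. This is not BOINC's implementation (the validator, assimilator, and file deleter are separate programs working on the project database); the `Job` class, the `report` function, and the "two replicas must agree" rule in `validate` are hypothetical, chosen only to show the order in which validation, assimilation, and file deletion happen.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Job:
    """Hypothetical in-memory stand-in for a job and its instances' results."""
    name: str
    n_instances: int = 2                              # assumed replication level
    results: List[Any] = field(default_factory=list)  # outputs reported so far
    canonical: Optional[Any] = None                   # output accepted by the validator
    files_deleted: bool = False


def validate(results: List[Any], quorum: int = 2) -> Optional[Any]:
    """Toy validator: accept an output once `quorum` replicas agree on it."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


def report(job: Job, output: Any, assimilate: Callable[[str, Any], None]) -> None:
    """Handle one reported instance: validate, assimilate once, then clean up."""
    job.results.append(output)
    if job.canonical is None:
        job.canonical = validate(job.results)
        if job.canonical is not None:
            assimilate(job.name, job.canonical)  # e.g. insert into a separate DB
    if len(job.results) == job.n_instances:
        job.files_deleted = True  # all instances done: input/output files can go
```

For example, with `n_instances=3` and reported outputs 42, 42, 41, the assimilator is called once (after the second report), and the files are marked for deletion only after the third instance completes.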

== Input and output files ==

Each job can have arbitrarily many input and output files.
Each file has various attributes, e.g.:

* '''Sticky''': the file should remain on the client even when no job is using it.
* '''Upload when present''': upload the file when it's complete (the default is to not upload it).

Jobs don't need to have any input or output files;
the input can come from command-line arguments (which are stored in the database record for the job)
and the output can be written to stderr (which is returned to the server and stored in the DB).
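A minimal sketch of such a file-less job follows. The `JobRecord` class stands in for the job's database record, `app_main` is a toy application that writes its result to stderr, and `run_on_client` captures that stderr the way a client would before returning it to the server. All three names are hypothetical illustrations, not BOINC APIs.

```python
import contextlib
import io
import shlex
import sys
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical DB row for a job with no input or output files."""
    name: str
    cmdline: str          # input: command-line arguments stored with the job
    stderr_out: str = ""  # output: the app's stderr, returned to the server


def app_main(argv):
    """Toy application: sums its integer arguments and reports on stderr."""
    total = sum(int(a) for a in argv)
    print(f"sum={total}", file=sys.stderr)


def run_on_client(job):
    """Run the app with the stored arguments and capture what it writes to stderr."""
    buf = io.StringIO()
    with contextlib.redirect_stderr(buf):
        app_main(shlex.split(job.cmdline))
    job.stderr_out = buf.getvalue()  # the server would store this in the DB
    return job


print(run_on_client(JobRecord("sum_job", "1 2 3")).stderr_out, end="")  # sum=6
```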

Suppose you have many jobs that use the same input file.
In that case, mark it as sticky;
a client that runs several of these jobs will then download it just once.
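The saving can be illustrated with a toy download count for a single client. This sketch assumes each job is just a list of input file names, that sticky files stay in a local cache, and that non-sticky inputs are deleted when their job finishes; `count_downloads` is a hypothetical helper, not part of BOINC.

```python
from typing import Iterable, List, Set


def count_downloads(jobs: Iterable[List[str]], sticky_files: Set[str]) -> int:
    """Count input-file downloads for one client running `jobs` in order.

    Each job is a list of input file names. Files in `sticky_files` stay in
    the client's cache after a job; all other inputs are deleted when it ends.
    """
    cache: Set[str] = set()
    downloads = 0
    for inputs in jobs:
        for name in inputs:
            if name not in cache:
                downloads += 1
                cache.add(name)
        cache -= set(inputs) - sticky_files  # drop the non-sticky inputs
    return downloads


# 100 jobs that share one large file plus a small per-job parameter file.
jobs = [["shared.dat", f"params_{i}.txt"] for i in range(100)]
print(count_downloads(jobs, sticky_files={"shared.dat"}))  # 101
print(count_downloads(jobs, sticky_files=set()))           # 200
```

With the shared file marked sticky, the client fetches it once plus one small parameter file per job; without it, every job re-downloads the shared file.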

Suppose your application generates a file that can
potentially be used by subsequent jobs.
In that case, make it a sticky output file
without upload-when-present: it stays on the client
and is not uploaded to the server.
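A minimal sketch of this pattern on a single client follows. It assumes a hypothetical job format in which each output name maps to `(sticky, upload_when_present)` flags; the sticky output stays in the client's cache, is never uploaded, and a later job on the same client reads it without a download. The `run` function and the job dictionaries are illustrative only, and deletion of non-sticky files is not modeled.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical job format: "inputs" is a list of file names, "outputs" maps
# each output file name to (sticky, upload_when_present) flags.
Job = Dict[str, Any]


def run(jobs: List[Job], sticky_cache: Set[str]) -> Tuple[List[str], List[str]]:
    """Run jobs in order on one client; return (downloaded, uploaded) file names."""
    downloaded: List[str] = []
    uploaded: List[str] = []
    for job in jobs:
        for name in job["inputs"]:
            if name not in sticky_cache:
                downloaded.append(name)  # fetched from the project's servers
        for name, (sticky, upload) in job["outputs"].items():
            if upload:
                uploaded.append(name)    # sent back to the server
            if sticky:
                sticky_cache.add(name)   # kept on the client for later jobs
    return downloaded, uploaded


jobs = [
    # First job builds a reusable table: sticky, not uploaded.
    {"inputs": ["seed.dat"],
     "outputs": {"table.bin": (True, False), "result_1.txt": (False, True)}},
    # Second job reads the table left behind by the first.
    {"inputs": ["table.bin", "params.txt"],
     "outputs": {"result_2.txt": (False, True)}},
]
print(run(jobs, set()))
# (['seed.dat', 'params.txt'], ['result_1.txt', 'result_2.txt'])
```

`table.bin` appears in neither list: it is never uploaded, and the second job finds it locally.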

== Locality scheduling ==

Suppose you have an application that has large input files,
and many jobs use the same input file.
In that case, once a file has been downloaded to a client,
you'd like to send that client jobs that use the file, if possible.
To support this, BOINC offers a mechanism called
[LocalityScheduling locality scheduling].
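The underlying idea can be illustrated with a toy job-selection function. This is not BOINC's locality scheduling algorithm; it only assumes the server knows which files a client already holds and the size of each input file, and it ranks unsent jobs by how many bytes the client would still have to download. `pick_jobs` and its arguments are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def pick_jobs(client_files: Set[str],
              unsent_jobs: List[Tuple[str, Set[str]]],
              file_sizes: Dict[str, int],
              n: int) -> List[str]:
    """Return up to `n` job names, preferring jobs whose inputs the client has."""
    def bytes_to_download(job: Tuple[str, Set[str]]) -> int:
        _, inputs = job
        return sum(file_sizes[f] for f in inputs - client_files)

    # sorted() is stable, so jobs with equal cost keep their queue order.
    ranked = sorted(unsent_jobs, key=bytes_to_download)
    return [name for name, _ in ranked[:n]]


sizes = {"big_A.dat": 2_000_000_000, "big_B.dat": 2_000_000_000, "p3.txt": 1_000}
jobs = [("j1", {"big_B.dat"}), ("j2", {"big_A.dat"}),
        ("j3", {"big_A.dat", "p3.txt"}), ("j4", {"big_A.dat"})]
print(pick_jobs({"big_A.dat"}, jobs, sizes, 3))  # ['j2', 'j4', 'j3']
```

A client that already holds `big_A.dat` is offered the jobs that need only that file first, and the job needing `big_B.dat` last.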