| 1 | | 2) data flow options |
| 2 | | jobs, job instances |
| 3 | | work generator, assimilator |
| 4 | | simple case: no files |
| 5 | | single input, output file, no sticky |
| 6 | | sticky input files |
| 7 | | sticky output files |
| 8 | | locality scheduling |
| 9 | | querying/deleting files |
| 10 | | long-running jobs |
| 11 | | trickle messages |
| 12 | | intermediate file upload |
| | 1 | = Jobs and data = |
| | 2 | |
| | 3 | BOINC is designed for high throughput: |
| | 4 | millions of volunteer hosts, millions of jobs per day. |
| | 5 | To maximize your project's performance it's important to |
| | 6 | understand the life-cycle of a job: |
| | 7 | |
| | 8 | * The job and its associated input files are generated (typically by a '''work generator''' program). |
| | 9 | * One or more instances of the job are created. |
| | 10 | * The instances are dispatched to different hosts. |
| | 11 | * Each host downloads the input files. |
| | 12 | * After some queueing delay due to other jobs in progress, the host executes the job, then uploads its output files. |
| | 13 | * It reports the completed job, possibly after an additional delay (whose purpose is to reduce the rate of scheduler requests). |
| | 14 | * A '''validator''' program checks the output files, perhaps comparing replicas. |
| | 15 | * When a valid instance is found, an '''assimilator''' program handles the results (e.g., by inserting them in a separate database). |
| | 16 | * When all instances have been completed, a '''file deleter''' deletes the input and output files. |
| | 17 | * A '''DB purge''' program deletes the database entries for the job and job instances. |
| | 18 | |
| | 19 | == Input and output files == |
| | 20 | |
| | 21 | Each job can have arbitrarily many input and output files. |
| | 22 | Each file has various attributes, e.g.: |
| | 23 | |
| | 24 | * '''Sticky''': the file should remain on the client even when no job is using it. |
| | 25 | * '''Upload when present''': upload the file when it's complete (the default is to not upload it). |
| | 26 | |
| | 27 | Jobs don't need to have any input or output files; |
| | 28 | the input can come from command-line arguments (which are stored in the database record for the job) |
| | 29 | and the output can be written to stderr (which is returned to the server and stored in the DB). |
| | 30 | |
| | 31 | Suppose you have many jobs that use the same input file. |
| | 32 | In that case mark it as sticky; |
| | 33 | clients will then download it just once. |
| | 34 | |
| | 35 | Suppose your application generates a file which can |
| | 36 | potentially be used by subsequent jobs. |
| | 37 | In that case make it a sticky output file, |
| | 38 | without upload-when-present, so that it stays on the client and is not sent to the server. |
| | 39 | |
| | 40 | == Locality scheduling == |
| | 41 | |
| | 42 | Suppose you have an application that has large input files, |
| | 43 | and many jobs use the same input file. |
| | 44 | In that case, once a file has been downloaded to a client, |
| | 45 | you'd like to send that client jobs that use the file whenever possible. |
| | 46 | To support this, BOINC offers a mechanism called |
| | 47 | [LocalityScheduling locality scheduling]. |
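
The effect of sticky input files and of preferring jobs whose input file is already on a host can be illustrated with a small toy model. This is only a sketch of the idea, not BOINC code: the names `Job`, `Host`, `pick_job` and `simulate`, and the file names, are hypothetical and do not correspond to any BOINC API. It assumes a single host and one input file per job.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Job:
    name: str
    input_file: str  # hypothetical: one input file per job


@dataclass
class Host:
    name: str
    cached_files: set = field(default_factory=set)  # sticky files kept on the host
    downloads: int = 0

    def run(self, job, sticky=True):
        # Download the input file only if it is not already on the host.
        if job.input_file not in self.cached_files:
            self.downloads += 1
            if sticky:
                # A sticky file remains on the host after the job finishes.
                self.cached_files.add(job.input_file)


def pick_job(host, queue):
    """Prefer a job whose input file is already on the host; else take the oldest."""
    for i, job in enumerate(queue):
        if job.input_file in host.cached_files:
            return queue.pop(i)
    return queue.pop(0)


def simulate(jobs, sticky=True):
    """Run all jobs on one host and return the number of file downloads."""
    host = Host("host-1")
    queue = list(jobs)
    while queue:
        host.run(pick_job(host, queue), sticky)
    return host.downloads


jobs = [
    Job("j1", "a.dat"),
    Job("j2", "b.dat"),
    Job("j3", "a.dat"),
    Job("j4", "b.dat"),
]

print(simulate(jobs, sticky=True))   # each distinct file is downloaded once
print(simulate(jobs, sticky=False))  # every job downloads its input file again
```

In this model, marking the input files as sticky reduces the downloads from one per job to one per distinct file; the job selection in `pick_job` only mimics the intent of locality scheduling, not the actual scheduler's policy.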