Changes between Version 1 and Version 2 of DataFlow


Timestamp:
Jun 9, 2008, 4:23:02 PM (16 years ago)
Author:
davea

  • DataFlow

    v1 v2  
    1         2) data flow options
    2                 jobs, job instances
    3                 work generator, assimilator
    4                 simple case: no files
    5                 single input, output file, no sticky
    6                 sticky input files
    7                 sticky output files
    8                 locality scheduling
    9                 querying/deleting files
    10                 long-running jobs
    11                         trickle messages
    12                         intermediate file upload
     1= Jobs and data =
     2
     3BOINC is designed for high throughput:
     4millions of volunteer hosts, millions of jobs per day.
     5To maximize your project's performance it's important to
     6understand the life-cycle of a job:
     7
     8 * The job and its associated input files are generated (typically by a '''work generator''' program).
     9 * One or more instances of the job are created.
     10 * The instances are dispatched to different hosts.
     11 * Each host downloads the input files.
     12 * After some queueing delay due to other jobs in progress, it executes the job, then uploads its output files.
     13 * The host reports the completed job, possibly after an additional delay (whose purpose is to reduce the rate of scheduler requests).
     14 * A '''validator''' program checks the output files, perhaps comparing replicas.
     15 * When a valid instance is found, an '''assimilator''' program handles the results (e.g., by inserting them in a separate database).
     16 * When all instances have been completed, a '''file deleter''' deletes the input and output files.
     17 * A '''DB purge''' program deletes the database entries for the job and job instances.
     18
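The life-cycle above can be pictured as an ordered sequence of stages. The following is only an illustrative sketch, not BOINC code: the `Stage` names and the helper functions are hypothetical, and the real system tracks these steps per job instance rather than as one linear chain.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stage names mirroring the life-cycle list above."""
    GENERATED = auto()              # work generator creates job and input files
    INSTANCES_CREATED = auto()      # one or more instances of the job exist
    DISPATCHED = auto()             # instances sent to different hosts
    INPUT_DOWNLOADED = auto()       # host has fetched the input files
    EXECUTED_AND_UPLOADED = auto()  # job ran, output files uploaded
    REPORTED = auto()               # host reported the completed job
    VALIDATED = auto()              # validator checked the output files
    ASSIMILATED = auto()            # assimilator handled the results
    FILES_DELETED = auto()          # file deleter removed input/output files
    DB_PURGED = auto()              # DB purge removed the database entries


# Enum members iterate in definition order, which is the life-cycle order.
LIFE_CYCLE = list(Stage)


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    i = LIFE_CYCLE.index(stage)
    return LIFE_CYCLE[i + 1] if i + 1 < len(LIFE_CYCLE) else None


def run_life_cycle():
    """Walk a job through every stage and return the stages visited."""
    stage = LIFE_CYCLE[0]
    visited = [stage]
    while True:
        stage = next_stage(stage)
        if stage is None:
            return visited
        visited.append(stage)
```

Calling `run_life_cycle()` returns the ten stages in order, which can be handy when reasoning about where queueing and reporting delays enter the pipeline.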
     19== Input and output files ==
     20
     21Each job can have arbitrarily many input and output files.
     22Each file has various attributes, e.g.:
     23
     24 * '''Sticky''': the file should remain on the client even when no job is using it.
     25 * '''Upload when present''': upload the file when it's complete (the default is to not upload it).
     26
     27Jobs don't need to have any input or output files;
     28the input can come from command-line arguments (which are stored in the database record for the job)
     29and the output can be written to stderr (which is returned to the server and stored in the DB).
     30
     31Suppose you have many jobs that use the same input file.
     32In that case mark it as sticky;
     33clients will then download it just once.
     34
     35Suppose your application generates a file which can
     36potentially be used by subsequent jobs.
     37In that case make it a sticky output file,
     38without upload-when-present.
     39
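The two attributes and the two usage patterns described above can be modeled with a small sketch. This is an illustration under stated assumptions, not BOINC's actual data structures: `FileAttrs`, `keep_on_client`, and `upload_on_completion` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileAttrs:
    """Hypothetical model of the two file attributes described above."""
    sticky: bool = False               # remain on the client when unused
    upload_when_present: bool = False  # upload once complete (default: no)


def keep_on_client(attrs, in_use):
    """A file stays on the client if a job is using it or it is sticky."""
    return in_use or attrs.sticky


def upload_on_completion(attrs):
    """A completed file is uploaded only if upload-when-present is set."""
    return attrs.upload_when_present


# Input file shared by many jobs: sticky, so clients download it just once.
SHARED_INPUT = FileAttrs(sticky=True)

# Output file reusable by later jobs: sticky, without upload-when-present.
REUSABLE_OUTPUT = FileAttrs(sticky=True, upload_when_present=False)
```

With these definitions, `keep_on_client(SHARED_INPUT, in_use=False)` is true, while a default `FileAttrs()` is neither retained when idle nor uploaded.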
     40== Locality scheduling ==
     41
     42Suppose you have an application that has large input files,
     43and many jobs use the same input file.
     44In that case, once a file has been downloaded to a client,
     45you'd like to issue it jobs that use that file if possible.
     46To support this, BOINC offers a mechanism called
     47[LocalityScheduling locality scheduling].
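
The idea behind locality scheduling can be sketched as a simple preference rule: offer a host the jobs whose input file it already holds before any others. This is a minimal illustration of the concept only and makes no claim about how the BOINC scheduler implements it; `pick_jobs` and the job dictionary layout are assumptions made for this example.

```python
def pick_jobs(pending_jobs, files_on_host, max_jobs):
    """Choose up to `max_jobs` jobs, preferring those whose input file
    is already present on the host (hypothetical selection rule).

    pending_jobs  -- list of dicts with "name" and "input_file" keys
    files_on_host -- set of file names the host has already downloaded
    """
    # Jobs that need no new download come first, in their original order.
    local = [j for j in pending_jobs if j["input_file"] in files_on_host]
    # Jobs that would require downloading another large file come last.
    remote = [j for j in pending_jobs if j["input_file"] not in files_on_host]
    return (local + remote)[:max_jobs]


# Example: the host already holds "chunk_b", so jobs using it are preferred.
jobs = [
    {"name": "job1", "input_file": "chunk_a"},
    {"name": "job2", "input_file": "chunk_b"},
    {"name": "job3", "input_file": "chunk_b"},
]
chosen = pick_jobs(jobs, files_on_host={"chunk_b"}, max_jobs=2)
```

In this example `chosen` contains `job2` and `job3`, so the host needs no further download.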