Changes between Version 1 and Version 2 of DataFlow

Timestamp: Jun 9, 2008, 4:23:02 PM (17 years ago)
Author: davea
Comment: --

DataFlow

-                      v1
+                      v2
+) data flow options
+                jobs, job instances
+                work generator, assimilator
+                simple case: no files
+                single input, output file, no sticky
+                sticky input files
+                sticky output files
+                locality scheduling
+                querying/deleting files
+                long-running jobs
+                        trickle messages
+                        intermediate file upload
+= Jobs and data =
+BOINC is designed for high throughput:
+millions of volunteer hosts, millions of jobs per day.
+To maximize your project's performance it's important to
+understand the life-cycle of a job:
+ * The job and its associated input files are generated (typically by a '''work generator''' program).
+ * One or more instances of the job are created.
+ * The instances are dispatched to different hosts.
+ * Each host downloads the input files.
+ * After some queueing delay due to other jobs in progress, the host executes the job, then uploads its output files.
+ * The host reports the completed job, possibly after an additional delay (whose purpose is to reduce the rate of scheduler requests).
+ * A '''validator''' program checks the output files, perhaps comparing replicas.
+ * When a valid instance is found, an '''assimilator''' program handles the results (e.g., by inserting them in a separate database).
+ * When all instances have been completed, a '''file deleter''' deletes the input and output files.
+ * A '''DB purge''' program deletes the database entries for the job and job instances.
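The life-cycle above can be read as an ordered sequence of stages. The following is only an illustrative sketch, not BOINC code: the `Stage` names and the `next_stage` and `walk` helpers are hypothetical, and the sketch flattens the per-instance steps (dispatch, download, execute, report) into a single path even though each job instance goes through them on its own host.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stage names, one per step of the life-cycle list."""
    GENERATED = auto()              # work generator creates job and input files
    INSTANCES_CREATED = auto()      # one or more instances of the job exist
    DISPATCHED = auto()             # instances sent to different hosts
    INPUTS_DOWNLOADED = auto()      # host has fetched the input files
    EXECUTED_AND_UPLOADED = auto()  # host ran the job and uploaded output files
    REPORTED = auto()               # host reported the completed job
    VALIDATED = auto()              # validator checked the output files
    ASSIMILATED = auto()            # assimilator handled the results
    FILES_DELETED = auto()          # file deleter removed input and output files
    DB_PURGED = auto()              # DB purge removed the database entries


# Enum members iterate in definition order, which matches the list above.
LIFE_CYCLE = list(Stage)


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    i = LIFE_CYCLE.index(stage)
    return LIFE_CYCLE[i + 1] if i + 1 < len(LIFE_CYCLE) else None


def walk(start=Stage.GENERATED):
    """Return every stage from `start` to the end of the life-cycle."""
    path = []
    stage = start
    while stage is not None:
        path.append(stage)
        stage = next_stage(stage)
    return path
```

For example, `next_stage(Stage.REPORTED)` yields `Stage.VALIDATED`, reflecting that validation happens only after a host has reported its instance.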
+== Input and output files ==
+Each job can have arbitrarily many input and output files.
+Each file has various attributes, e.g.:
+ * '''Sticky''': the file should remain on the client even when no job is using it.
+ * '''Upload when present''': upload the file when it's complete (the default is to not upload it).
+Jobs don't need to have any input or output files;
+the input can come from command-line arguments (which are stored in the database record for the job)
+and the output can be written to stderr (which is returned to the server and stored in the DB).
+Suppose you have many jobs that use the same input file.
+In that case mark it as sticky;
+clients will then download it just once.
+Suppose your application generates a file which can
+potentially be used by subsequent jobs.
+In that case make it a sticky output file,
+without upload-when-present.
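To illustrate why a sticky input file is downloaded only once, here is a small simulation. It is a sketch under simplifying assumptions, not the BOINC client's logic: `InputFile`, `Client`, and `run_job` are hypothetical names, jobs run one at a time, and a non-sticky file is dropped as soon as the job that used it finishes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InputFile:
    name: str
    sticky: bool = False  # hypothetical flag mirroring the "sticky" attribute


@dataclass
class Client:
    cache: set = field(default_factory=set)  # names of files currently on disk
    downloads: int = 0                        # number of downloads performed

    def run_job(self, input_files):
        # Download only the files that are not already present.
        for f in input_files:
            if f.name not in self.cache:
                self.downloads += 1
                self.cache.add(f.name)
        # (The job itself would execute here.)
        # Afterwards, keep sticky files and drop the rest.
        for f in input_files:
            if not f.sticky:
                self.cache.discard(f.name)


def downloads_for(shared_file, n_jobs):
    """Run n_jobs jobs that all use shared_file; return the download count."""
    client = Client()
    for _ in range(n_jobs):
        client.run_job([shared_file])
    return client.downloads
```

With three jobs sharing one file, `downloads_for(InputFile("model.dat", sticky=True), 3)` counts a single download, while the same file without the sticky flag is fetched three times. The file name `model.dat` is made up for the example.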
+== Locality scheduling ==
+Suppose you have an application that has large input files,
+and many jobs use the same input file.
+In that case, once a file has been downloaded to a client,
+you'd like to send that client jobs that use the file whenever possible.
+To support this, BOINC offers a mechanism called
+[LocalityScheduling locality scheduling].
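The idea can be sketched as a job-selection rule: prefer a pending job whose input file the requesting host already holds, and fall back to any other job otherwise. This is a minimal illustration only and does not describe how BOINC's scheduler implements locality scheduling; the `pick_job` function and the dictionary layout of a job are assumptions made for the example.

```python
def pick_job(pending_jobs, host_files):
    """Choose a job for a host, preferring files the host already has.

    pending_jobs: list of dicts with hypothetical keys "id" and "input_file".
    host_files:   set of file names already present on the host.
    Returns the chosen job dict, or None if no jobs are pending.
    """
    # First pass: a job whose input file is already on the host
    # costs no additional download.
    for job in pending_jobs:
        if job["input_file"] in host_files:
            return job
    # Fallback: no local match, so hand out the oldest pending job.
    return pending_jobs[0] if pending_jobs else None


# Example data; the file and job names are made up.
jobs = [
    {"id": "job_1", "input_file": "chunk_A"},
    {"id": "job_2", "input_file": "chunk_B"},
    {"id": "job_3", "input_file": "chunk_B"},
]
chosen_local = pick_job(jobs, {"chunk_B"})     # host already holds chunk_B
chosen_fallback = pick_job(jobs, {"chunk_Z"})  # host holds nothing useful
```

Here the host that already has `chunk_B` is given `job_2` rather than `job_1`, so it avoids downloading `chunk_A`.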