Changes between Initial Version and Version 1 of JobPrioritization

Timestamp: Aug 30, 2012, 12:32:20 PM
Author: davea
Comment: --

+= Server scheduling improvements =
+By default, the BOINC scheduler dispatches jobs in the order returned by
+a database select, which is roughly FIFO.
+This is suboptimal in the following situations:
+ * If a job fails or times out, a '''retry job''' is created.
+  If there are lots of sendable jobs already in the DB,
+  it may be days or weeks before the retry job is dispatched.
+  During this period, replicas that have already completed remain uncredited and take up disk space.
+ * Jobs in the tail end of a batch should be completed quickly,
+  since the batch is not finished until its last job is.
+To handle these situations, there are two policies we can adjust:
+ * The order in which the feeder enumerates jobs from the DB.
+ * Preferentially sending particular jobs to fast/reliable hosts.
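As a rough illustration of the first policy, the sketch below orders jobs so that retries and batch-tail jobs come before everything else, with FIFO order inside each group. The `Job` fields and the `enumeration_order` function are hypothetical names invented for this example; they are not taken from the BOINC feeder or its database schema.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # All fields are hypothetical; they do not mirror the BOINC schema.
    id: int
    create_time: float           # seconds since the epoch
    is_retry: bool = False       # created after a failure or timeout
    in_batch_tail: bool = False  # belongs to the tail end of a batch


def enumeration_order(jobs):
    """Return jobs with urgent ones first, FIFO within each group."""
    def key(job):
        urgent = job.is_retry or job.in_batch_tail
        # False sorts before True, so urgent jobs get group 0.
        return (0 if urgent else 1, job.create_time)
    return sorted(jobs, key=key)


jobs = [
    Job(1, 100.0),
    Job(2, 200.0, is_retry=True),
    Job(3, 50.0),
    Job(4, 300.0, in_batch_tail=True),
]
ordered_ids = [job.id for job in enumeration_order(jobs)]
```

With plain FIFO the retry job 2 would wait behind jobs 3 and 1; with this ordering it is enumerated first.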
+BOINC has mechanisms in the [BackendPrograms#feeder feeder]
+and [ProjectOptions#Acceleratingretries scheduler] that address these issues
+to some extent.
+However, these mechanisms are out of date.
+This page proposes revisions to them.
+(to be completed)
+== Notes ==
+ * We should eliminate as much configuration as possible.
+   In particular, there should be no configured thresholds for turnaround time
+   (especially not a project-wide one; any such value should be per app).
+ * The notion of "reliable host" need not be binary.
+   Maybe we should define it in terms of order statistics:
+   Nth-percentile hosts (e.g. 90th percentile).
+   Note: this is on a per (host, app version) basis.
+ * We need to think about how this interacts with HR.
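The order-statistics note could be sketched as follows. This is only an illustration under stated assumptions: it ranks each (host, app version) pair by average turnaround time alone, whereas a real reliability measure would presumably also consider error rates. The function name `percentile_ranks` and the input layout are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(turnaround):
    """Map each (host_id, app_version_id) pair to a percentile in [0, 100).

    turnaround: dict mapping (host_id, app_version_id) to average
    turnaround time in seconds (an assumed input format).
    A pair's percentile is the share of pairs that are strictly slower,
    so faster pairs get higher percentiles.
    """
    times = sorted(turnaround.values())
    n = len(times)
    ranks = {}
    for key, t in turnaround.items():
        slower = n - bisect_right(times, t)  # pairs with larger turnaround
        ranks[key] = 100.0 * slower / n
    return ranks


ranks = percentile_ranks({
    (1, 7): 100.0,
    (2, 7): 200.0,
    (3, 7): 300.0,
    (4, 7): 400.0,
})
```

Because the rank is relative to the other pairs, no absolute turnaround threshold has to be configured.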
+We need to think carefully about the dispatch model.
+In general, the cache holds some "special" jobs (e.g. retries or batch-tail jobs),
+and some of the scheduler RPCs we receive come from "special" (fast, reliable) hosts.
+Two extreme policies:
+ * Send special jobs only to special hosts.
+  The danger: a special job may sit in the cache
+  for a long time, maybe forever.
+ * If we get a request from a non-special host,
+  and we can't satisfy it with non-special jobs,
+  send it special jobs too.
+  The danger: special jobs may be sent to a slow or unreliable host.
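The two extremes can be written as one selection function with a `strict` flag. This is a minimal sketch, not scheduler code: `select_jobs` and the cache layout are hypothetical, and the choice to give special hosts the special jobs first is an assumption that the text above does not spell out.

```python
def select_jobs(cache, host_is_special, n_requested, strict):
    """Pick up to n_requested job ids from the cache.

    cache: list of (job_id, is_special) tuples in cache order
           (a hypothetical layout for illustration).
    strict=True  -> special jobs go only to special hosts.
    strict=False -> non-special hosts get special jobs as a fallback.
    """
    special = [job_id for job_id, is_special in cache if is_special]
    ordinary = [job_id for job_id, is_special in cache if not is_special]

    if host_is_special:
        # Assumption: special hosts are offered special jobs first.
        return (special + ordinary)[:n_requested]

    chosen = ordinary[:n_requested]
    if not strict and len(chosen) < n_requested:
        # Request not satisfied by ordinary jobs: top up with special ones.
        chosen += special[:n_requested - len(chosen)]
    return chosen


cache = [(1, False), (2, True), (3, True)]
strict_result = select_jobs(cache, False, 2, strict=True)
lenient_result = select_jobs(cache, False, 2, strict=False)
special_result = select_jobs(cache, True, 2, strict=True)
```

Under the strict policy the non-special host receives only job 1, and jobs 2 and 3 wait for a special host; under the lenient policy job 2 is sent to a host that may be slow or unreliable.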
+Compromises are possible.
+For example, we could associate a "min percentile" with each job in the cache,
+and send a job only to a (host, app version) pair of that percentile or greater.
+The min percentile could decay over time,
+so that every job would eventually get sent.
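One way to picture the decaying "min percentile" is a linear decay to zero, so that after enough time any host qualifies. The decay shape, the `decay_per_day` parameter and both function names are assumptions made for this sketch; the proposal itself does not specify them.

```python
SECONDS_PER_DAY = 86400.0


def effective_min_percentile(initial, age_seconds, decay_per_day=10.0):
    """Linearly decay a job's min percentile, never going below 0.

    decay_per_day is a hypothetical rate in percentile points per day.
    """
    decayed = initial - decay_per_day * age_seconds / SECONDS_PER_DAY
    return max(0.0, decayed)


def can_send(initial_min_percentile, age_seconds, host_percentile,
             decay_per_day=10.0):
    """True if the (host, app version) percentile meets the decayed minimum."""
    needed = effective_min_percentile(
        initial_min_percentile, age_seconds, decay_per_day)
    return host_percentile >= needed


fresh = can_send(90.0, 0.0, 50.0)                    # new job, median host
aged = can_send(90.0, 4 * SECONDS_PER_DAY, 50.0)     # same job, 4 days old
oldest = can_send(90.0, 9 * SECONDS_PER_DAY, 0.0)    # 9 days old, slowest host
```

A new job with min percentile 90 is withheld from a 50th-percentile pair, becomes sendable to it after four days at this rate, and after nine days can go to any host, which gives the "eventually sent" guarantee.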