Changes between Initial Version and Version 1 of JobPrioritization


Timestamp: Aug 30, 2012, 12:32:20 PM
Author: davea

= Server scheduling improvements =

By default, the BOINC scheduler dispatches jobs in the order returned by
a database query, which is roughly FIFO.

This is suboptimal in the following situations:

 * If a job fails or times out, a '''retry job''' is created.
  If there are many sendable jobs already in the DB,
  it may be days or weeks before the retry job is dispatched.
  During this period, the job's completed replicas remain uncredited and take up disk space.
 * Jobs at the tail end of a batch should be completed faster,
  since a batch is not finished until its last job is.
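
To make the retry delay concrete, the sketch below estimates how long a newly created job waits under FIFO dispatch, assuming every job already ahead of it is dispatched at a constant rate. The function `fifo_wait_days` and the queue size and dispatch rate in the example are hypothetical illustrations, not BOINC code or measurements from a real project.

```python
def fifo_wait_days(jobs_ahead: int, dispatch_rate_per_day: float) -> float:
    """Estimate the days before a new job reaches the front of a FIFO queue.

    Assumes (hypothetically) that every job already queued ahead of it is
    dispatched at a constant rate and that no new jobs jump the queue.
    """
    if dispatch_rate_per_day <= 0:
        raise ValueError("dispatch rate must be positive")
    return jobs_ahead / dispatch_rate_per_day


if __name__ == "__main__":
    # Made-up example numbers: 1,000,000 sendable jobs ahead,
    # 100,000 jobs dispatched per day.
    print(fifo_wait_days(1_000_000, 100_000))  # 10.0
```

Under these assumed numbers, a retry job created now would wait about ten days before being dispatched, regardless of how quickly it could actually be completed.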

To improve these situations, there are two policies we can adjust:

 * The order in which the feeder enumerates jobs from the DB.
 * Which hosts particular jobs are sent to, e.g. preferring fast or reliable hosts.
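
As one illustration of the first policy, the feeder's enumeration order could be expressed as a sort key. This is a minimal sketch under assumptions: the `Job` fields (`is_retry`, `batch_frac_done`, `create_time`) and the ordering itself are hypothetical, not BOINC's database schema or its actual feeder logic. Retries come first, then jobs from more nearly finished batches, then the remainder in roughly FIFO (creation time) order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    # Hypothetical job record; field names are illustrative, not BOINC's schema.
    name: str
    create_time: int        # creation time, seconds since the epoch
    is_retry: bool          # True if created after a replica failed or timed out
    batch_frac_done: float  # fraction of this job's batch already completed (0..1)


def feeder_order(jobs: List[Job]) -> List[Job]:
    """Return jobs in a hypothetical enumeration order:
    retries first, then jobs from more nearly finished batches,
    then oldest first (the FIFO tie-breaker)."""
    return sorted(
        jobs,
        key=lambda j: (not j.is_retry, -j.batch_frac_done, j.create_time),
    )


if __name__ == "__main__":
    jobs = [
        Job("a", create_time=100, is_retry=False, batch_frac_done=0.1),
        Job("b", create_time=200, is_retry=True, batch_frac_done=0.1),
        Job("c", create_time=300, is_retry=False, batch_frac_done=0.9),
    ]
    print([j.name for j in feeder_order(jobs)])  # ['b', 'c', 'a']
```

Here the retry `b` is enumerated first even though it is newer than `a`, and `c` (from a 90%-complete batch) precedes `a`. Using a single tuple key keeps the policy in one place, so reweighting the criteria means editing only the key.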

BOINC has mechanisms in the [BackendPrograms#feeder feeder]
and [ProjectOptions#Acceleratingretries scheduler] that address these issues
to some extent.
However, these mechanisms are out of date;
this page proposes revisions to them.

(to be completed)

== Notes ==

 * We should eliminate as much configuration as possible.
   There should be no thresholds for turnaround time
   (especially not a project-wide one; anything of this kind should be per app).
 * The notion of "reliable host" need not be binary.
   Maybe we should express it in terms of order statistics:
   50th-percentile hosts, 90th-percentile hosts, etc.
   Note: this is on a per-(host, app version) basis.
 * We need to think about how this interacts with homogeneous redundancy (HR).
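
The order-statistics idea could be sketched as follows, assuming each (host, app version) pair has an average turnaround time and that lower turnaround is better. The input layout and the function name `turnaround_percentiles` are hypothetical; a real version might also account for error rates. Each pair's percentile is the share of other pairs for the same app version that it is strictly faster than, so the fastest pair is at the 100th percentile and the slowest at the 0th.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[int, int]  # hypothetical (host_id, app_version_id)


def turnaround_percentiles(avg_turnaround: Dict[Key, float]) -> Dict[Key, float]:
    """Map each (host, app version) pair to the percentage of other pairs
    with the same app version that it is strictly faster than.

    Lower average turnaround counts as better. A pair that is the only one
    for its app version is assigned 100 (an arbitrary choice).
    """
    times_by_av = defaultdict(list)
    for (_host, av), t in avg_turnaround.items():
        times_by_av[av].append(t)

    result = {}
    for (host, av), t in avg_turnaround.items():
        times = times_by_av[av]
        n_other = len(times) - 1
        if n_other == 0:
            result[(host, av)] = 100.0
            continue
        slower = sum(1 for x in times if x > t)  # pairs strictly slower than this one
        result[(host, av)] = 100.0 * slower / n_other
    return result


if __name__ == "__main__":
    # Made-up average turnaround times in seconds.
    data = {(1, 10): 3600.0, (2, 10): 7200.0, (3, 10): 1800.0,
            (4, 10): 10800.0, (5, 20): 500.0}
    for key, pct in sorted(turnaround_percentiles(data).items()):
        print(key, round(pct, 1))
```

Percentiles are computed within each app version, matching the note that reliability is tracked per (host, app version) rather than per host.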

We need to think carefully about the dispatch model.
In general, the job cache contains some "special" jobs,
and the scheduler receives RPCs, some of them from "special" hosts.
There are two extreme policies:

 * Send special jobs only to special hosts.
  The danger: a special job may sit in the cache
  for a long time, possibly forever.
 * If we get a request from a non-special host
  and can't satisfy it with non-special jobs,
  send it special jobs too.
  The danger: special jobs may be sent to slow or unreliable hosts.
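
The two extreme policies can be contrasted in a small sketch. It assumes the cache is a list of jobs with a hypothetical boolean `special` flag and that a special host takes special jobs first; `CachedJob`, `send_only_to_special`, and `send_with_fallback` are illustrative names, not part of the BOINC scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedJob:
    name: str
    special: bool  # hypothetical flag, e.g. a retry or batch-tail job


def _special_first(cache: List[CachedJob]) -> List[CachedJob]:
    # sorted() is stable, so cache order is kept within each group.
    return sorted(cache, key=lambda j: not j.special)


def send_only_to_special(cache: List[CachedJob], host_special: bool, n: int) -> List[CachedJob]:
    """Extreme policy 1: special jobs never go to non-special hosts."""
    if host_special:
        return _special_first(cache)[:n]
    return [j for j in cache if not j.special][:n]


def send_with_fallback(cache: List[CachedJob], host_special: bool, n: int) -> List[CachedJob]:
    """Extreme policy 2: a non-special host gets non-special jobs first,
    then special jobs if those do not fill the request."""
    if host_special:
        return _special_first(cache)[:n]
    return sorted(cache, key=lambda j: j.special)[:n]


if __name__ == "__main__":
    cache = [CachedJob("retry1", True), CachedJob("job1", False)]
    print([j.name for j in send_only_to_special(cache, False, 2)])  # ['job1']
    print([j.name for j in send_with_fallback(cache, False, 2)])    # ['job1', 'retry1']
```

For a non-special host asking for two jobs, the first policy returns only `job1` and leaves `retry1` waiting in the cache; the second fills the request with `retry1` as well, at the risk of sending it to a slow host.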

Compromises are possible.
For example, we could associate a "min percentile" with each job in the cache,
and send a job only to (host, app version) pairs at that percentile or higher.
The min percentile could decay over time,
so that every job would eventually be sent.
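
A minimal sketch of this compromise, assuming linear decay. The names `min_percentile` and `can_send` are hypothetical, and the decay rate is a free parameter rather than anything BOINC defines. Linear decay reaches zero in finite time, so even a 0th-percentile (host, app version) pair eventually qualifies; an exponential decay only approaches zero, so such a pair would never qualify.

```python
def min_percentile(initial: float, age_hours: float, decay_per_hour: float) -> float:
    """Hypothetical linear decay of a job's required host percentile:
    it drops by decay_per_hour for each hour in the cache, bottoming out at 0."""
    return max(0.0, initial - decay_per_hour * age_hours)


def can_send(host_percentile: float, initial: float, age_hours: float,
             decay_per_hour: float) -> bool:
    """A job may go to a (host, app version) at or above its current min percentile."""
    return host_percentile >= min_percentile(initial, age_hours, decay_per_hour)


if __name__ == "__main__":
    # A job that starts out requiring a 90th-percentile pair, decaying 10 points per hour.
    for age in (0, 4, 9):
        print(age, min_percentile(90.0, age, 10.0), can_send(60.0, 90.0, age, 10.0))
```

In this example a 60th-percentile pair cannot take the job when it first enters the cache, but can after four hours, once the requirement has decayed to 50.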