Server scheduling improvements
By default, the BOINC scheduler dispatches jobs in the order returned by a database select, which is more or less FIFO.
This is suboptimal in the following situations:
- If a job fails or times out, a retry job is created. If there are lots of sendable jobs already in the DB, it may be days or weeks before the retry job is dispatched. During this period, the completed replicas remain uncredited and take up disk space.
- Jobs at the tail end of a batch should be completed quickly, since the batch is not finished until its last job is.
To improve these situations, there are two policies we can adjust:
- The order in which the feeder enumerates jobs from the DB.
- Preferentially sending particular jobs to fast/reliable hosts.
BOINC has mechanisms in the feeder and scheduler that address these issues to some extent. However, these mechanisms are out of date. This is a proposal for revisions to these mechanisms.
(to be completed)
Notes
- We should eliminate as much config as possible. There should be no configured thresholds for turnaround time (especially not a project-wide one; this should be per app).
- The notion of "reliable host" need not be binary. Maybe we should do it in terms of order statistics: 50th-percentile hosts, 90th-percentile hosts, etc. Note: this is on a per (host, app version) basis.
- We need to think about how this interacts with homogeneous redundancy (HR).
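To make the order-statistics idea concrete, here is a minimal Python sketch of how a percentile could be assigned to each (host, app version) pair. It is illustrative only: the function name `percentile_ranks`, the use of average turnaround time as the ranking metric, and the choice to rank each pair against other hosts running the same app version are all assumptions, not part of the existing BOINC code.

```python
from bisect import bisect_right
from collections import defaultdict


def percentile_ranks(turnaround):
    """Assign a percentile in [0, 100) to each (host_id, app_version_id) key.

    `turnaround` maps (host_id, app_version_id) -> average turnaround time in
    seconds (a hypothetical metric; lower is better). A pair's percentile is
    the percentage of pairs for the same app version that are strictly slower,
    so faster hosts get higher percentiles.
    """
    # Collect and sort turnaround times separately for each app version.
    by_version = defaultdict(list)
    for (_host_id, app_version_id), t in turnaround.items():
        by_version[app_version_id].append(t)
    for times in by_version.values():
        times.sort()

    ranks = {}
    for (host_id, app_version_id), t in turnaround.items():
        times = by_version[app_version_id]
        # bisect_right counts entries <= t; the remainder are strictly slower.
        slower = len(times) - bisect_right(times, t)
        ranks[(host_id, app_version_id)] = 100.0 * slower / len(times)
    return ranks
```

With four hosts on one app version and turnaround times of 100, 200, 300 and 400 seconds, the fastest host is faster than three of the four entries and so lands at the 75th percentile, while the slowest lands at 0. No threshold needs to be configured; the ranking comes entirely from observed data.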
We need to think carefully about the dispatch model. In general we have some "special" jobs in cache and we get RPCs, some from "special" hosts. Two extreme policies:
- Send special jobs only to special hosts. The danger: a special job may sit in the cache for a long time, maybe forever.
- If we get a request from a non-special host, and we can't satisfy it with non-special jobs, send it special jobs too. The danger: special jobs may be sent to a slow or unreliable host.
Compromises are possible; e.g., we could associate a "min percentile" with each job in the cache, and send a job only to a (host, app version) pair of that percentile or greater. The min percentile could decay over time so that the job would always eventually get sent.
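The compromise policy might look like the following Python sketch. The names `effective_min_percentile` and `can_send`, the linear decay, and the default rate of 10 percentile points per hour are all assumptions chosen for illustration. Linear decay is used because it reaches zero in finite time, which guarantees that every host eventually becomes eligible.

```python
def effective_min_percentile(initial_min, age_seconds, decay_per_hour=10.0):
    """Return a job's current minimum percentile after waiting in the cache.

    The threshold starts at `initial_min` and drops linearly by
    `decay_per_hour` percentile points per hour (hypothetical rate),
    never going below zero.
    """
    decayed = initial_min - decay_per_hour * age_seconds / 3600.0
    return max(0.0, decayed)


def can_send(initial_min, age_seconds, host_percentile, decay_per_hour=10.0):
    """Decide whether a job may be sent to a (host, app version) pair.

    `host_percentile` is the pair's percentile (higher is better). The job is
    sendable once the pair meets the job's decayed threshold.
    """
    threshold = effective_min_percentile(initial_min, age_seconds, decay_per_hour)
    return host_percentile >= threshold
```

Under these assumptions, a job that starts with a min percentile of 90 is withheld from a 50th-percentile host at first, becomes sendable to it after four hours, and becomes sendable to any host after nine hours. The two extreme policies above correspond to a decay rate of zero (special jobs only ever go to special hosts) and an initial min percentile of zero (any host may receive any job).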