Client scheduling policies
The client embodies two related scheduling policies:
- Task scheduling policy
- Of the tasks that are runnable, which ones to execute?
- Work-fetch policy
- When to ask a project for more work, which project to ask, and how much work of each processor type to ask for?
Note: a processor type is either CPU or a GPU vendor. There may be multiple instances of each processor type.
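To make the terminology concrete, here is a minimal sketch (the type and field names are hypothetical, not the client's actual classes) of how a processor type and its instance count might be represented:

```cpp
// Hypothetical illustration: a processor type is either the CPU or a GPU vendor,
// and a host may have several instances of each type.
enum PROC_TYPE { PROC_CPU, PROC_NVIDIA_GPU, PROC_AMD_GPU, PROC_INTEL_GPU };

struct PROC_TYPE_INFO {
    PROC_TYPE type;
    int ninstances;     // number of instances of this type on the host
};
```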
The goals of these policies are (in descending priority):
- Tasks should be completed and reported by their deadline (results reported after their deadline may not have any value to the project and may not be granted credit).
- All processors should be kept busy.
- At any given point, the computer should have enough work so that its processors will be busy for at least min_buffer days and not much more than max_buffer days.
- Project resource shares should be honored over the long term.
- If a computer is attached to multiple projects, execution should rotate among projects on a frequent basis to improve the volunteer experience.
Task scheduling simulation
The baseline task scheduling policy is weighted round-robin (WRR): one-hour time slices are given to projects in turn, with the number of time slices in proportion to the project's resource share.
Both task scheduling and work-fetch policies involve first simulating the execution of existing tasks in FIFO order under the WRR policy. This simulation produces several outputs:
- For each processor type, the shortfall: that is, the number of additional instance-seconds needed to keep all instances busy until max_buffer.
- For each processor type, the number of currently idle instances.
- For each processor type, the saturated period: the length of time during which all instances are busy.
- For each task T, a flag T.deadline_miss indicating whether the task missed its deadline in the simulation.
After the WRR simulation, a second simulation is done in which the tasks that missed their deadline are executed in EDF (earliest deadline first) order. For each processor type, this yields a busy time: the period during which all instances are busy in this simulation. The busy time is passed to the scheduler; its significance is that no new jobs can possibly be started before then.
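The following is a simplified sketch of the simulation for one processor type. It assumes tasks are simply handed FIFO to whichever instance frees up first (the real client rotates among projects in proportion to resource share), and the structure names are illustrative, not the client's actual identifiers. It shows how the shortfall, idle count, saturated period, and deadline_miss flags fall out of the simulated finish times:

```cpp
#include <algorithm>
#include <vector>

// Simplified sketch, not the client's actual code.
struct SimTask {
    double remaining;       // estimated remaining run time (seconds)
    double deadline;        // seconds from now
    bool deadline_miss;     // output: missed its deadline in the simulation?
};

struct SimResult {
    double shortfall;       // instance-seconds needed to stay busy until max_buffer
    int nidle;              // instances with no work at all
    double saturated;       // period during which all instances are busy
};

SimResult simulate(std::vector<SimTask>& tasks, int ninstances, double max_buffer) {
    std::vector<double> busy_until(ninstances, 0);  // when each instance frees up

    for (auto& t : tasks) {
        // give the task to the instance that becomes free first (FIFO)
        auto it = std::min_element(busy_until.begin(), busy_until.end());
        double finish = *it + t.remaining;
        t.deadline_miss = (finish > t.deadline);
        *it = finish;
    }

    SimResult r;
    r.shortfall = 0;
    r.nidle = 0;
    for (double b : busy_until) {
        if (b == 0) r.nidle++;
        r.shortfall += std::max(0.0, max_buffer - b);
    }
    // all instances are busy until the earliest one runs out of work
    r.saturated = *std::min_element(busy_until.begin(), busy_until.end());
    return r;
}
```

In this sketch the saturated period is simply the earliest time at which any instance runs out of work, since all instances are busy up to that point.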
Project scheduling priority
Both scheduling policies involve a notion of project scheduling priority, a dynamic quantity that reflects how much processing has been done recently by the project's tasks relative to its resource share.
The scheduling priority of a project P is computed as
SP(P) = - REC(P)/resource_share(P)
where REC(P) is the amount of processing done recently by P. REC and resource_share are normalized to sum to 1 over all projects.
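As a sketch (hypothetical structure names, not the client's code), the priority computation normalizes REC and resource share across projects and then takes their negated ratio:

```cpp
#include <vector>

struct Project {
    double rec;             // processing done recently by this project
    double resource_share;
    double sched_priority;  // output: SP(P)
};

void compute_sched_priorities(std::vector<Project>& projects) {
    double rec_sum = 0, share_sum = 0;
    for (auto& p : projects) {
        rec_sum += p.rec;
        share_sum += p.resource_share;
    }
    for (auto& p : projects) {
        double rec_frac = rec_sum ? p.rec / rec_sum : 0;
        double share_frac = share_sum ? p.resource_share / share_sum
                                      : 1.0 / projects.size();
        if (share_frac <= 0) share_frac = 1e-6;  // guard against zero-share projects
        p.sched_priority = -rec_frac / share_frac;
    }
}
```

The negation means that a project which has recently received more than its share of processing ends up with a lower (more negative) priority.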
Job scheduling
The job scheduling policy is roughly as follows:
- For each GPU type, schedule deadline-miss tasks in order of increasing deadline, until all instances are used.
- Same for CPU.
- For each GPU type, while there are still available instances, schedule non-deadline-miss tasks in order of project scheduling priority.
- Same for CPU.
As each job is scheduled, we increment a copy of REC(P) as if the job had run for one scheduling period. This encourages the scheduler to pick jobs from the same project.
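A rough sketch of this pass for one processor type is below. It is illustrative only (the actual client also handles multiprocessor and multi-GPU jobs, coprocessor reservations, and so on), and the names are hypothetical:

```cpp
#include <algorithm>
#include <vector>

struct RunnableTask {
    int project;            // index into the per-project arrays below
    double deadline;
    bool deadline_miss;     // from the WRR simulation
};

// rec_copy: temporary copy of each project's REC, bumped as jobs are scheduled.
// share_frac: each project's normalized resource share.
std::vector<RunnableTask*> schedule_type(
    std::vector<RunnableTask>& tasks,
    std::vector<double> rec_copy,
    const std::vector<double>& share_frac,
    int ninstances,
    double sched_period)
{
    std::vector<RunnableTask*> scheduled;

    // 1) deadline-miss tasks, earliest deadline first
    std::vector<RunnableTask*> miss, rest;
    for (auto& t : tasks) (t.deadline_miss ? miss : rest).push_back(&t);
    std::sort(miss.begin(), miss.end(),
              [](RunnableTask* a, RunnableTask* b) { return a->deadline < b->deadline; });
    for (auto* t : miss) {
        if ((int)scheduled.size() >= ninstances) return scheduled;
        scheduled.push_back(t);
        rec_copy[t->project] += sched_period;   // as if it ran one scheduling period
    }

    // 2) remaining tasks in order of project scheduling priority
    while ((int)scheduled.size() < ninstances && !rest.empty()) {
        auto best = std::max_element(rest.begin(), rest.end(),
            [&](RunnableTask* a, RunnableTask* b) {
                double spa = -rec_copy[a->project] / share_frac[a->project];
                double spb = -rec_copy[b->project] / share_frac[b->project];
                return spa < spb;
            });
        RunnableTask* t = *best;
        rest.erase(best);
        scheduled.push_back(t);
        rec_copy[t->project] += sched_period;   // lowers that project's priority slightly
    }
    return scheduled;
}
```

Because the REC bump is small relative to a project's accumulated REC, consecutive picks tend to come from the same project, which reduces context switching between projects.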
Work fetch
The work fetch policy:
- Work fetch for a given processor type is initiated whenever the saturated period is less than min_buffer.
- Adjust SP(P) based on the amount of work currently queued from each project.
- Ask the fetchable project with greatest SP(P) for "shortfall" seconds of work.
- Whenever a scheduler RPC to project P is done (e.g. to report results) and SP(P) is greatest among fetchable projects for a given processor type, request "shortfall" seconds of that type.
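A minimal sketch of the per-type fetch decision, with hypothetical names:

```cpp
#include <vector>

struct FetchProject {
    double sched_priority;  // SP(P), already adjusted for queued work
    bool fetchable;         // not suspended, not backed off, allows this processor type
};

// Returns the index of the project to ask for work, or -1 if no fetch is needed.
// The caller then requests "shortfall" seconds of this processor type.
int choose_fetch_project(const std::vector<FetchProject>& projects,
                         double saturated, double min_buffer)
{
    if (saturated >= min_buffer) return -1;   // enough work queued already
    int best = -1;
    for (int i = 0; i < (int)projects.size(); i++) {
        if (!projects[i].fetchable) continue;
        if (best < 0 || projects[i].sched_priority > projects[best].sched_priority) {
            best = i;
        }
    }
    return best;
}
```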