wiki:GpuWorkFetch

Version 13 (modified by davea, 16 years ago)

Work fetch and GPUs

Problems with the current work fetch policy

The current work-fetch policy is essentially:

  • Do a weighted round-robin simulation, computing the CPU shortfall (i.e., the idle CPU time we expect during the work-buffering period).
  • If there's a CPU shortfall, request work from the project with highest long-term debt (LTD).

The scheduler request has a scalar "work_req_seconds" indicating the total duration of jobs being requested.
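The current policy can be sketched roughly as follows. This is a minimal Python sketch, not BOINC's client code. The `Project` class is hypothetical, the CPU shortfall is assumed to come from the RR simulation, and work_req_seconds is assumed to be set to that shortfall.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Project:
    name: str
    long_term_debt: float  # LTD, based only on CPU time used


def current_work_fetch(
    projects: List[Project], cpu_shortfall: float
) -> Optional[Tuple[Project, float]]:
    """Return (project, work_req_seconds), or None if no request is sent."""
    # Only a CPU shortfall triggers a request; idle GPUs are invisible here.
    if cpu_shortfall <= 0 or not projects:
        return None
    # Ask the project with the highest long-term debt.
    best = max(projects, key=lambda p: p.long_term_debt)
    # Assumption: the single scalar request equals the CPU shortfall.
    return best, cpu_shortfall
```

A call with `cpu_shortfall=0` returns None even if a GPU is idle.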

This policy has some problems. First:

  • There's no way for the client to say "I have N idle CPUs; send me enough jobs to use them all".

There are also several problems related to GPUs:

  • If there is no CPU shortfall, no work will be fetched even if GPUs are idle.
  • If a GPU is idle, we should get work from a project that potentially has jobs for it; the current policy picks a project by LTD alone, regardless of which resources it can supply.
  • If a project has both CPU and GPU jobs, the client should be able to tell it to send only GPU (or only CPU) jobs.
  • LTD is computed solely on the basis of CPU time used, so it doesn't provide a meaningful comparison between projects that use only GPUs, or between GPU and CPU projects.

This document proposes a modification to the work-fetch system that solves these problems.

For simplicity, the design assumes that there is only one GPU type (CUDA). It is straightforward to extend the design to handle additional GPU types.

Terminology

A job sent to a client is associated with an app version, which uses some number (possibly fractional) of CPUs and CUDA devices.

  • A CPU job is one that uses only CPU.
  • A CUDA job is one that uses CUDA (and may use CPU as well).

Scheduler request

New fields in the scheduler request message:

double cpu_req_seconds: number of CPU seconds requested

double cuda_req_seconds: number of CUDA seconds requested

double ninstances_cpu: send enough jobs to occupy this many CPUs

double ninstances_cuda: send enough jobs to occupy this many CUDA devices

For compatibility with old servers, the message still has work_req_seconds; this is the maximum of cpu_req_seconds and cuda_req_seconds.
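A minimal Python sketch of the new request fields, showing how the compatibility value is derived. The class name `SchedulerRequest` is hypothetical, and the sketch leaves out how the message is serialized.

```python
from dataclasses import dataclass


@dataclass
class SchedulerRequest:
    cpu_req_seconds: float = 0.0   # CPU seconds requested
    cuda_req_seconds: float = 0.0  # CUDA seconds requested
    ninstances_cpu: float = 0.0    # send enough jobs to occupy this many CPUs
    ninstances_cuda: float = 0.0   # send enough jobs to occupy this many CUDA devices

    @property
    def work_req_seconds(self) -> float:
        # Old servers only read this scalar: the max of the two requests.
        return max(self.cpu_req_seconds, self.cuda_req_seconds)
```

Using the maximum means an old server that ignores the per-resource fields still sees a nonzero request whenever either resource needs work.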

Client

New abstraction: processing resource or PRSC. There are two processing resource types: CPU and CUDA.

Per-resource-type backoff

We need to handle the situation where there's a GPU shortfall but no projects are supplying GPU work (for either permanent or transient reasons). We don't want an overall work-fetch backoff from those projects. Instead, we maintain a separate backoff timer per (project, PRSC). Its interval is doubled whenever we ask the project only for work of that type and don't get any; it's cleared whenever we get a job of that type.
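The per-(project, PRSC) backoff can be sketched in Python as below. The initial interval `MIN_BACKOFF` (60 seconds) is a hypothetical value that the proposal doesn't specify; the 24-hour cap comes from the description of the backoff timer under "Work-fetch state"; `asked_only_this` models "we asked only for work of that type".

```python
from typing import Dict, Tuple

MIN_BACKOFF = 60.0         # hypothetical initial interval, in seconds
MAX_BACKOFF = 24 * 3600.0  # cap: 24 hours

Key = Tuple[str, str]      # (project name, resource type)


class ResourceBackoff:
    """Backoff timers kept separately for each (project, PRSC) pair."""

    def __init__(self) -> None:
        self.interval: Dict[Key, float] = {}  # current backoff length
        self.until: Dict[Key, float] = {}     # time the backoff expires

    def backed_off(self, project: str, rsc: str, now: float) -> bool:
        return now < self.until.get((project, rsc), 0.0)

    def on_reply(self, project: str, rsc: str, asked_only_this: bool,
                 got_job: bool, now: float) -> None:
        key = (project, rsc)
        if got_job:
            # Any job of this type clears the backoff for this pair only.
            self.interval.pop(key, None)
            self.until.pop(key, None)
        elif asked_only_this:
            # Asked only for this type and got nothing: start or double it.
            prev = self.interval.get(key)
            new = MIN_BACKOFF if prev is None else min(2 * prev, MAX_BACKOFF)
            self.interval[key] = new
            self.until[key] = now + new
```

Because the key includes the resource type, a project that never sends CUDA jobs ends up being asked for CUDA-only work about once a day, while its CPU work fetch is unaffected.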

Work-fetch state

Each PRSC has its own set of data related to work fetch. This is stored in an object of class PRSC_WORK_FETCH.

Data members of PRSC_WORK_FETCH

ninstances: number of instances of this resource (CPUs or CUDA devices)

Used/set by rr_simulation():

double shortfall: shortfall for this resource

double nidle: number of currently idle instances

Member functions of PRSC_WORK_FETCH:

rr_init(): called at the start of RR simulation. Compute share of each project for this PRSC, and clear shortfall.

set_nidle(): called by RR sim after initial job assignment. Set nidle to # of idle instances.

accumulate_shortfall(dt): called by RR sim for each time interval during work buf period.

nidle_now = ninstances - instances in use
shortfall += dt*(nidle_now)
for each project p not backed off for this PRSC
    add_proj_shortfall(p, dt)
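A minimal Python sketch of set_nidle() and accumulate_shortfall(), under simplifying assumptions: the RR simulation passes in the number of instances in use, per-project data lives in a dict keyed by project name, and the set of projects backed off for this PRSC is given. Member names follow the proposal; the clamp at zero is an added guard, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ProjectRscData:
    share: float = 0.0           # instances this project should get
    instances_used: float = 0.0  # instances its jobs use in this interval
    shortfall: float = 0.0


@dataclass
class PrscWorkFetch:
    ninstances: float
    shortfall: float = 0.0
    nidle: float = 0.0
    projects: Dict[str, ProjectRscData] = field(default_factory=dict)
    backed_off: Set[str] = field(default_factory=set)

    def set_nidle(self, instances_in_use: float) -> None:
        # Called after the initial job assignment.
        self.nidle = max(0.0, self.ninstances - instances_in_use)

    def add_proj_shortfall(self, p: ProjectRscData, dt: float) -> None:
        # As written in the proposal; goes negative if the project
        # uses more than its share.
        p.shortfall += dt * (p.share - p.instances_used)

    def accumulate_shortfall(self, dt: float, instances_in_use: float) -> None:
        nidle_now = max(0.0, self.ninstances - instances_in_use)
        self.shortfall += dt * nidle_now
        for name, p in self.projects.items():
            if name not in self.backed_off:
                self.add_proj_shortfall(p, dt)
```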

prepare(): called before exists_fetchable_project(). Sees whether there's a project to request work from for this resource, and caches it.

bool exists_fetchable_project(): whether there's a project we can ask for work of this resource type

select_project(priority, char* buf): given the importance (priority) of getting work for this resource, chooses and returns a PROJECT to request work from, and a string (in buf) to put in the request message. It chooses the project for which LTD + expected payoff is largest.
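A sketch of the selection rule in Python. The proposal doesn't define "expected payoff", so here it is a hypothetical per-project number supplied by the caller. The message string is omitted, and backed-off projects are skipped as described under per-resource-type backoff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    long_term_debt: float
    expected_payoff: float  # hypothetical; not defined by the proposal
    backed_off: bool = False


def select_project(cands: List[Candidate]) -> Optional[Candidate]:
    """Pick the fetchable project with the largest LTD + expected payoff."""
    fetchable = [c for c in cands if not c.backed_off]
    if not fetchable:
        return None
    return max(fetchable, key=lambda c: c.long_term_debt + c.expected_payoff)
```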

Values for priority:

  • DONT_NEED: no shortfalls
  • NEED: a shortfall, but no idle devices right now
  • NEED_NOW: idle devices right now
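One plausible way to map a resource's RR-simulation results onto these values, sketched in Python. Using `IntEnum` is an assumption, not something the proposal specifies; it makes the values ordered, so a caller can check whether a resource's need is at least a given priority.

```python
from enum import IntEnum


class Priority(IntEnum):
    DONT_NEED = 0  # no shortfall
    NEED = 1       # a shortfall, but no idle instances right now
    NEED_NOW = 2   # idle instances right now


def get_priority(shortfall: float, nidle: float) -> Priority:
    if nidle > 0:
        return Priority.NEED_NOW
    if shortfall > 0:
        return Priority.NEED
    return Priority.DONT_NEED
```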

runnable_resource_share(): total resource share of projects with runnable jobs for this resource.

get_priority(): returns this resource's current priority (DONT_NEED, NEED, or NEED_NOW)

bool count_towards_share(PROJECT p): whether to count p's resource share in the total for this resource, i.e., whether we've gotten a job of this type from p in the last 30 days

add_proj_shortfall(PROJECT p, dt): add x to p's shortfall for this resource, where x = dt*(share - instances_used)

double total_share(): total resource share of projects we're counting
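The share-related helpers can be sketched in Python as follows. `ProjectInfo`, its fields, and the use of seconds for timestamps are assumptions made for illustration. The 30-day window is the one given for count_towards_share().

```python
from dataclasses import dataclass
from typing import List, Optional

THIRTY_DAYS = 30 * 86400.0  # seconds


@dataclass
class ProjectInfo:
    name: str
    resource_share: float
    last_job_time: Optional[float]  # last job of this type received; None if never
    has_runnable_job: bool = False  # has a runnable job for this resource


def count_towards_share(p: ProjectInfo, now: float) -> bool:
    # Count p only if it sent a job of this type in the last 30 days.
    return p.last_job_time is not None and now - p.last_job_time <= THIRTY_DAYS


def total_share(projects: List[ProjectInfo], now: float) -> float:
    return sum(p.resource_share for p in projects if count_towards_share(p, now))


def runnable_resource_share(projects: List[ProjectInfo]) -> float:
    return sum(p.resource_share for p in projects if p.has_runnable_job)
```

With these definitions, a project that hasn't sent a job of this type recently doesn't count towards that resource's share total.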

accumulate_debt(dt): for each project p:

x = insts of this device used by p's running jobs
y = p's share of this device
update p's LTD
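The proposal doesn't spell out how LTD is updated from x and y. The Python sketch below assumes the simplest rule, adding dt*(y - x) to each project's debt, so debt grows while a project gets less than its share of the device and shrinks while it gets more. The `ProjectDebt` class is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectDebt:
    name: str
    share: float           # y: this project's share of the device
    instances_used: float  # x: instances used by its running jobs
    long_term_debt: float = 0.0


def accumulate_debt(projects: List[ProjectDebt], dt: float) -> None:
    for p in projects:
        x = p.instances_used
        y = p.share
        # Assumed update rule: under-served projects accumulate debt.
        p.long_term_debt += dt * (y - x)
```

When the shares add up to the instances actually in use, as in a fully loaded device, the debts sum to zero.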

Each PRSC also needs to have some per-project data. This is stored in an object of class PRSC_PROJECT_DATA. It has the following "persistent" members (i.e., saved in the state file):

double long_term_debt

backoff timer: how long to wait before asking the project for work specifically for this PRSC. Double it whenever we ask for work for this PRSC and get none (maximum 24 hours); clear it whenever we ask for work for this PRSC and get a job.

And the following transient members (used by rr_simulation()):

double share: # of instances this project should get based on resource share relative to the set of projects not backed off for this PRSC.

instances_used: # of instances currently being used

double shortfall: this project's shortfall for this PRSC
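To illustrate the transient share member, the Python sketch below computes it the way rr_init() is described. Each project not backed off for this PRSC gets ninstances times its fraction of those projects' total resource share; backed-off projects get zero. The field `resource_share` and the function name `rr_init_shares` are hypothetical, and the persistent members are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrscProjectData:
    resource_share: float     # hypothetical: the project's resource share
    backed_off: bool = False  # backed off for this PRSC
    # transient members, as listed above
    share: float = 0.0
    instances_used: float = 0.0
    shortfall: float = 0.0


def rr_init_shares(projects: List[PrscProjectData], ninstances: float) -> None:
    total = sum(p.resource_share for p in projects if not p.backed_off)
    for p in projects:
        p.shortfall = 0.0
        if p.backed_off or total <= 0:
            p.share = 0.0
        else:
            p.share = ninstances * p.resource_share / total
```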

Debt accounting

for each resource type
   R.accumulate_debt(dt)

RR simulation

cpu_work_fetch.rr_init()
cuda_work_fetch.rr_init()

compute initial assignment of jobs
cpu_work_fetch.set_nidle();
cuda_work_fetch.set_nidle();

do the simulation as currently done
on completion of an interval dt
   cpu_work_fetch.accumulate_shortfall(dt)
   cuda_work_fetch.accumulate_shortfall(dt)
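A toy Python version of these hooks, assuming the simulation has already been reduced to a list of intervals, each giving its length and the numbers of CPUs and CUDA devices in use. The first interval doubles as the initial assignment. `RscState`, `rr_simulation`, and the interval format are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RscState:
    ninstances: float
    shortfall: float = 0.0
    nidle: float = 0.0

    def rr_init(self) -> None:
        self.shortfall = 0.0

    def set_nidle(self, in_use: float) -> None:
        self.nidle = max(0.0, self.ninstances - in_use)

    def accumulate_shortfall(self, dt: float, in_use: float) -> None:
        self.shortfall += dt * max(0.0, self.ninstances - in_use)


def rr_simulation(cpu: RscState, cuda: RscState,
                  intervals: List[Tuple[float, float, float]]) -> None:
    """intervals: (dt, cpus_in_use, cuda_devices_in_use) for each interval."""
    cpu.rr_init()
    cuda.rr_init()
    if intervals:
        _, cpus0, cuda0 = intervals[0]  # initial job assignment
        cpu.set_nidle(cpus0)
        cuda.set_nidle(cuda0)
    for dt, cpus, cuda_devs in intervals:
        cpu.accumulate_shortfall(dt, cpus)
        cuda.accumulate_shortfall(dt, cuda_devs)
```

For example, with 4 CPUs kept busy and the single GPU idle for an hour, the CPU shortfall is zero (so the current policy would fetch nothing) while the CUDA shortfall is 3600 seconds.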

Work fetch

send_req(p)
	switch cpu_work_fetch.priority
		case DONT_NEED:
			set no_cpu in req message
		case NEED, NEED_NOW:
			cpu_req_seconds = p.cpu_shortfall
			ninstances_cpu = p.max_idle_cpus
	switch cuda_work_fetch.priority
		case DONT_NEED:
			set no_cuda in req message
		case NEED, NEED_NOW:
			set cuda_req_seconds and ninstances_cuda as in the CPU case
	work_req_seconds = max(cpu_req_seconds, cuda_req_seconds)

for prior = NEED_NOW, NEED
	for each coproc C (in decreasing order of importance)
		p = C.work_fetch.select_project(prior, msg)
		if p
			put msg in req message
			send_req(p)
			return
	p = cpu_work_fetch.select_project(prior, msg)
	if p
		send_req(p)
		return
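A Python sketch of the request-selection loop above. It assumes that select_project(prior, ...) returns a project only when the resource's own priority is at least prior, and it picks by LTD alone. `Rsc`, `pick_request`, and the per-resource LTD table are hypothetical simplifications; building the request message is left out.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional, Tuple


class Priority(IntEnum):
    DONT_NEED = 0
    NEED = 1
    NEED_NOW = 2


@dataclass
class Rsc:
    name: str
    priority: Priority
    ltd: Dict[str, float] = field(default_factory=dict)  # fetchable project -> LTD

    def select_project(self, prior: Priority) -> Optional[str]:
        # Assumption: offer a project only if this resource's need >= prior.
        if self.priority < prior or not self.ltd:
            return None
        return max(self.ltd, key=lambda name: self.ltd[name])


def pick_request(coprocs: List[Rsc], cpu: Rsc) -> Optional[Tuple[str, str]]:
    """Return (resource name, project name) for the request, or None."""
    for prior in (Priority.NEED_NOW, Priority.NEED):
        for c in coprocs:  # in decreasing order of importance
            p = c.select_project(prior)
            if p is not None:
                return c.name, p
        p = cpu.select_project(prior)
        if p is not None:
            return cpu.name, p
    return None
```

Because NEED_NOW is tried for every resource before NEED, an idle CPU takes precedence over a GPU that merely has a future shortfall.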

Handling scheduler reply

if request.

Scheduler changes

global vars
	have_cpu_app_versions
	have_cuda_app_versions
per-req vars
	bool coproc_request
	ncpu_jobs_sending
	ncuda_jobs_sending
	ncpu_seconds_to_fill
	ncuda_seconds_to_fill
	seconds_to_fill
		(backwards compat; used if !coproc_request)
overall startup
	scan app versions, set have_x vars
req startup
	if send_only_cpu and no CPU app versions, don't send work
	if send_only_cuda and no CUDA app versions, don't send work
work_needed()
	need_more_cpu_jobs =
		ncpu_jobs_sending < ninstances_cpu
		or ncpu_seconds_to_fill > 0
	same for cuda
	return false if we need neither more CPU nor more CUDA jobs
get_app_version
	if send_only_cpu, ignore CUDA versions
	if send_only_cuda, ignore CPU versions
when commit a job
	update n*_jobs_sending,
		n*_seconds_to_fill,
		seconds_to_fill
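The per-request bookkeeping can be sketched in Python as below, with names following the pseudocode. `ReqState`, `commit_job`, and the per-job runtime estimate `est_seconds` are hypothetical; the sketch decrements the seconds-to-fill counters by each committed job's estimated runtime.

```python
from dataclasses import dataclass


@dataclass
class ReqState:
    ninstances_cpu: float
    ninstances_cuda: float
    ncpu_seconds_to_fill: float
    ncuda_seconds_to_fill: float
    seconds_to_fill: float = 0.0  # backwards compat (used if !coproc_request)
    ncpu_jobs_sending: int = 0
    ncuda_jobs_sending: int = 0

    def need_more_cpu_jobs(self) -> bool:
        return (self.ncpu_jobs_sending < self.ninstances_cpu
                or self.ncpu_seconds_to_fill > 0)

    def need_more_cuda_jobs(self) -> bool:
        return (self.ncuda_jobs_sending < self.ninstances_cuda
                or self.ncuda_seconds_to_fill > 0)

    def work_needed(self) -> bool:
        # False only if we need neither more CPU nor more CUDA jobs.
        return self.need_more_cpu_jobs() or self.need_more_cuda_jobs()

    def commit_job(self, is_cuda: bool, est_seconds: float) -> None:
        if is_cuda:
            self.ncuda_jobs_sending += 1
            self.ncuda_seconds_to_fill -= est_seconds
        else:
            self.ncpu_jobs_sending += 1
            self.ncpu_seconds_to_fill -= est_seconds
        self.seconds_to_fill -= est_seconds
```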