Version 1 (modified by 17 years ago) (diff) | ,
---|
Scheduler job matchmaking
Currently the scheduler's work-sending algorithm is:
- if host is reliable, scan entire job array, looking for retries
- if using HR, scan entire job array, looking for jobs committed to this HR class
- if still need work, scan entire job array with no restrictions
This is bad for several reasons:
- inefficient: repeated array scans
- inflexible policy: why should reliable be more important than HR?
- hard to add new policies, like sending "hard" jobs to fasts hosts
To solve these problems we're going to change this part of the scheduler. Basic idea: given a job J and a host H, there is a function V(J, H) that represents the value of sending J to H. V(J, H) might reflect various factors:
- the computational "hardness" of J
- CPU/deadline ratio
- RAM or disk requirements
- H already has files required by J(or jobs already planned for sending have such files)
- J is a retry and H is a fast/reliable host
- J has already been committed to H's HR class
BOINC includes a default value function. Projects can tweak its weights, or define their own value function.
Details
functions:
- fast_filter()
- feasibility checks that can be done with no DB access
- slow_filter()
- feasibility checks that need DB access
When we scan a slot, the possibilities are:
- slot is empty
- slot is locked by another scheduler
- job fails fast filter
- job fails slow filter
- none of the above
Parameters:
- N
- scan at least this many slots (if scan N slots and have enough jobs to send, stop)
- M
- scan at most this many slots (even if don't have enough jobs to send yet)
- L
- if scan this many locked slots, print warning msg (should increase shmem size)
logic:
acquire semaphore i = random index in shmem x = ordered list of jobs to send (empty) slots_scanned = 0 slots_locked = 0 loop i = i+1 % array_size slots_scanned++ if slots_scanned > M break if shmem[i] is empty continue if shmem[i] is locked slots_locked++ continue j = shmem[i] if !fast_filter(j) continue; v = v(h, j) if v > lowest score in x lock j release semaphore if !slow_filter(j) acquire semaphore unlock j continue acquire semaphore if x satisfies client's work request unlock and remove lowest-score element of x add j to x if x satisfies client's work request and slots_scanned >= N break; for each job j in x mark slot j as empty release semaphore if slots_locked > L print "need bigger array" message