Context Navigation

← Previous Change
Wiki History
Next Change →

Changes between Initial Version and Version 1 of SchedMatch

Timestamp:: Jan 25, 2008, 1:50:44 PM (17 years ago)
Author:: davea
Comment:: --

Legend:

: Unmodified
: Added
: Removed
: Modified

TabularUnified SchedMatch

                       v1
+= Scheduler job matchmaking =
+Currently the scheduler's work-sending algorithm is:
+ * if host is reliable, scan entire job array, looking for retries
+ * if using HR, scan entire job array, looking for jobs committed to this HR class
+ * if still need work, scan entire job array with no restrictions
+This is bad for several reasons:
+ * inefficient: repeated array scans
+ * inflexible policy: why should reliable be more important than HR?
+ * hard to add new policies, like sending "hard" jobs to fasts hosts
+To solve these problems we're going to change this part
+of the scheduler.
+Basic idea: given a job J and a host H,
+there is a function V(J, H) that represents the value of sending J to H.
+V(J, H) might reflect various factors:
+ * the computational "hardness" of J
+  * CPU/deadline ratio
+  * RAM or disk requirements
+ * H already has files required by J(or jobs already planned for sending have such files)
+ * J is a retry and H is a fast/reliable host
+ * J has already been committed to H's HR class
+BOINC includes a default value function.
+Projects can tweak its weights, or define their own value function.
+== Details ==
+functions:
+        fast_filter():: feasibility checks that can be done with no DB access
+        slow_filter():: feasibility checks that need DB access
+When we scan a slot, the possibilities are:
+ * slot is empty
+ * slot is locked by another scheduler
+ * job fails fast filter
+ * job fails slow filter
+ * none of the above
+Parameters:
+        N:: scan at least this many slots
+                (if scan N slots and have enough jobs to send, stop)
+        M:: scan at most this many slots
+                (even if don't have enough jobs to send yet)
+        L:: if scan this many locked slots, print warning msg
+                (should increase shmem size)
+logic:
+{{{
+acquire semaphore
+i = random index in shmem
+x = ordered list of jobs to send (empty)
+slots_scanned = 0
+slots_locked = 0
+loop
+        i = i+1 % array_size
+        slots_scanned++
+        if slots_scanned > M
+                break
+        if shmem[i] is empty
+                continue
+        if shmem[i] is locked
+                slots_locked++
+                continue
+        j = shmem[i]
+        if !fast_filter(j) continue;
+        v = v(h, j)
+        if v > lowest score in x
+                lock j
+                release semaphore
+                if !slow_filter(j)
+                        acquire semaphore
+                        unlock j
+                        continue
+                acquire semaphore
+                if x satisfies client's work request
+                        unlock and remove lowest-score element of x
+                add j to x
+        if x satisfies client's work request and slots_scanned >= N
+                break;
+for each job j in x
+        mark slot j as empty
+release semaphore
+if slots_locked > L
+        print "need bigger array" message
+}}}