* We should eliminate as much configuration as possible.
There should be no threshold for turnaround time
(especially not a project-wide one; if anything, it should be per app).
* The notion of "reliable host" need not be binary.
Maybe we should express it in terms of order statistics:
50th-percentile hosts, 90th-percentile hosts, etc.
Note: this is on a per (host, app version) basis.
* We need to think about how this interacts with HR (homogeneous redundancy).

* Support the QoS features.
* Avoid assigning "tight" deadlines unnecessarily, because:
 * Doing so may make it impossible to assign tight deadlines
   to jobs that actually need them.
 * Tight deadlines often force the client to preempt other jobs,
   which irritates some volunteers.
* Avoid long delays between the completion of a job instance
and its validation.
These delays irritate volunteers and increase server disk usage.
* Minimize server configuration.

* Send special jobs only to special hosts.
The danger: a special job may sit in the cache
for a long time, maybe forever.
* If we get a request from a non-special host,
and we can't satisfy it with non-special jobs,
send it special jobs too.
The danger: special jobs may be sent to a slow or unreliable host.
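
A minimal sketch of the second option's fallback, assuming a simple job
cache; JOB, HOST, and pick_job() are illustrative names, not actual
scheduler structures:
{{{
#include <vector>

// Hypothetical job cache entry and host descriptor.
struct JOB  { bool is_special; };
struct HOST { bool is_special; };

// Return the index of a job to send to this host, or -1 if none fits.
int pick_job(const HOST& host, const std::vector<JOB>& cache) {
    // First preference: special jobs go to special hosts,
    // non-special jobs to non-special hosts.
    for (size_t i = 0; i < cache.size(); i++) {
        if (cache[i].is_special == host.is_special) return (int)i;
    }
    // Fallback: rather than letting a special job sit in the cache
    // indefinitely, let a non-special host take it.
    if (!host.is_special) {
        for (size_t i = 0; i < cache.size(); i++) {
            if (cache[i].is_special) return (int)i;
        }
    }
    return -1;
}
}}}
The fallback is exactly where the second danger arises: a slow or
unreliable non-special host can end up with a special job.
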
We need a way to identify hosts that can turn around jobs quickly
and reliably.
Notes:
* This is a property of (host, app version), not of the host alone.
* This is not the same as processor speed.
A host may have a high turnaround time for various reasons:
 * Large minimum work buffer size.
 * Attached to lots of other projects.
 * Long periods of unavailability or network disconnection.

Compromises are possible;
e.g. we could associate a "min percentile" with each job in the cache,
and send a job only to a (host, app version) of that percentile or greater.
The min percentile could be decayed over time,
so that the job would always eventually get sent.
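
One possible shape of that decay, purely illustrative (the struct, field
names, and 10-points-per-hour rate are assumptions, not part of the design):
{{{
#include <algorithm>
#include <ctime>

// A per-job minimum percentile that decays the longer the job sits
// in the cache.
struct CACHED_JOB {
    double min_percentile;   // e.g. 90 when the job is first enqueued
    time_t enqueue_time;
};

// Decay by 10 percentile points per hour in cache (arbitrary rate);
// once it reaches 0, any (host, app version) can take the job.
double effective_min_percentile(const CACHED_JOB& j, time_t now) {
    double hours = std::difftime(now, j.enqueue_time) / 3600.0;
    return std::max(0.0, j.min_percentile - 10.0*hours);
}

// A (host, app version) is eligible for the job iff
// P(H, AV) >= effective_min_percentile(j, now).
}}}
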
We propose the following.
For each app A:

* For each (host, app version), let X be the percentile
of turnaround time.
* For each (host, app version), let Y be the percentile
of "consecutive valid results" (or +infinity if > 10).
Both percentiles are taken over all active hosts
and all current app versions of A.
* Let P(H, AV) = min(X, Y).

These values will be computed periodically (say, every 24 hours)
by a utility program.
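
A sketch of what that utility pass might look like, under assumptions: the
struct and helper names are hypothetical, and the percentiles are oriented
so that higher means faster and more reliable, matching the way P(H, AV) is
compared against thresholds in the scheduler policy below.
{{{
#include <algorithm>
#include <vector>

// Hypothetical per-(host, app version) statistics; not the actual
// host_app_version schema.
struct HOST_APP_VERSION_STATS {
    double avg_turnaround;    // observed turnaround time, in seconds
    int consecutive_valid;    // current run of consecutive valid results
    double percentile;        // P(H, AV), filled in by update_percentiles()
};

// Fraction (0..100) of the sorted sample lying strictly below x.
static double percentile_rank(const std::vector<double>& sorted, double x) {
    size_t below = std::lower_bound(sorted.begin(), sorted.end(), x) - sorted.begin();
    return 100.0 * below / sorted.size();
}

// Run periodically (say, every 24 hours) over all active (host, app version)
// records of one app, computing P(H, AV) = min(X, Y).
void update_percentiles(std::vector<HOST_APP_VERSION_STATS>& havs) {
    std::vector<double> turnarounds, streaks;
    for (const auto& h : havs) {
        turnarounds.push_back(h.avg_turnaround);
        streaks.push_back(h.consecutive_valid);
    }
    std::sort(turnarounds.begin(), turnarounds.end());
    std::sort(streaks.begin(), streaks.end());
    for (auto& h : havs) {
        // X: shorter turnaround => higher percentile.
        double x = 100.0 - percentile_rank(turnarounds, h.avg_turnaround);
        // Y: longer streak of valid results => higher percentile;
        // a streak above 10 counts as the top of the range.
        double y = (h.consecutive_valid > 10)
            ? 100.0
            : percentile_rank(streaks, (double)h.consecutive_valid);
        h.percentile = std::min(x, y);
    }
}
}}}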

Notes:
* When a new app version is deployed,
the host_app_version records for the previous version should be copied,
on the assumption that hosts reliable for one version
will be reliable for the next.
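
A minimal sketch of that copy step; the record layout and helper function
are illustrative, not the actual host_app_version schema.
{{{
#include <vector>

// Hypothetical stand-in for a host_app_version record.
struct HOST_APP_VERSION {
    int host_id;
    int app_version_id;
    double percentile;   // P(H, AV) as computed above
};

// Seed the new version's records from the previous version's, on the
// assumption that a host reliable for one version will be reliable
// for the next.
std::vector<HOST_APP_VERSION> seed_new_version(
    const std::vector<HOST_APP_VERSION>& prev_records, int new_app_version_id
) {
    std::vector<HOST_APP_VERSION> out;
    for (HOST_APP_VERSION rec : prev_records) {
        rec.app_version_id = new_app_version_id;
        out.push_back(rec);
    }
    return out;
}
}}}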

== Batch completion estimation ==

The proposed policies require estimates C(B) of batch completion time.
I'm not sure exactly how to compute these, but:
* They should be based on completed and validated jobs rather than
on a priori FLOPs estimates.
* They should reflect (host, app version) information
(e.g. turnaround and elapsed time statistics)
for the hosts that have completed jobs,
and for the host population as a whole.
* They should be computed by a daemon process,
triggered by the passage of time and
by the validation of jobs in the batch.

Notes:
* C(B) is different from the "logical end time" of the batch
used in batch scheduling.
* For long-deadline batches, C(B) should probably be at least
the original delay bound plus the greatest dispatch time
of any job's first instance.
I.e. if it takes a long time to dispatch the first instances,
adjust the deadline accordingly to avoid creating a deadline crunch.
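
A rough sketch of how C(B) might be computed from per-job turnaround
expectations; the structs are hypothetical, and applying the delay-bound
floor unconditionally is a simplification, not a settled design.
{{{
#include <algorithm>
#include <vector>

// Hypothetical per-job and per-batch records.
struct BATCH_JOB {
    bool validated;             // an instance has completed and validated
    double first_dispatch_time; // when the first instance was sent (0 if not yet)
    double expected_turnaround; // from (host, app version) or population stats
};

struct BATCH {
    double delay_bound;         // original per-job delay bound
    std::vector<BATCH_JOB> jobs;
};

// C(B): the latest expected completion over jobs not yet validated,
// based on observed turnaround rather than a priori FLOPs estimates.
double estimate_completion(const BATCH& b, double now) {
    double c = now;
    double latest_first_dispatch = 0;
    for (const auto& j : b.jobs) {
        double sent = (j.first_dispatch_time > 0) ? j.first_dispatch_time : now;
        latest_first_dispatch = std::max(latest_first_dispatch, sent);
        if (j.validated) continue;
        c = std::max(c, sent + j.expected_turnaround);
    }
    // Per the note above: don't let C(B) fall below the original delay
    // bound plus the greatest first-instance dispatch time.
    return std::max(c, latest_first_dispatch + b.delay_bound);
}
}}}
This would run in a daemon, re-triggered by the passage of time and by
the validation of jobs in the batch, as described above.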

== Proposed feeder policy ==

Proposed enumeration order:
{{{
(LET(J) asc, nretries desc)
}}}
where LET(J) is the logical end time of the job's batch.
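
In practice this would presumably be an ORDER BY clause on the feeder's
enumeration query; the comparator below just expresses the same ordering
on a hypothetical record type.
{{{
#include <algorithm>
#include <vector>

// Hypothetical stand-in for whatever record the feeder enumerates.
struct QUEUED_JOB {
    double logical_end_time;   // LET(J): logical end time of the job's batch
    int nretries;              // how many times the job has been resent
};

// (LET(J) asc, nretries desc): earlier batch end times first; among
// equal end times, retries come before first-time sends.
void sort_for_enumeration(std::vector<QUEUED_JOB>& jobs) {
    std::sort(jobs.begin(), jobs.end(),
        [](const QUEUED_JOB& a, const QUEUED_JOB& b) {
            if (a.logical_end_time != b.logical_end_time) {
                return a.logical_end_time < b.logical_end_time;
            }
            return a.nretries > b.nretries;
        });
}
}}}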

== Proposed scheduler policy ==

For each processor type T (CPU and GPU) we have a "busy time" BT:
the time already committed to high-priority jobs.
For a given job J we can compute the estimated runtime R.
The earliest we can expect to finish J is then BT + R,
so that's the earliest deadline we can assign.
Call this MD(J, T).

For each app A and processor type T, compute the best app version
BAV(A, T) at the start of handling each request.

The rough policy then is:
{{{
for each job J in the array, belonging to batch B
    for each usable app version AV of type T
        if B is AFAP and there's no completion estimate yet
            if P(H, AV) > 50%
                send J using AV, with deadline BT + R
        else
            x = MD(J, T)
            if x < C(B)
                send J using AV, with deadline C(B)
            else if P(H, AV) > 90%
                send J using AV, with deadline x
}}}
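
A more explicit rendering of the pseudocode above, with the inputs bundled
into a hypothetical struct; all names here are illustrative, and the 50%
and 90% thresholds follow the text.
{{{
#include <cstdio>

// Quantities the scheduler would already have computed for a
// (job, app version) pair on the current request.
struct SEND_DECISION_INPUT {
    double p_hav;       // P(H, AV): percentile of this (host, app version)
    double bt;          // BT: busy time of the processor type
    double r;           // R: estimated runtime of the job with this version
    double c_b;         // C(B): batch completion estimate (relative to now)
    bool afap;          // the batch is "as fast as possible"
    bool have_c_b;      // a completion estimate exists
};

// Returns the deadline to assign (relative to now), or a negative value
// if the job should not be sent with this app version.
double decide_deadline(const SEND_DECISION_INPUT& in) {
    if (in.afap && !in.have_c_b) {
        // No estimate yet: only fast/reliable hosts get a tight deadline.
        return (in.p_hav > 50) ? in.bt + in.r : -1;
    }
    double md = in.bt + in.r;   // MD(J, T): earliest feasible deadline
    if (md < in.c_b) {
        // Enough slack: use the batch completion estimate as the deadline.
        return in.c_b;
    }
    // Tight case: only highly reliable hosts get the minimal deadline.
    return (in.p_hav > 90) ? md : -1;
}

int main() {
    // Example: a 2-hour job on a 95th-percentile host whose GPU already
    // has 1 hour of committed high-priority work; C(B) is 6 hours away.
    SEND_DECISION_INPUT in = {95, 3600, 7200, 6*3600, false, true};
    printf("deadline offset: %.0f seconds\n", decide_deadline(in));
    return 0;
}
}}}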

Make an initial pass through the array,
sending only jobs that have a percentile requirement.

Notes:

* The 50% and 90% thresholds can be parameterized.
* Retries are not handled differently at this level,
although we could add a restriction such as sending
them only to top-50% hosts.
* In the startup case (e.g. a new app), no hosts will be high-percentile.
How do we avoid starvation?
* I think that score-based scheduling is now deprecated.
The feasibility and/or desirability of a job may depend
on what other jobs we're sending,
so it doesn't make sense to assign it a score in isolation.
It's simpler to scan jobs and make a final decision for each one.
There are a few cases we need to give priority to:
 * Limited locality scheduling
 * Beta jobs
We can handle these in separate passes, as we're doing now.