Changes between Version 8 and Version 9 of AdaptiveReplication
Timestamp: Mar 9, 2015, 1:45:31 PM
AdaptiveReplication
--- AdaptiveReplication (v8)
+++ AdaptiveReplication (v9)
@@ -4,8 +4,8 @@
 one of the hosts is known to be highly reliable.
 The overhead of replication is high - at least 50% of total CPU time
-is spent checking validity.
+is spent checking result validity.
 
-'''Adaptive replication''' is an optional policy that avoids replicating a job
-if it has been sent to a highly reliable host.
+'''Adaptive replication''' is an optional policy that avoids replicating jobs
+that are sent to highly reliable hosts.
 The goal of this policy is to provide a target level of confidence
 with minimal overhead - perhaps only 5% or 10% of total CPU time.
@@ -13,24 +13,20 @@
 == Policy ==
 
-BOINC maintains an estimate E(H) of host H's recent error rate.
-This is maintained as follows:
-
- * It is initialized to 0.1
- * It is multiplied by 0.95 when H reports a correct (replicated) result.
- * It is incremented by 0.1 when H reports an incorrect (replicated) result.
-
-Thus, it takes a long time to earn a good reputation
-and a short time to lose it.
+BOINC maintains the number CV(H, V) of consecutive valid results
+return by host H using app version V.
+This is incremented when a replicated job computed with (H, V) is validated,
+and is zeroed when such a job is found to be invalid.
+(V is included because, for example, some hosts may be less reliable
+for GPU jobs than for CPU jobs).
 
 The adaptive replication policy is as follows.
 
  * Each job is initially marked as unreplicated.
- * On each request, the scheduler decides whether to trust the host as follows:
-   * If E(H) > A, don't trust the host.
-   * Otherwise, trust the host with probability 1 - sqrt(E(H)/A).
+ * When sending a job using app version V, the scheduler decides whether to trust the host as follows:
+   * If CV(H, V) < 10, don't trust the host.
+   * Otherwise, trust the host randomly with probability 1 - 1/CV(H, V).
  * If we decide to trust the host, preferentially send it unreplicated jobs.
- * Otherwise, preferentially send it replicated jobs. If we have to send it an unreplicated job, mark it as replicated and create new instances accordingly.
-
-In the current code base (as of r18056), A is hardcoded to be 0.05 in sched_send.cpp as `ER_MAX`.
+ * Otherwise, preferentially send it replicated jobs.
+   If we have to send it an unreplicated job, mark it as replicated and create new instances accordingly.
 
 == Using adaptive replication ==
@@ -53,6 +49,4 @@
 Scheduler:
  * Decide whether to trust host as described above.
- * If we send an unreplicated job (i.e., target_nresults=1 and app.target_nresults>1) to an untrusted host, set wu.target_nresults = app.target_nresults and flag the WU for transitioning.
-
-Validator:
- * Don't update host.error_rate for unreplicated results (i.e., wu.target_nresults=1 and app.target_nresults>1).
+ * If we send an unreplicated job (i.e., target_nresults=1 and app.target_nresults>1) to an untrusted host,
+   set wu.target_nresults = app.target_nresults and flag the WU for transitioning.
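
The v9 policy introduced by this change can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BOINC's scheduler code: the names `ReliabilityTracker`, `Job`, `send_to_host`, and `needs_transition` are hypothetical, the counter is kept in memory rather than in the database, and CV(H, V) is assumed to start at 0 for a pair that has not been seen (the page does not state the initial value). The threshold of 10 and the probability 1 - 1/CV(H, V) are taken from the text.

```python
import random
from dataclasses import dataclass, field

# Threshold from the v9 text: if CV(H, V) < 10, the host is not trusted.
MIN_CONSECUTIVE_VALID = 10


@dataclass
class ReliabilityTracker:
    """Hypothetical in-memory stand-in for the per-(host, app version) counter CV(H, V)."""
    cv: dict = field(default_factory=dict)

    def record_replicated_result(self, host, app_version, valid):
        """Increment CV(H, V) on a validated replicated job, zero it on an invalid one."""
        key = (host, app_version)
        if valid:
            # Assumption: an unseen (H, V) pair starts at 0.
            self.cv[key] = self.cv.get(key, 0) + 1
        else:
            self.cv[key] = 0

    def trust_probability(self, host, app_version):
        """Return 0 below the threshold, otherwise 1 - 1/CV(H, V)."""
        n = self.cv.get((host, app_version), 0)
        if n < MIN_CONSECUTIVE_VALID:
            return 0.0
        return 1.0 - 1.0 / n

    def should_trust(self, host, app_version, rng=random):
        """Randomly decide whether to trust the host for this app version."""
        return rng.random() < self.trust_probability(host, app_version)


@dataclass
class Job:
    """Hypothetical job record; target_nresults == 1 means unreplicated."""
    target_nresults: int = 1
    needs_transition: bool = False


def send_to_host(job, app_target_nresults, trusted):
    """Mirror the scheduler step: an unreplicated job sent to an untrusted
    host is marked as replicated and flagged for transitioning."""
    if not trusted and job.target_nresults == 1 and app_target_nresults > 1:
        job.target_nresults = app_target_nresults
        job.needs_transition = True
    return job
```

Under this rule a host with exactly 10 consecutive valid results is trusted with probability 0.9, and one with 100 is trusted with probability 0.99, so even long-standing reliable hosts still have a small fraction of their jobs replicated. A single invalid result resets the counter and removes trust until 10 more replicated jobs validate.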