= Adaptive replication =

BOINC's current replication policy replicates a job even if
one of the hosts is known to be highly reliable.
The overhead of replication is high: at least 50% of total CPU time
is spent checking validity.

'''Adaptive replication''' is an optional policy that avoids replicating a job
if it has been sent to a highly reliable host.
The goal of this policy is to provide a target level of confidence
with minimal overhead, perhaps only 5% or 10% of total CPU time.

== Policy ==

BOINC maintains an estimate E(H) of host H's recent error rate.
The estimate is updated as follows:

 * It is initialized to 0.1.
 * It is multiplied by 0.95 when H reports a correct (replicated) result.
 * It is incremented by 0.05 when H reports an incorrect (replicated) result.

Thus, it takes a long time to earn a good reputation
and a short time to lose it.
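
The update rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not BOINC's actual code: the function names `update_error_rate` and `results_needed` are hypothetical, and the target of 0.05 used in the note below is an arbitrary example value rather than a BOINC setting.

```python
INITIAL_ERROR_RATE = 0.1   # starting estimate E(H) for a new host
SUCCESS_FACTOR = 0.95      # multiplier applied after a correct replicated result
FAILURE_INCREMENT = 0.05   # amount added after an incorrect replicated result


def update_error_rate(error_rate: float, correct: bool) -> float:
    """Return the new estimate E(H) after one validated, replicated result."""
    if correct:
        return error_rate * SUCCESS_FACTOR
    return error_rate + FAILURE_INCREMENT


def results_needed(target: float, start: float = INITIAL_ERROR_RATE) -> int:
    """Count the consecutive correct results needed to bring E(H) to or below target."""
    count = 0
    rate = start
    while rate > target:
        rate = update_error_rate(rate, correct=True)
        count += 1
    return count
```

Under these rules, `results_needed(0.05)` returns 14: a new host needs 14 consecutive correct results to halve its estimate from 0.1 to below 0.05, while a single incorrect result at that point adds 0.05 and puts it back near 0.1.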

The adaptive replication policy is as follows:

 * Each job is initially marked as unreplicated.
 * On each request, the scheduler decides whether to trust the host as follows, where A is an error-rate threshold:
   * If E(H) > A, don't trust the host.
   * Otherwise, trust the host with probability 1 - E(H)/A.
 * If we decide to trust the host, preferentially send it unreplicated jobs.
 * Otherwise, preferentially send it replicated jobs. If we have to send it an unreplicated job, mark it as replicated and create new instances accordingly.
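
The trust decision can be sketched as follows. This is a minimal illustration under stated assumptions: the names `trust_probability` and `decide_trust` are hypothetical, the threshold A is passed in as a parameter because the policy does not fix its value, and the random source is injectable so the behavior can be checked deterministically.

```python
import random
from typing import Callable


def trust_probability(error_rate: float, threshold: float) -> float:
    """Probability of trusting a host with estimate E(H), given threshold A."""
    if threshold <= 0:
        raise ValueError("threshold A must be positive")
    if error_rate > threshold:
        return 0.0  # E(H) > A: never trust the host
    return 1.0 - error_rate / threshold


def decide_trust(
    error_rate: float,
    threshold: float,
    rand: Callable[[], float] = random.random,
) -> bool:
    """Make one randomized trust decision for a scheduler request."""
    # rand() is assumed to return a float in [0, 1), as random.random does.
    return rand() < trust_probability(error_rate, threshold)
```

With this rule, a host whose estimate is exactly A is trusted with probability 0, and the probability rises linearly to 1 as E(H) approaches 0. Even a host with a good record is therefore still sent replicated jobs some of the time, which keeps its estimate up to date.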

== Implementation ==

Database:
 * Add a "target_nresults" field to the app table. The default is zero (the app doesn't use adaptive replication).

Scheduler:
 * Decide whether to trust the host as described above.
 * If we send an unreplicated job (i.e., wu.target_nresults=1 and app.target_nresults>1) to an untrusted host, set wu.target_nresults = app.target_nresults and flag the WU for transitioning.

Validator:
 * Don't update host.error_rate for unreplicated results (i.e., wu.target_nresults=1 and app.target_nresults>1).
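
The scheduler and validator checks above share one test for whether a job is unreplicated. The sketch below models that logic in Python; it is not BOINC's code. The `App` and `Workunit` classes are simplified stand-ins for the database records, and the `needs_transition` field is a hypothetical placeholder for however the WU is actually flagged for transitioning.

```python
from dataclasses import dataclass


@dataclass
class App:
    target_nresults: int = 0  # 0 means the app does not use adaptive replication


@dataclass
class Workunit:
    target_nresults: int = 1
    needs_transition: bool = False  # hypothetical stand-in for the transition flag


def is_unreplicated(wu: Workunit, app: App) -> bool:
    """True if the job belongs to an adaptive app and has not been replicated yet."""
    return wu.target_nresults == 1 and app.target_nresults > 1


def on_send(wu: Workunit, app: App, host_trusted: bool) -> None:
    """Scheduler step: replicate an unreplicated job sent to an untrusted host."""
    if is_unreplicated(wu, app) and not host_trusted:
        wu.target_nresults = app.target_nresults
        wu.needs_transition = True


def should_update_error_rate(wu: Workunit, app: App) -> bool:
    """Validator step: only replicated results count toward host.error_rate."""
    return not is_unreplicated(wu, app)
```

For example, with `app = App(target_nresults=2)` and a fresh `Workunit()`, calling `on_send(wu, app, host_trusted=False)` raises `wu.target_nresults` to 2 and sets the flag, while the same call with `host_trusted=True` leaves the job unreplicated.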