= Adaptive replication =
BOINC's default replication policy replicates a job even if
one of the hosts is known to be highly reliable.
The overhead of replication is high - at least 50% of total CPU time
is spent checking validity.
'''Adaptive replication''' is an optional policy that avoids replicating a job
if it has been sent to a highly reliable host.
The goal of this policy is to provide a target level of confidence
with minimal overhead - perhaps only 5% or 10% of total CPU time.
== Policy ==
BOINC maintains an estimate E(H) of host H's recent error rate.
This is maintained as follows:
* It is initialized to 0.1
* It is multiplied by 0.95 when H reports a correct (replicated) result.
* It is incremented by 0.1 when H reports an incorrect (replicated) result.
Thus, it takes a long time to earn a good reputation
and a short time to lose it.
The adaptive replication policy is as follows.
* Each job is initially marked as unreplicated.
* On each request, the scheduler decides whether to trust the host as follows:
* If E(H) > A, don't trust the host.
* Otherwise, trust the host with probability 1 - sqrt( E(H)/A ).
* If we decide to trust the host, preferentially send it unreplicated jobs.
* Otherwise, preferentially send it replicated jobs. If we have to send it an unreplicated job, mark it as replicated and create new instances accordingly.
== Using adaptive replication ==
To use adaptive replication for a given app:
* Set app.target_nresults to 2 in the database.
* Create jobs with target_nresults=1 and min_quorum=1; i.e. include
{{{
1
1
}}}
in the [WorkGeneration#templates input template file].
== Implementation ==
Database:
* Add "target_nresults" field to app table. Default is zero (app doesn't use adaptive replication).
Scheduler:
* Decide whether to trust host as described above.
* If we send an unreplicated job (i.e., target_nresults=1 and app.target_nresults>1) to an untrusted host, set wu.target_nresults = app.target_nresults and flag the WU for transitioning.
Validator:
* Don't update host.error_rate for unreplicated results (i.e., wu.target_nresults=1 and app.target_nresults>1).