= Adaptive replication =
BOINC's default replication policy replicates a job even if
one of the hosts it is sent to is known to be highly reliable.
The overhead of replication is high: at least 50% of total CPU time
is spent checking result validity.
'''Adaptive replication''' is an optional policy that avoids replicating jobs
that are sent to highly reliable hosts.
The goal of this policy is to provide a target level of confidence
with minimal overhead, perhaps only 5% or 10% of total CPU time.
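As a rough illustration of these percentages, assume that all jobs take the same CPU time and that a replicated job runs exactly twice. If a fraction p of jobs ends up replicated, the share of total CPU time spent on the second (checking) instances is p / (1 + p); replicating every job gives 50%. The function name below is hypothetical and the model is a simplification, not a BOINC measurement.

```python
def redundant_cpu_fraction(p_replicated: float) -> float:
    """Share of total CPU time spent on second (checking) instances.

    Assumes equal-length jobs and a replication level of 2: each job costs
    one unit of CPU time, plus one more unit if it is replicated.
    """
    if not 0.0 <= p_replicated <= 1.0:
        raise ValueError("p_replicated must be in [0, 1]")
    return p_replicated / (1.0 + p_replicated)


# Compare full replication with replicating 10% or 5% of jobs.
for p in (1.0, 0.10, 0.05):
    print(f"{p:.2f} of jobs replicated -> {redundant_cpu_fraction(p):.1%} of CPU time")
```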
== Policy ==
For each host H and app version V, BOINC maintains CV(H, V),
the number of consecutive valid results returned by H using V.
This is incremented when a replicated job computed with (H, V) is validated,
and is zeroed when such a job is found to be invalid.
(V is included because, for example, some hosts may be less reliable
for GPU jobs than for CPU jobs.)
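The counter update described above can be sketched as follows. This is a minimal in-memory illustration; the dictionary and the function name `record_validation` are hypothetical and do not correspond to BOINC's actual database schema or validator code.

```python
from collections import defaultdict

# Hypothetical stand-in for per-(host, app version) state: maps (H, V) to CV(H, V).
consecutive_valid = defaultdict(int)


def record_validation(host_id: int, app_version_id: int, valid: bool) -> int:
    """Update CV(H, V) after a replicated job computed with (H, V) is checked.

    A valid result increments the counter; an invalid one resets it to zero.
    Returns the new value of CV(H, V).
    """
    key = (host_id, app_version_id)
    if valid:
        consecutive_valid[key] += 1
    else:
        consecutive_valid[key] = 0
    return consecutive_valid[key]
```

Keeping one counter per (H, V) pair means that an invalid GPU result does not reset a host's record for a CPU app version.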
The adaptive replication policy is as follows.
* Each job is initially marked as unreplicated.
* When sending a job using app version V, the scheduler decides whether to trust the host as follows:
  * If CV(H, V) < 10, don't trust the host.
  * Otherwise, trust the host randomly with probability 1 - 1/CV(H, V).
* If we decide to trust the host, preferentially send it unreplicated jobs.
* Otherwise, preferentially send it replicated jobs. If we have to send it an unreplicated job, mark it as replicated and create new instances accordingly.
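The trust decision and the job preference above can be sketched as follows. The threshold of 10 and the probability 1 - 1/CV(H, V) come from the policy; the function names, the dictionary job representation, and the use of a sort to express "preferentially" are assumptions made for illustration, not the scheduler's real code.

```python
import random

MIN_CONSECUTIVE_VALID = 10  # threshold from the policy above


def trust_host(cv: int, rng: random.Random) -> bool:
    """Decide whether to trust a host whose counter CV(H, V) equals cv."""
    if cv < MIN_CONSECUTIVE_VALID:
        return False
    # Trust with probability 1 - 1/cv, so the host is still spot-checked
    # with probability 1/cv.
    return rng.random() < 1.0 - 1.0 / cv


def order_jobs(jobs: list, trusted: bool) -> list:
    """Put preferred jobs first (hypothetical job dicts with a 'replicated' flag).

    Trusted hosts get unreplicated jobs first; untrusted hosts get
    replicated jobs first. The sort is stable, so ties keep their order.
    """
    return sorted(jobs, key=lambda j: j["replicated"] if trusted else not j["replicated"])
```

Because the trust probability is 1 - 1/CV(H, V), a host with CV(H, V) = 10 is still checked about one time in ten, and the checking rate falls as its record grows.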
== Using adaptive replication ==
To use adaptive replication for a given app:
* Set app.target_nresults to 2 in the database.
* Create jobs with target_nresults=1 and min_quorum=1; i.e. include
{{{
<target_nresults>1</target_nresults>
<min_quorum>1</min_quorum>
}}}
in the [WorkGeneration#templates input template file].
== Implementation ==
Database:
* Add "target_nresults" field to app table. Default is zero (app doesn't use adaptive replication).
Scheduler:
* Decide whether to trust host as described above.
* If we send an unreplicated job (i.e., wu.target_nresults=1 and app.target_nresults>1) to an untrusted host, set wu.target_nresults = app.target_nresults and flag the WU for transitioning.
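A minimal sketch of this scheduler step, using plain Python objects in place of database rows. The field `target_nresults` follows the text; the classes, the `needs_transition` flag, and the function names are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass


@dataclass
class App:
    target_nresults: int  # 0 means the app does not use adaptive replication


@dataclass
class Workunit:
    target_nresults: int
    needs_transition: bool = False  # hypothetical stand-in for "flag for transitioning"


def is_unreplicated(wu: Workunit, app: App) -> bool:
    """A job is unreplicated if it asks for one result but the app wants more."""
    return wu.target_nresults == 1 and app.target_nresults > 1


def on_send(wu: Workunit, app: App, host_trusted: bool) -> Workunit:
    """Promote an unreplicated job to replicated when it goes to an untrusted host."""
    if is_unreplicated(wu, app) and not host_trusted:
        wu.target_nresults = app.target_nresults
        wu.needs_transition = True  # lets new instances be created
    return wu
```

With app.target_nresults left at its default of zero, `is_unreplicated` is never true, so jobs for apps that do not use adaptive replication are unaffected.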