= Redundancy and errors =

A BOINC 'result' abstracts an instance of a computation, possibly not yet performed. Typically, a BOINC server sends 'results' to clients, and the clients perform the computation and reply to the server. But many things can happen to a result:

* The client computes the result correctly and returns it.
* The client computes the result incorrectly and returns it.
* The client fails to download or upload files.
* The application crashes on the client.
* The client never returns anything because it breaks or stops running BOINC.
* The scheduler isn't able to send the result because it requires more resources than any client has.

BOINC provides a form of redundant computing in which each computation is performed on multiple clients, the results are compared, and they are accepted only when a 'consensus' is reached. In some cases (for example, when a result returns an error) new results must be created and sent.

BOINC manages most of the details; however, there are two places where the application developer gets involved:

* '''Validation:''' This performs two functions. First, when a sufficient number (a 'quorum') of successful results have been returned, it compares them and checks whether there is a 'consensus'. The method of comparing results (which may need to take into account platform-varying floating point arithmetic) and the policy for determining consensus (e.g., best two out of three) are supplied by the application. If a consensus is reached, a particular result is designated as the 'canonical' result. Second, if a result arrives after a consensus has already been reached, the new result is compared with the canonical result; this determines whether the user gets credit.
* '''Assimilation:''' This is the mechanism by which the project is notified of the completion (successful or unsuccessful) of a workunit. It is performed exactly once per workunit. If the workunit was completed successfully (i.e., if there is a canonical result), the project-supplied function reads the output file(s) and handles the information, e.g. by recording it in a database. If the workunit failed, the function might write an entry in a log, send an email, etc.
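
The two developer-supplied pieces described above can be sketched roughly as follows. This is a minimal illustration, not BOINC's actual API: `Result`, `outputs_match`, `find_canonical`, `gets_credit` and `assimilate` are hypothetical names, the output of a result is modelled as a list of floats, and the consensus policy (the first result that agrees with at least `min_quorum` results, itself included) is only one possible choice.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    name: str
    output: List[float]  # stand-in for the contents of the output file(s)


def outputs_match(a: Result, b: Result, rel_tol: float = 1e-6) -> bool:
    """Hypothetical comparison that tolerates small floating-point differences."""
    return len(a.output) == len(b.output) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a.output, b.output)
    )


def find_canonical(results: List[Result], min_quorum: int) -> Optional[Result]:
    """Return the first result that agrees with at least min_quorum results
    (counting itself), or None if there is no consensus yet."""
    if len(results) < min_quorum:
        return None
    for candidate in results:
        agreeing = sum(1 for r in results if outputs_match(candidate, r))
        if agreeing >= min_quorum:
            return candidate
    return None


def gets_credit(late: Result, canonical: Result) -> bool:
    """A late result earns credit only if it matches the canonical result."""
    return outputs_match(late, canonical)


def assimilate(wu_name: str, canonical: Optional[Result], store: dict, log: list) -> None:
    """Called once per workunit: record the output on success, log on failure."""
    if canonical is not None:
        store[wu_name] = list(canonical.output)
    else:
        log.append(f"workunit {wu_name} failed")
```

In a real project the comparison would read the result's output files, and the assimilator would typically write to the project's own database rather than to an in-memory dictionary.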

----

In the following example, the project creates a workunit with these parameters:

* min_quorum = 2
* target_nresults = 3
* max_delay = 10

BOINC automatically creates three results, which are sent at various times. At time 8, two successful results have been returned, so the validator is invoked. It finds a consensus, so the workunit is assimilated. At time 10, result 3 arrives; validation is performed again, this time to check whether result 3 gets credit.

{{{
time        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14

            created                         validate; assimilate
WU          x                               x         x
            created sent            success
result 1    x       x---------------x
            created     sent                success
result 2    x           x-------------------x
            created         sent                    success
result 3    x               x-----------------------x
}}}
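
The timeline above can be replayed with a few lines of code. This is only a sketch under stated assumptions: `replay` is a hypothetical helper (not a BOINC function), all returned results are assumed to agree so that consensus is found as soon as `min_quorum` successes are in, and the return time of result 1 (time 6) is read off the diagram rather than stated in the text.

```python
from typing import Dict, List, Optional, Tuple


def replay(arrivals: Dict[str, int], min_quorum: int) -> Tuple[Optional[int], List[str]]:
    """Given the time each successful result returns, report when the validator
    first has a quorum and which results arrive after that point.

    Assumes every returned result agrees with the others.
    """
    ordered = sorted(arrivals.items(), key=lambda item: item[1])
    if len(ordered) < min_quorum:
        return None, []
    validation_time = ordered[min_quorum - 1][1]
    late = [name for name, _ in ordered[min_quorum:]]
    return validation_time, late


first_validation, late_results = replay(
    {"result 1": 6, "result 2": 8, "result 3": 10}, min_quorum=2
)
print(first_validation, late_results)
```

With these inputs the quorum is reached at time 8, and result 3 is the only late arrival, so it is checked against the canonical result purely to decide credit.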

----

In the next example, result 2 is lost (i.e., there's no reply to the BOINC scheduler). When result 3 arrives, a consensus is found and the workunit is assimilated. At time 13 the scheduler 'gives up' on result 2. This allows it to delete the canonical result's output files, which are kept until then because they are needed to validate late-arriving results.

{{{
time        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14

            created                                 validate; assimilate
WU          x                                       x         x
            created sent            success
result 1    x       x---------------x
            created     sent    lost                            giveup
result 2    x           x--------                               x
            created         sent                    success
result 3    x               x-----------------------x
}}}
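
The give-up time in this example is consistent with a simple deadline rule. The sketch below assumes that `max_delay` is counted from the moment a result is sent, and that result 2 was sent at time 3 (read off the diagram); both are assumptions made for illustration, and `give_up_time` and `timed_out` are hypothetical helpers rather than BOINC functions.

```python
def give_up_time(sent_time: int, max_delay: int) -> int:
    """Deadline after which the scheduler stops waiting for a result
    (assumption: the delay is measured from the send time)."""
    return sent_time + max_delay


def timed_out(sent_time: int, max_delay: int, now: int, returned: bool) -> bool:
    """True if the result has not come back and its deadline has passed."""
    return (not returned) and now >= give_up_time(sent_time, max_delay)


# Result 2: assumed sent at time 3, with max_delay = 10 as in the workunit above.
print(give_up_time(3, 10))
```

Under these assumptions the deadline for result 2 falls at time 13, which matches the point at which the scheduler gives up in the example.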

----

In the next example, result 2 returns an error at time 5. This reduces the number of outstanding results to 2; because target_nresults is 3, BOINC creates another result (result 4). A consensus is reached at time 9, before result 4 is returned.

{{{
time        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14

            created                             validate; assimilate
WU          x                                   x         x
            created sent            success
result 1    x       x---------------x
            created     sent    error
result 2    x           x-------x
            created         sent                success
result 3    x               x-------------------x
                                created sent                   success
result 4                        x       x----------------------x
}}}
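
The replacement rule used in this example can be sketched as a count of results that can still contribute to a consensus. The state names and the `results_to_create` helper are hypothetical, and treating already-successful results as still counting toward `target_nresults` is an assumption; the actual server logic may differ.

```python
from typing import List

# Hypothetical state labels; results in these states can still contribute.
VIABLE_STATES = {"unsent", "in_progress", "success"}


def results_to_create(states: List[str], target_nresults: int) -> int:
    """Number of new results needed to bring the count of viable
    (not failed, not timed out) results back up to target_nresults."""
    viable = sum(1 for state in states if state in VIABLE_STATES)
    return max(0, target_nresults - viable)


# Time 5 in the example: results 1 and 3 are in progress, result 2 has failed.
print(results_to_create(["in_progress", "error", "in_progress"], target_nresults=3))
```

At time 5 two results remain viable, so one new result (result 4) is needed to get back to three.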