| 77 | | '''Coding''': with Reed-Solomon coding, a file is divided into N 'packets', |
| | 80 | Replication has advantages: |
| | 81 | |
| | 82 | * Recovery from a failure is fast, since only one upload and download is done. |
| | 83 | This minimizes the chances of another failure occurring during recovery. |
| | 84 | * By making N large, the server storage needed for a recovery |
| | 85 | can be made arbitrarily small. |
| | 86 | |
| | 87 | and disadvantages: |
| | 88 | |
| | 89 | * It has an extremely high space overhead, |
| | 90 | since M in general must be made large to provide reliability. |
| | 91 | * Even if individual chunks are made reliable, |
| | 92 | the probability that the file as a whole survives decreases exponentially with N. |
| | 93 | |
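
The last disadvantage can be made concrete with a small calculation. This is a minimal sketch, assuming each of the N chunks is lost independently with the same probability; that independence assumption, the function name, and the sample values are illustrative and not taken from the text above.

```python
def file_loss_probability(chunk_loss_prob: float, n_chunks: int) -> float:
    """Probability that at least one of n_chunks chunks is lost.

    Assumes chunk losses are independent and equally likely
    (an illustrative assumption, not a measured model).
    """
    # The file survives only if every chunk survives.
    survive_all = (1.0 - chunk_loss_prob) ** n_chunks
    return 1.0 - survive_all


if __name__ == "__main__":
    # Hypothetical per-chunk loss probability of 0.1%.
    for n in (1, 10, 100, 1000):
        print(n, file_loss_probability(0.001, n))
```

Under this assumption the survival probability is (1 - p)^N, so even a small per-chunk loss probability p adds up as N grows.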
| | 94 | === Coding === |
| | 95 | |
| | 96 | With Reed-Solomon coding, a file is divided into N 'packets', |
| 81 | | In |
| | 100 | Coding has advantages: |
| | 101 | |
| | 102 | * It can provide high reliability without high space overhead. |
| | 103 | For example, if N=40 and K=20, we can tolerate 20 simultaneous host failures |
| | 104 | with a space overhead of only 50%; |
| | 105 | with replication the overhead would be 2000%. |
| | 106 | |
| | 107 | and disadvantages: |
| | 108 | |
| | 109 | * Regenerating a chunk requires reassembling the entire file on the server, |
| | 110 | defeating the purpose of distributed storage. |
| | 111 | |
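
The overhead figures in the N=40, K=20 example can be checked with a short calculation. This sketch assumes the example means N data packets plus K checksum packets, with any N of the N+K packets sufficient to rebuild the file, and that replication needs one extra full copy per tolerated failure. The function names are hypothetical.

```python
def coding_overhead_pct(n_data: int, k_checksum: int) -> float:
    """Extra storage for coding, as a percentage of the original file size.

    Assumes n_data data packets plus k_checksum checksum packets,
    all of equal size (an assumption inferred from the example).
    """
    return 100.0 * k_checksum / n_data


def replication_overhead_pct(failures_tolerated: int) -> float:
    """Extra storage for plain replication, as a percentage.

    Tolerating f simultaneous failures needs f + 1 copies,
    i.e. f extra copies of the whole file.
    """
    return 100.0 * failures_tolerated


if __name__ == "__main__":
    print(coding_overhead_pct(40, 20))    # 50.0
    print(replication_overhead_pct(20))   # 2000.0
```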
| | 112 | == Hybrid reliability mechanisms == |
| | 113 | |
| | 114 | Because of the above disadvantages, |
| | 115 | neither replication nor coding alone is sufficient for volunteer data archival. |
| | 116 | However, we can combine them in various ways that reduce the disadvantages. |
| | 117 | |
| | 118 | === Multi-level coding === |
| | 119 | |
| | 120 | === Coding plus replication === |
| | 121 | |
| | 122 | == The VDAB simulator == |