Replication has advantages:

* Recovery from a failure is fast, since only one upload and download is done.
This minimizes the chances of another failure occurring during recovery.
* By making N large, the server storage needed for a recovery
can be made arbitrarily small.

and disadvantages:

* It has an extremely high space overhead,
since M in general must be made large to provide reliability.
* Even if individual chunks are made reliable,
the file as a whole survives only if all N of its chunks survive,
so the probability that the file survives decreases exponentially with N.
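Both replication costs can be quantified with a small sketch. It assumes a hypothetical loss model that the text does not specify: each replica is lost independently with probability q before it can be recovered, a chunk is lost only when all M of its replicas are lost, and the file is lost when any chunk is lost. Under that model the space overhead is (M-1)×100%, and file survival is (1 - q^M)^N. The function names are illustrative only.

```python
def replication_overhead(m: int) -> float:
    """Extra storage, as a multiple of the file size, when every chunk has M copies."""
    return float(m - 1)


def file_survival(n: int, m: int, q: float) -> float:
    """Probability that a file of N chunks, each stored M times, can be recovered.

    Hypothetical model: each replica is lost independently with probability q,
    a chunk is lost only when all M of its replicas are lost, and the file is
    lost when any one chunk is lost.
    """
    chunk_survival = 1.0 - q ** m
    return chunk_survival ** n


if __name__ == "__main__":
    q = 0.1  # hypothetical per-replica loss probability
    m = 3
    for n in (1, 10, 100, 1000):
        print(
            f"N={n:4d} M={m} overhead={replication_overhead(m):.0%} "
            f"survival={file_survival(n, m, q):.4f}"
        )
```

With q=0.1 and M=3, each chunk survives with probability 0.999, yet a 1000-chunk file survives with probability of only about 0.37, even though the overhead is already 200%.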

=== Coding ===

With Reed-Solomon coding, a file is divided into N 'packets',
Coding has advantages:

* It can provide high reliability without high space overhead.
For example, if N=40 and K=20, we can tolerate 20 simultaneous host failures
with a space overhead of only 50%;
with replication, tolerating 20 failures would require 21 copies, an overhead of 2000%.
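The arithmetic in the example can be checked with a short sketch. It assumes the usual Reed-Solomon arrangement, in which the N data packets are supplemented by K checksum packets and any N of the N+K packets suffice to rebuild the file. Coding then tolerates K lost packets at an overhead of K/N, while replication needs K+1 copies (K extra) to tolerate the same K losses. The function names are hypothetical.

```python
from fractions import Fraction


def coding_overhead(n: int, k: int) -> Fraction:
    """Overhead of N data packets plus K checksum packets (assumed: any N rebuild the file)."""
    return Fraction(k, n)


def replication_overhead_for(failures: int) -> Fraction:
    """Overhead of replication that survives `failures` simultaneous losses.

    Surviving f losses requires f + 1 copies, i.e. f extra copies of the data.
    """
    return Fraction(failures)


if __name__ == "__main__":
    n, k = 40, 20
    print(f"coding:      tolerates {k} failures, overhead {float(coding_overhead(n, k)):.0%}")
    print(f"replication: tolerates {k} failures, overhead {float(replication_overhead_for(k)):.0%}")
```

Exact fractions are used so the overhead ratios are not affected by floating-point rounding; for N=40, K=20 the script reports 50% for coding and 2000% for replication.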

and disadvantages:

* Regenerating a lost packet requires reassembling the entire file on the server,
so the server needs as much storage as the file itself, defeating the purpose of distributed storage.
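The recovery-cost contrast between the two schemes can be sketched as follows, assuming a file of size F split into N equal pieces of size F/N. Restoring a lost replica touches only one chunk, so its server footprint shrinks as N grows; regenerating a Reed-Solomon packet (assuming decoding needs N packets) requires the whole file no matter how large N is. Names and the 10 GiB file size are hypothetical.

```python
def recovery_storage_replication(file_size: float, n: int) -> float:
    """Server storage to restore one lost replica: a single chunk of size F/N."""
    return file_size / n


def recovery_storage_coding(file_size: float, n: int) -> float:
    """Server storage to regenerate one lost packet.

    Decoding needs N packets of size F/N each, i.e. the whole file, so the
    result does not depend on n (accepted only for symmetry).
    """
    return float(file_size)


if __name__ == "__main__":
    gib = 2 ** 30
    file_size = 10 * gib  # hypothetical 10 GiB file
    for n in (10, 100, 1000):
        rep = recovery_storage_replication(file_size, n) / gib
        cod = recovery_storage_coding(file_size, n) / gib
        print(f"N={n:4d}: replication {rep:.3f} GiB, coding {cod:.3f} GiB")
```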

== Hybrid reliability mechanisms ==

Because of the above disadvantages,
neither replication nor coding alone is sufficient for volunteer data archival.
However, we can combine them in various ways that reduce the disadvantages.

=== Multi-level coding ===

=== Coding plus replication ===

== The VDAB simulator ==