Changes between Version 3 and Version 4 of VolunteerDataArchival

Nov 23, 2011, 3:06:22 PM (13 years ago)



  • VolunteerDataArchival

    v3 v4  
    7070unreliable resources:
    72 '''Replication''': a file is divided into N pieces,
    73 and each piece is stored on M hosts.
     72=== Replication ===
     74With this technique, a file is divided into N chunks,
     75and each chunk is stored on M hosts.
    7476If a replica is lost and there is another replica,
    7577that replica is uploaded to the server, then downloaded to another host.
     78By increasing M, reliability can be made arbitrarily high.
    77 '''Coding''': with Reed-Solomon coding, a file is divided into N 'packets',
     80Replication has advantages:
     82 * Recovery from a failure is fast, since only one upload and one download are needed.
     83  This minimizes the chances of another failure occurring during recovery.
     84 * By making N large, the server storage needed for a recovery
     85  can be made arbitrarily small.
     87and disadvantages:
     89 * It has an extremely high space overhead,
     90  since M in general must be made large to provide reliability.
     91 * Even if individual chunks are made reliable,
     92  the probability that the file as a whole survives decreases exponentially with N.
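The trade-off between N and M can be made concrete with a rough model. The sketch below is an illustration only, not part of the archival design: it assumes that each host fails independently with the same probability `p_host` over some period and that no recovery happens in that period. The function names are hypothetical.

```python
def chunk_loss_probability(p_host: float, m: int) -> float:
    """Probability that all M replicas of a single chunk are lost.

    Assumption: hosts fail independently, each with probability p_host.
    """
    return p_host ** m


def file_loss_probability(p_host: float, n: int, m: int) -> float:
    """Probability that at least one of the N chunks loses all M replicas.

    The file survives only if every chunk survives, so the survival
    probability is (1 - p_host**m) ** n, which shrinks as N grows.
    """
    return 1.0 - (1.0 - chunk_loss_probability(p_host, m)) ** n


def replication_space_overhead(m: int) -> float:
    """Extra storage relative to the file size (1.0 means 100%)."""
    return float(m - 1)


if __name__ == "__main__":
    for n, m in [(10, 2), (100, 2), (100, 4)]:
        loss = file_loss_probability(0.1, n, m)
        print(f"N={n:3d} M={m}: P(file lost)={loss:.4f}, "
              f"overhead={replication_space_overhead(m):.0%}")
```

Under these assumptions, raising N with M fixed makes file loss more likely, while raising M reduces it at the cost of more storage.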
     94=== Coding ===
     96With Reed-Solomon coding, a file is divided into N 'packets',
    7897and an additional K checksum packets are generated.
    7998The original data can be reconstructed from any N of these N+K packets.
    81 In
     100Coding has advantages:
     102 * It can provide high reliability without high space overhead.
     103   For example, if N=40 and K=20, we can tolerate 20 simultaneous host failures
     104   with a space overhead of only 50%;
     105   with replication the overhead would be 2000%.
     107and disadvantages:
     109 * Regenerating a lost packet requires reassembling the entire file on the server,
     110  defeating the purpose of distributed storage.
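The overhead figures quoted for coding and replication can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: hosts fail independently with probability `p_host`, each packet lives on a different host, and the function names are hypothetical rather than taken from any existing tool.

```python
from math import comb


def coding_space_overhead(n: int, k: int) -> float:
    """Extra storage for N data packets plus K checksum packets (0.5 means 50%)."""
    return k / n


def replication_overhead_to_tolerate(failures: int) -> float:
    """Extra storage needed for replication to survive `failures` host losses.

    Tolerating f simultaneous failures needs M = f + 1 copies,
    so the overhead is M - 1 = f times the file size.
    """
    return float(failures)


def coded_file_loss_probability(p_host: float, n: int, k: int) -> float:
    """Probability that more than K of the N+K packets are lost.

    Assumption: each packet is on a distinct host that fails
    independently with probability p_host (binomial model).
    """
    total = n + k
    return sum(
        comb(total, j) * p_host ** j * (1.0 - p_host) ** (total - j)
        for j in range(k + 1, total + 1)
    )


if __name__ == "__main__":
    print(f"coding N=40, K=20: {coding_space_overhead(40, 20):.0%} overhead")
    print(f"replication, 20 failures: "
          f"{replication_overhead_to_tolerate(20):.0%} overhead")
    print(f"P(file lost), p_host=0.1: "
          f"{coded_file_loss_probability(0.1, 40, 20):.3e}")
```

With N=40 and K=20 this reproduces the 50% and 2000% figures given above.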
     112== Hybrid reliability mechanisms ==
     114Because of the above disadvantages,
     115neither replication nor coding alone is sufficient for volunteer data archival.
     116However, we can combine them in various ways that reduce the disadvantages.
     118=== Multi-level coding ===
     120=== Coding plus replication ===
     122== The VDAB simulator ==