Changes between Version 2 and Version 3 of VolunteerDataArchival


Timestamp:
Nov 23, 2011, 2:38:17 PM (12 years ago)
Author:
davea
Comment:

--

  • VolunteerDataArchival

    v2 v3  
    33'''Volunteer data archival''' means using disk space on volunteered home computers
    44to store large data files.
    5 This document describes the design of a system to
     5This document describes the design of VDAB, a system to
    66provide volunteer data archival on BOINC.
    7 We assume the goals include:
     7The goals of VDAB include:
    88 * Storing large (e.g. petabyte) files.
    99   Files may be thousands of times larger than the
     
    3131   We don't consider direct client-to-client communication.
    3232
     33== Modeling recovery ==
     34
    3335Recovering from the failure of a host, using techniques like replication,
    3436involves uploading data from a 2nd host, then downloading it to a 3rd host.
    3537Each of these steps may take days.
    36 This, for volunteer storage the ratio
     38Thus, for volunteer storage the ratio
    3739
    3840 average time to failure / average time to recover
     
    4143In other distributed storage systems (such as RAIDs) this ratio may
    4244be on the order of 100,000.
    43 Thus, these systems can modeled as a sequence of individual
    44 failures and recoveries.
     45Thus, these systems can be modeled as a sequence of individual failures and recoveries.
    4546
    46 Volunteer storage, on the other hand, must be modeled as process
     47Volunteer data archival, on the other hand, must be modeled as a process
    4748in which multiple recoveries may be in progress at the same time,
    4849and new failures may occur during these recoveries.
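
To make the effect of the failure/recovery ratio concrete, here is a minimal sketch (not part of VDAB) that estimates how many recoveries are in progress at once. It assumes independent failures, a host population that stays constant because failed hosts are replaced, and Little's law (average number in progress = arrival rate x duration). The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def expected_concurrent_recoveries(num_hosts, mean_time_to_failure, mean_time_to_recover):
    """Average number of recoveries in progress at any moment.

    Assumes independent failures and a constant host population.
    Both times must be in the same unit (e.g. days).
    """
    # System-wide failure rate: each host fails once per mean_time_to_failure.
    failures_per_unit_time = num_hosts / mean_time_to_failure
    # Little's law: number in progress = arrival rate * time each one takes.
    return failures_per_unit_time * mean_time_to_recover


# Hypothetical RAID-like system: ratio 100,000, ten devices.
raid = expected_concurrent_recoveries(10, 100_000.0, 1.0)

# Hypothetical volunteer system: ratio 20, ten thousand hosts.
volunteer = expected_concurrent_recoveries(10_000, 100.0, 5.0)

print(f"RAID-like: {raid:.4f} recoveries in progress on average")
print(f"Volunteer: {volunteer:.0f} recoveries in progress on average")
```

Under these assumed numbers the RAID-like system is almost never recovering, so treating failures one at a time is reasonable, while the volunteer system has hundreds of overlapping recoveries, which is why it must be modeled as a concurrent process.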
     50
     51== The need for server storage ==
     52
     53Initially a file is stored in its entirety on the server.
     54It is downloaded to volunteer hosts.
     55Eventually it is retrieved, i.e. uploaded to the server again,
     56and perhaps deleted from volunteer hosts.
     57
     58However, server storage must be used even while the file is
     59being stored on volunteer hosts.
     60This is because the mechanisms to handle host failures (see below)
     61involve uploading parts of the file to the server,
     62then downloading them to other hosts.
     63
     64One of the goals of VDAB is to minimize the average amount of server storage
     65required to maintain reliability.
     66
     67== Increasing reliability ==
     68
    4969There are two basic techniques for achieving reliable storage using
    5070unreliable resources:
     
    5575that replica is uploaded to the server, then downloaded to another host.
    5676
    57 
    58 
    5977'''Coding''': with Reed-Solomon coding, a file is divided into N 'packets',
    6078and an additional K checksum packets are generated.