Changes between Initial Version and Version 1 of VolunteerDataArchival


Timestamp: Nov 22, 2011, 3:04:00 PM
Author: davea
= Volunteer data archival =

'''Volunteer data archival''' means using disk space on volunteered home computers
to store large data files.
This document describes the design of a system to
provide volunteer data archival on BOINC.
We assume the goals include:
 * Storing large (e.g. petabyte) files.
   Files may be thousands of times larger than the
   amount of space available on individual computers.
 * Storing files for long periods.
 * Being able to reduce the probability of data loss
   to arbitrarily small levels.
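To make the size goal concrete, the sketch below computes a lower bound on the number of hosts a single file must be spread across. The 1 PB file and 100 GB of free space per host are hypothetical figures chosen for illustration, and `min_hosts` is a hypothetical helper, not part of BOINC.

```python
import math

def min_hosts(file_bytes, per_host_bytes, overhead=1.0):
    """Lower bound on the number of hosts needed to hold one file.

    overhead is the storage expansion added by redundancy:
    1.0 = no redundancy, 3.0 = three replicas, (N+K)/N for an N+K code.
    """
    return math.ceil(file_bytes * overhead / per_host_bytes)

PB = 10**15
GB = 10**9
print(min_hosts(PB, 100 * GB))       # 10000
print(min_hosts(PB, 100 * GB, 3.0))  # 30000
```

Even without redundancy, such a file touches thousands of hosts, so the system must tolerate some of them being offline or gone at any given time.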

Properties of the volunteer host population include:

 * A host may be intermittently unavailable, because
   it is turned off or because the user has suspended network activity.
   Unavailable periods may range from minutes to several days.
 * The upload and download speeds of hosts vary widely,
   and can be fairly low (e.g. 1 Mbps) in some cases.
 * The amount of disk space available to a project on a given host
   may fluctuate over time, because of the user's own disk usage
   or disk usage by other BOINC projects to which the host is attached.
 * The population is dynamic: hosts are constantly arriving and leaving.
   The mean lifetime of a host may be fairly short
   (on the order of 100 days).
 * Many hosts are behind firewalls.
   We assume that all communication is initiated by the BOINC client
   and consists of HTTP requests to trusted project servers.
   We don't consider direct client-to-client communication.
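The bandwidth and lifetime figures above limit how fast data can be moved onto a host and how long it is likely to stay there. The sketch below works through both using the example numbers from the list; the 1 GB transfer size is hypothetical, and the exponential lifetime model is an assumption (the text only gives a mean of about 100 days). Both functions are hypothetical helpers.

```python
import math

def transfer_hours(size_bytes, link_bps):
    """Hours to move size_bytes over a link of link_bps bits per second,
    ignoring protocol overhead and unavailable periods."""
    return size_bytes * 8 / link_bps / 3600

def survival_probability(days, mean_lifetime_days=100.0):
    """Probability a host is still in the population after `days`,
    ASSUMING exponentially distributed host lifetimes."""
    return math.exp(-days / mean_lifetime_days)

print(round(transfer_hours(10**9, 10**6), 2))  # 1 GB at 1 Mbps: 2.22 hours
print(round(survival_probability(30), 3))      # 0.741
print(round(survival_probability(365), 3))     # 0.026
```

Under this model only about 2.6% of hosts remain after a year, so long-term storage has to keep re-creating lost data on new hosts rather than relying on any single host.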

There are two basic techniques for achieving reliable storage using
unreliable resources:

 * '''Replication''': a file is stored on several hosts;
   it can be recovered as long as at least one copy survives.

 * '''Coding''': with Reed-Solomon coding, a file is divided into N 'packets',
  and K additional checksum packets are generated.
  The original data can be reconstructed from any N of these N+K packets.
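The two techniques can be compared with a small sketch. It implements a systematic Reed-Solomon-style erasure code as polynomial evaluation over a prime field (practical codecs usually work in GF(2^8); the prime field is a simplification), and computes data-loss probabilities assuming each replica or packet is lost independently with probability p. `encode`, `decode`, `loss_prob_replication`, and `loss_prob_coding` are hypothetical names, not BOINC functions.

```python
from math import comb

# Prime modulus for field arithmetic (2^31 - 1 is prime). Illustrative only;
# production Reed-Solomon codecs typically use GF(2^8).
P = 2**31 - 1

def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the (xi, yi) pairs in points, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is den's inverse mod P (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, k):
    """Return N + K packets as (x, y) pairs. The first N packets are the
    data values themselves (x = 0..N-1); the K checksum packets are the
    same degree < N polynomial evaluated at x = N..N+K-1."""
    n = len(data)
    points = list(enumerate(data))
    checks = [(x, _interpolate(points, x)) for x in range(n, n + k)]
    return points + checks

def decode(packets, n):
    """Recover the N data values from any N surviving packets."""
    if len(packets) < n:
        raise ValueError("fewer than N packets survive; data is lost")
    basis = packets[:n]
    return [_interpolate(basis, x) for x in range(n)]

def loss_prob_replication(m, p):
    """All m replicas lost, each independently with probability p."""
    return p ** m

def loss_prob_coding(n, k, p):
    """More than k of the n + k packets lost, each independently with probability p."""
    total = n + k
    return sum(comb(total, i) * p**i * (1 - p) ** (total - i)
               for i in range(k + 1, total + 1))

data = [104, 105, 33, 7]      # values must lie in [0, P)
packets = encode(data, k=2)   # N = 4, K = 2: six packets
survivors = [packets[1], packets[3], packets[4], packets[5]]  # packets 0 and 2 lost
assert decode(survivors, n=4) == data

print(loss_prob_replication(3, 0.1))  # three replicas: 3x storage
print(loss_prob_coding(4, 2, 0.1))    # N=4, K=2: 1.5x storage
print(loss_prob_coding(4, 8, 0.1))    # N=4, K=8: 3x storage
```

With p = 0.1, three replicas lose the file with probability 0.001, while the N=4, K=2 code loses it with probability about 0.016 using only half the storage. At the same 3x overhead as replication, N=4, K=8 brings the loss probability down to the order of 10^-7, which is why coding is attractive when disk space is the scarce resource.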