Version 1 (modified by davea, 14 years ago)

Volunteer data archival

Volunteer data archival means using disk space on volunteered home computers to store large data files. This document describes the design of a system to provide volunteer data archival on BOINC. We assume the goals include:

Storing large (e.g. petabyte) files. Files may be thousands of times larger than the amount of space available on individual computers.
Storing files for long periods.
Reducing the probability of data loss to arbitrarily small levels.

Properties of the volunteer host population include:

A host may be sporadically available because it is turned off, or because the user has suspended network activity. Unavailable periods may range from minutes to several days.
The upload and download speeds of hosts vary widely, and can be fairly low (e.g. 1 Mbps) in some cases.
The amount of disk space available to a project on a given host may fluctuate over time, because of the user's own disk usage or disk usage by other BOINC projects to which the host is attached.
The population is dynamic: hosts are constantly arriving and leaving. The mean lifetime of a host may be fairly short (on the order of 100 days).
Many hosts are behind firewalls. We assume that all communication is initiated by the BOINC client, and involves HTTP requests to trusted project servers. We don't consider direct client-to-client communication.
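
To give a feel for the numbers above, here is a rough back-of-envelope sketch. It is not part of the design: the function names are hypothetical, a gigabyte is taken as 10^9 bytes, and host lifetimes are assumed to be exponentially distributed around the mean, which is a modeling assumption rather than a measured property of the host population.

```python
import math


def transfer_hours(size_gb, mbps):
    """Hours needed to move size_gb gigabytes (10^9 bytes each)
    over a link that sustains mbps megabits per second."""
    bits = size_gb * 8e9
    seconds = bits / (mbps * 1e6)
    return seconds / 3600.0


def surviving_fraction(days, mean_lifetime_days=100.0):
    """Fraction of hosts still attached after `days`.
    ASSUMPTION: lifetimes are exponential with the given mean."""
    return math.exp(-days / mean_lifetime_days)


if __name__ == "__main__":
    # Time for a slow (1 Mbps) host to upload or download 1 GB.
    print(f"1 GB at 1 Mbps: {transfer_hours(1, 1):.2f} hours")
    # Share of today's hosts expected to remain after 30 and 100 days.
    for d in (30, 100):
        print(f"after {d} days: {surviving_fraction(d):.2f} of hosts remain")
```

Under these assumptions, moving a single gigabyte to or from a slow host takes a couple of hours of uninterrupted transfer, and a large share of the hosts holding data today will have left within a few months. Both points suggest that any stored data must be refreshed continually.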

There are two basic techniques for achieving reliable storage using unreliable resources:

Replication: a file

Coding: with Reed-Solomon coding, a file is divided into N 'packets', and an additional K checksum packets are generated. The original data can be reconstructed from any N of these N+K packets, so the file survives the loss of up to K packets.
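
The two techniques can be compared with a small calculation. The sketch below is illustrative only: the function names are hypothetical, each replica or packet is assumed to sit on a different host, and hosts are assumed to fail independently with the same probability p_fail. Real hosts may not behave this way.

```python
from math import comb  # requires Python 3.8+


def replication_loss_probability(replicas, p_fail):
    """The file is lost only if every replica is lost.
    ASSUMPTION: replicas fail independently with probability p_fail."""
    return p_fail ** replicas


def coding_loss_probability(n, k, p_fail):
    """The file is lost if fewer than n of the n+k packets survive.
    ASSUMPTION: packets fail independently with probability p_fail."""
    total = n + k
    p_ok = 1.0 - p_fail
    # Sum the binomial probabilities of exactly i survivors, for i < n.
    return sum(
        comb(total, i) * p_ok ** i * p_fail ** (total - i)
        for i in range(n)
    )


if __name__ == "__main__":
    p = 0.1
    # Both schemes below store twice the size of the original file.
    print("2 replicas:       ", replication_loss_probability(2, p))
    print("N=10, K=10 coding:", coding_loss_probability(10, 10, p))
```

With the same 2x storage overhead, the coded scheme in this example yields a far smaller loss probability than two-way replication, because it tolerates the loss of any 10 of the 20 packets rather than of a single copy. Increasing K lowers the loss probability further.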
