Changes between Version 2 and Version 3 of VolunteerDataArchival
- Timestamp: Nov 23, 2011, 2:38:17 PM (13 years ago)
VolunteerDataArchival
 v2  v3
  3   3   '''Volunteer data archival''' means using disk space on volunteered home computers
  4   4   to store large data files.
  5     - This document describes the design of a system to
      5 + This document describes the design of a VDAB, a system to
  6   6   provide volunteer data archival on BOINC.
  7     - We assume the goalsinclude:
      7 + The goals of VDAB include:
  8   8   * Storing large (e.g. petabyte) files.
  9   9   Files may be thousands of times larger than the
  …   …
 31  31   We don't consider direct client-to-client communication.
 32  32
     33 + == Modeling recovery ==
     34 +
 33  35   Recovering from the failure of a host, using techniques like replication,
 34  36   involves uploading data from a 2nd host, then downloading it to a 3rd host.
 35  37   Each of these steps may take days.
 36     - Th is, for volunteer storage the ratio
     38 + Thus, for volunteer storage the ratio
 37  39
 38  40   average time to failure / average time to recover
  …   …
 41  43   In other distributed storage systems (such as RAIDs) this ratio may
 42  44   be on the order of 100,000.
 43     - Thus, these systems can modeled as a sequence of individual
 44     - failures and recoveries.
     45 + Thus, these systems can modeled as a sequence of individual failures and recoveries.
 45  46
 46     - Volunteer storage, on the other hand, must be modeled as process
     47 + Volunteer data archival, on the other hand, must be modeled as process
 47  48   in which multiple recoveries may be in progress at the same time,
 48  49   and new failures may occur during these recoveries.
     50 +
     51 + == The need for server storage ==
     52 +
     53 + Initially a file is stored in its entirety on the server.
     54 + It is downloaded to volunteer hosts.
     55 + Eventually it is retrieved, i.e. uploaded to the server again,
     56 + and perhaps deleted from volunteer hosts.
     57 +
     58 + However, server storage must be used even while the file is
     59 + being stored on volunteer hosts.
     60 + This is because the mechanisms to handle host failures (see below)
     61 + involve uploading parts of the file to the server,
     62 + then downloading them to other hosts.
     63 +
     64 + One of the goals of VDAB is to minimize the average amount of server storage
     65 + required to maintain reliability.
     66 +
     67 + == Increasing reliability ==
     68 +
 49  69   There are two basic techniques for achieving reliable storage using
 50  70   unreliable resources:
  …   …
 55  75   that replica is uploaded to the server, then downloaded to another host.
 56  76
 57     -
 58     -
 59  77   '''Coding''': with Reed-Solomon coding, a file is divided into N 'packets',
 60  78   and an additional K checksum packets are generated.
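
The "Modeling recovery" text in this revision argues that a low ratio of average time to failure to average time to recover makes overlapping recoveries likely. The sketch below is one rough way to see this. It assumes that hosts fail and recover independently and that each host alternates between an "up" period and a "recovering" period. The function names, the host count of 10, and the ratio of 10 are hypothetical choices for illustration and do not come from the page. Only the 100,000 figure is taken from the text.

```python
def recovering_fraction(ratio: float) -> float:
    """Long-run fraction of time one host spends recovering.

    ratio = average time to failure / average time to recover.
    Assumes the host alternates between 'up' and 'recovering' periods.
    """
    return 1.0 / (ratio + 1.0)


def expected_concurrent_recoveries(n_hosts: int, ratio: float) -> float:
    """Expected number of hosts recovering at a random instant."""
    return n_hosts * recovering_fraction(ratio)


def prob_overlap(n_hosts: int, ratio: float) -> float:
    """Probability that two or more hosts are recovering at the same instant.

    Assumes hosts are independent (an illustrative simplification).
    """
    p = recovering_fraction(ratio)
    none = (1.0 - p) ** n_hosts
    exactly_one = n_hosts * p * (1.0 - p) ** (n_hosts - 1)
    return 1.0 - none - exactly_one


if __name__ == "__main__":
    # 100,000 is the RAID-like order of magnitude quoted in the text;
    # 10 is a hypothetical low ratio chosen only for contrast.
    for ratio in (100_000, 10):
        print(ratio,
              expected_concurrent_recoveries(10, ratio),
              prob_overlap(10, ratio))
```

Under these assumptions, a ratio near 100,000 makes simultaneous recoveries negligible, which is why a sequence of individual failures and recoveries is an adequate model there. A low ratio makes overlap common, which matches the page's point that multiple recoveries may be in progress at once.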