== Storage applications ==

One application of volunteer storage is '''pure storage''':
volunteer hosts are used to store data that originates from the server.
Data units have parameters such as target reliability (loss rate),
read latency, and read throughput.
The application decides which hosts to use and how to replicate data.
It may stripe or code the data,
since clients don’t need to access the data in its original form.
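
As a rough illustration of how a pure storage application might choose a replication level (a sketch, not an actual BOINC policy): if hosts are assumed to fail independently with some probability over the data's storage period, the number of whole-file replicas needed for a target loss rate follows directly. The function name and probability values below are hypothetical.

{{{
#include <cmath>
#include <cstdio>

// Hypothetical helper: smallest number of whole-file replicas such that
// the probability of losing every copy stays below target_loss_rate,
// assuming hosts fail independently with probability host_loss_prob
// over the data's storage period.
int replicas_needed(double host_loss_prob, double target_loss_rate) {
    // All n replicas are lost with probability host_loss_prob^n;
    // solve host_loss_prob^n <= target_loss_rate for the smallest integer n.
    return (int) std::ceil(std::log(target_loss_rate) / std::log(host_loss_prob));
}

int main() {
    // Example: hosts vanish with 20% probability over the storage period,
    // and the application wants at most a 1-in-10,000 chance of data loss.
    std::printf("replicas needed: %d\n", replicas_needed(0.2, 1e-4));   // 6
    return 0;
}
}}}

Striping or erasure coding the data, as mentioned above, would reduce the storage overhead relative to whole-file replication, at the cost of more complex reconstruction.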

Other storage applications combine storage and computation in various ways:

* '''Archival of computational results''':
for example, Climateprediction.net proposed storing the large (2 GB)
output files of climate model runs on the host for several months,
so that they are available to scientists if something interesting
is found in the small summary files that are sent to the server.
* '''Dataset storage''': for example, gene and protein databases
could be distributed across a client pool,
and could be queried via BLAST or other standard applications.
MapReduce-type systems also fall in this category.
* '''Data stream buffering''': for instruments that produce large amounts of data,
volunteer storage can provide a large buffer,
increasing the time window during which transient events can be re-analyzed.
* '''Locality scheduling''':
a job-assignment policy that preferentially sends a client jobs
whose input files are already resident on it,
thus reducing data server load (a minimal sketch follows this list).
Data files are “sticky”, and remain resident on hosts until there
are no more jobs that use them, at which point they are deleted.
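
The following is a minimal sketch of the locality preference described above, with hypothetical names (it is not the actual BOINC scheduler code): given the set of files already resident on the requesting host, prefer an unsent job whose input file is among them, and fall back to any unsent job otherwise.

{{{
#include <set>
#include <string>
#include <vector>

// Hypothetical record for an unsent job and the input file it needs.
struct Job {
    int id;
    std::string input_file;
};

// Locality preference (illustrative only): first look for a job whose input
// file is already resident on the requesting host, so no new download is
// needed; otherwise fall back to the first unsent job.
// Returns a pointer into unsent_jobs, or nullptr if there are no jobs.
const Job* choose_job(
    const std::vector<Job>& unsent_jobs,
    const std::set<std::string>& files_on_host
) {
    for (const Job& job : unsent_jobs) {
        if (files_on_host.count(job.input_file)) {
            return &job;    // sticky file already present: no data transfer
        }
    }
    return unsent_jobs.empty() ? nullptr : &unsent_jobs.front();
}

int main() {
    std::vector<Job> jobs = {{1, "chunk_a"}, {2, "chunk_b"}};
    std::set<std::string> on_host = {"chunk_b"};
    // Chooses job 2, whose input file is already resident on the host.
    const Job* chosen = choose_job(jobs, on_host);
    return chosen ? 0 : 1;
}
}}}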

== BOINC's volunteer storage architecture ==

The architecture involves two layers:

[[image storage.png]]

The '''BOINC distributed file management''' layer provides basic mechanisms:
* Server-side estimation of host parameters such as
future availability, latency, and upload/download speed.
* File transfers between server and client.
* Possibly peer-to-peer file transfers using Attic.
* Maintenance of a database table tracking which files are present on which hosts.
* A mechanism in the client for allocating disk space
to projects and deciding when a project must delete files.
* A mechanism for conveying this information to the server,
and for the server to tell the client which files to delete.
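
To make the lower layer concrete, the sketch below shows one possible shape for the per-host estimates and the file/host association records described above. The struct and field names are assumptions for illustration, not BOINC's actual database schema or RPC format.

{{{
#include <cstdio>
#include <string>

// Server-side estimates of a host's storage-related parameters
// (field names are illustrative assumptions).
struct HostStorageInfo {
    int host_id = 0;
    double avail_frac = 0;      // estimated fraction of time the host is reachable
    double upload_bps = 0;      // measured upload bandwidth
    double download_bps = 0;    // measured download bandwidth
    double free_bytes = 0;      // disk space the client currently offers the project
};

// One association record: a file known to be present on a particular host.
struct FileOnHost {
    int host_id = 0;
    std::string file_name;
    double size_bytes = 0;
    bool sticky = false;        // retained until the server requests deletion
};

int main() {
    HostStorageInfo host{42, 0.85, 1e6, 5e6, 8e9};
    FileOnHost rec{42, "dataset_chunk_17", 2e9, true};
    std::printf("host %d offers %.1f GB; holds %s (%.1f GB)\n",
                host.host_id, host.free_bytes / 1e9,
                rec.file_name.c_str(), rec.size_bytes / 1e9);
    return 0;
}
}}}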

The upper layer consists of storage applications.
Each of the storage applications listed above
has goals that drive its data placement and replication policy.
For example, a Dataset storage application would try to store an amount
of data per host in proportion to the host's available processing power,
to minimize the time needed to process the entire data set.
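
As a small sketch of that placement rule (assuming per-host processing time dominates, and using hypothetical names): give each host a share of the dataset proportional to its effective speed, so all hosts finish processing their shares at roughly the same time.

{{{
#include <cstdio>
#include <vector>

// Proportional placement sketch: each host's share of the dataset is
// proportional to its effective speed (e.g. FLOPS weighted by availability),
// so the per-host processing times come out roughly equal.
std::vector<double> proportional_shares(
    double total_bytes, const std::vector<double>& effective_flops
) {
    double total_flops = 0;
    for (double f : effective_flops) total_flops += f;
    std::vector<double> shares;
    for (double f : effective_flops) {
        shares.push_back(total_bytes * f / total_flops);
    }
    return shares;
}

int main() {
    // Three hosts with different effective speeds sharing a 100 GB dataset.
    std::vector<double> shares = proportional_shares(100e9, {1e9, 2e9, 5e9});
    for (size_t i = 0; i < shares.size(); i++) {
        std::printf("host %zu stores %.1f GB\n", i, shares[i] / 1e9);
    }
    return 0;
}
}}}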

A storage application is implemented as several components: