Context Navigation

Changes between Version 4 and Version 5 of VolunteerStorage

Timestamp:: Jul 30, 2011, 10:57:26 PM (14 years ago)
Author:: davea
Comment:: --

Legend:

: Unmodified
: Added
: Removed
: Modified

VolunteerStorage

-                      v4
+                      v5
 [[PageOutline]]
 = Distributed data management =
+= Volunteer storage =
+BOINC provides features for implementing distributed data management systems.
+BOINC provides features that support '''volunteer storage''':
+that is, distributed data management systems based on volunteer resources.
+Such a system is implemented as several components:
+== Storage applications ==
+One application of volunteer storage is '''pure storage''':
+volunteer hosts are used to store data that originates from the server.
+Data units have parameters such as target reliability (loss rate),
+read latency, and read throughput.
+The application decides which hosts to use and how to replicate data.
+It may stripe or code the data,
+since clients don’t need to access the data in its original form.
+Other storage applications combine storage and computation in various ways:
+ * '''Archival of computational results''':
+  for example, Climateprediction.net proposed storing the large (2 GB)
+  output files of climate model runs on the host for several months,
+  so that they are available to scientists if something interesting
+  is found in the small summary files that are sent to the server.
+ * '''Dataset storage''': for example, gene and protein databases
+  could be distributed across a client pool,
+  and could be queried via BLAST or other standard applications.
+  \MapReduce-type systems also fall in this category.
+ * '''Data stream buffering''': for instruments that produce large amounts of data,
+  volunteer storage can provide a large buffer,
+  increasing the time window during which transient events can be re-analyzed.
+ * '''Locality scheduling''':
+  a job assignment policy that preferentially sends jobs
+  whose input files are already resident on that client,
+  thus reducing data server load.
+  Data files are “sticky”, and remain resident on hosts until there
+  are no more jobs that use them, at which point they are deleted.
+== BOINC's volunteer storage architecture ==
+The architecture involves two layers:
+[[image storage.png]]
+The '''BOINC distributed file management''' provides basic mechanisms:
+ * Server-side estimation of host parameters such as
+  future availability, latency, and upload/download speed.
+ * File transfers between server and client.
+ * Possibly peer-to-peer file transfers using Attic.
+ * Maintenance of a database table tracking which files are present on which hosts;
+ * A mechanism in the client for allocating disk space
+  to projects and deciding when a project must delete files;
+ * A mechanism for conveying this information to the server,
+  and for the server to tell the client which files to delete.
+The upper layer consists of storage applications.
+Each of the storage applications listed above
+has goals that drive the data placement and replication policy.
+For example, Dataset Storage applications would try to store an amount
+of data per host in proportion to its available processing power,
+to minimize the time needed to process the entire data set.
+A storage application is implemented as several components:
  * A "plug-in" to the BOINC scheduler, which is called on each scheduler RPC;
 …
  * Interface programs, for example, programs allowing users to submit and retrieve files.
+Note: to use these features, you must include
+== The BOINC distributed file management API ==
+The distributed file management API includes several functions,
+each of which can be invoked as a command-line program
+are as a C++ function call.
+To use these features, you must include
 {{{
 <msg_to_host/>
 …
 in your config.xml.
 == Sending files to hosts ==
+=== Sending files to hosts ===
 From an interface program or daemon, call
 …
 }}}
 == Retrieving files ==
+=== Retrieving files ===
 From an interface program or daemon, call
 …
 }}}
 == Deleting files ==
+=== Deleting files ===
 From an interface program or daemon, call
 …
 }}}
 == Implementation ==
+== Implementation notes ==
 From interface programs,
 …
 The result has a name of the form '''file_xfer_*''',
 which tells the scheduler to treat it specially.