Changes between Version 4 and Version 5 of VolunteerStorage


Timestamp:
Jul 30, 2011, 10:57:26 PM (13 years ago)
Author:
davea
Comment:

--

  • VolunteerStorage

    v4 v5  
    11[[PageOutline]]
    22
    3 = Distributed data management =
     3= Volunteer storage =
    44
    5 BOINC provides features for implementing distributed data management systems.
     5BOINC provides features that support '''volunteer storage''':
     6that is, distributed data management systems based on volunteer resources.
    67
    7 Such a system is implemented as several components:
     8== Storage applications ==
     9
     10One application of volunteer storage is '''pure storage''':
     11volunteer hosts are used to store data that originates from the server.
     12Data units have parameters such as target reliability (loss rate),
     13read latency, and read throughput.
     14The application decides which hosts to use and how to replicate data.
     15It may stripe or code the data,
     16since clients don’t need to access the data in its original form.
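
As a rough illustration of how a target loss rate might drive the replication decision, the sketch below computes how many full replicas are needed. It assumes that hosts lose data independently with the same probability, which is a simplification not stated in the text, and the function name `replicas_needed` is hypothetical, not part of BOINC.

```python
def replicas_needed(host_loss_prob, target_loss_prob):
    """Smallest replica count r with host_loss_prob ** r <= target_loss_prob.

    Assumption (not from the source): every host loses its copy
    independently with the same probability, so the data unit is lost
    only if all r replicas are lost.
    """
    if not (0 < host_loss_prob < 1 and 0 < target_loss_prob < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    replicas = 1
    loss = host_loss_prob
    # Add replicas until the combined loss probability meets the target.
    while loss > target_loss_prob:
        loss *= host_loss_prob
        replicas += 1
    return replicas
```

Striping or coding the data would change this calculation; the sketch covers plain replication only.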
     17
     18Other storage applications combine storage and computation in various ways:
     19
     20 * '''Archival of computational results''':
     21  for example, Climateprediction.net proposed storing the large (2 GB)
     22  output files of climate model runs on the host for several months,
     23  so that they are available to scientists if something interesting
     24  is found in the small summary files that are sent to the server.
     25 * '''Dataset storage''': for example, gene and protein databases
     26  could be distributed across a client pool,
     27  and could be queried via BLAST or other standard applications.
     28  MapReduce-type systems also fall in this category.
     29 * '''Data stream buffering''': for instruments that produce large amounts of data,
     30  volunteer storage can provide a large buffer,
     31  increasing the time window during which transient events can be re-analyzed.
     32 * '''Locality scheduling''':
     33  a job assignment policy that preferentially sends each client
     34  jobs whose input files are already resident on that client,
     35  thus reducing data server load.
     36  Data files are “sticky”, and remain resident on hosts until there
     37  are no more jobs that use them, at which point they are deleted.
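
The locality scheduling policy described in the last item can be sketched as follows. This is a minimal illustration, not BOINC's scheduler code: the `Job` class and the functions `pick_job` and `deletable_files` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    input_files: frozenset  # names of the files this job reads


def pick_job(jobs, resident_files):
    """Prefer the job with the most input files already on the host.

    Returns None if there are no jobs; ties go to the earliest job.
    """
    if not jobs:
        return None
    return max(jobs, key=lambda job: len(job.input_files & resident_files))


def deletable_files(resident_files, pending_jobs):
    """Sticky files can be deleted once no pending job uses them."""
    still_needed = set()
    for job in pending_jobs:
        still_needed |= job.input_files
    return set(resident_files) - still_needed
```

For example, a host holding `f2` would be given a job that reads `f2` ahead of one that reads `f1`, and a resident file that no pending job reads would be reported as deletable.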
     38
     39== BOINC's volunteer storage architecture ==
     40
     41The architecture involves two layers:
     42
     43[[image storage.png]]
     44
     45The lower layer, '''BOINC distributed file management''', provides basic mechanisms:
     46 * Server-side estimation of host parameters such as
     47  future availability, latency, and upload/download speed.
     48 * File transfers between server and client.
     49 * Possibly peer-to-peer file transfers using Attic.
     50 * Maintenance of a database table tracking which files are present on which hosts.
     51 * A mechanism in the client for allocating disk space
     52  to projects and deciding when a project must delete files.
     53 * A mechanism for conveying this information to the server,
     54  and for the server to tell the client which files to delete.
     55
     56The upper layer consists of storage applications.
     57Each of the storage applications listed above
     58has goals that drive the data placement and replication policy.
     59For example, Dataset Storage applications would try to store an amount
     60of data per host in proportion to its available processing power,
     61to minimize the time needed to process the entire data set.
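
The proportional placement policy in the Dataset Storage example can be sketched as below. The functions `allocate_bytes` and `processing_time`, and the `flops_per_byte` cost model, are assumptions made for illustration; the text does not specify how processing power or cost is measured.

```python
def allocate_bytes(total_bytes, host_flops):
    """Split total_bytes across hosts in proportion to processing power.

    host_flops maps a host name to its (positive) processing rate.
    """
    total_flops = sum(host_flops.values())
    if total_flops <= 0:
        raise ValueError("total processing power must be positive")
    return {
        host: total_bytes * flops / total_flops
        for host, flops in host_flops.items()
    }


def processing_time(allocation, host_flops, flops_per_byte=1.0):
    """Time until the slowest host finishes its share.

    Assumes (hypothetically) a fixed cost of flops_per_byte per byte
    and that every host in the allocation has a positive rate.
    """
    return max(
        allocation[host] * flops_per_byte / host_flops[host]
        for host in allocation
    )
```

Under this cost model, a proportional split makes all hosts finish at the same time, so no host becomes a bottleneck. An even split across unequal hosts leaves the slower host finishing last.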
     62
     63A storage application is implemented as several components:
    864
    965 * A "plug-in" to the BOINC scheduler, which is called on each scheduler RPC;
     
    1167 * Interface programs, for example, programs allowing users to submit and retrieve files.
    1268
    13 Note: to use these features, you must include
     69== The BOINC distributed file management API ==
     70
     71The distributed file management API includes several functions,
     72each of which can be invoked as a command-line program
     73or as a C++ function call.
     74To use these features, you must include
    1475{{{
    1576<msg_to_host/>
     
    1778in your config.xml.
    1879
    19 == Sending files to hosts ==
     80=== Sending files to hosts ===
    2081
    2182From an interface program or daemon, call
     
    3798}}}
    3899
    39 == Retrieving files ==
     100=== Retrieving files ===
    40101
    41102From an interface program or daemon, call
     
    58119}}}
    59120
    60 == Deleting files ==
     121=== Deleting files ===
    61122
    62123From an interface program or daemon, call
     
    78139}}}
    79140
    80 == Implementation ==
     141== Implementation notes ==
    81142
    82143From interface programs,
     
    87148The result has a name of the form '''file_xfer_*''',
    88149which tells the scheduler to treat it specially.
     150