Changes between Initial Version and Version 1 of ResearchProjects


Timestamp:
Sep 4, 2009, 12:56:51 PM (15 years ago)
Author:
davea
Comment:

--

  • ResearchProjects

    v1 v1  
     1= Research projects involving BOINC =
     2
     3Possible research projects involving BOINC and volunteer computing,
     4appropriate for senior-level class projects or Masters theses.
     5
     6== Virtualizing volunteer computing ==
     7
     8The volunteer computing host population is highly heterogeneous
     9in terms of software environment (operating system type and version,
     10system libraries, installed packages).
     11Projects are faced with the difficult task of building application
     12versions for all these different environments;
     13this is a significant barrier to the usage of volunteer computing.
     14
     15This problem can be mitigated using virtual machine technology.
     16In this approach, a hypervisor such as VMware or VirtualBox is
     17installed (manually or automatically) on volunteer hosts.
     18An application consists of a virtual machine image
     19containing the application proper together with the required libraries and packages.
     20A "wrapper" program provides an interface between
     21the BOINC client and the hypervisor, so that, for example,
     22the application can be suspended and resumed in accordance
     23with user preferences.
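
The wrapper's suspend/resume role can be sketched as follows. This is a minimal illustration, not the BOINC wrapper: it assumes VirtualBox is the hypervisor and uses its `VBoxManage controlvm <vm> pause|resume` command. The function names, the `run_if_user_active` preference flag, and the VM name are hypothetical. By default the sketch only builds the command (`dry_run=True`) and does not run it.

```python
import subprocess


def vm_control_command(vm_name, action):
    """Build the VBoxManage command line that pauses or resumes a VM."""
    if action not in ("pause", "resume"):
        raise ValueError("unsupported action: " + action)
    return ["VBoxManage", "controlvm", vm_name, action]


def apply_preference(vm_name, user_active, run_if_user_active, dry_run=True):
    """Pause the VM when the user is active and has asked not to compute then.

    Otherwise resume it. Returns the command that was (or would be) run.
    """
    should_pause = user_active and not run_if_user_active
    cmd = vm_control_command(vm_name, "pause" if should_pause else "resume")
    if not dry_run:
        # Assumption: VBoxManage is installed and on PATH on the volunteer host.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical VM name; prints the command without executing it.
    print(apply_preference("boinc_app_vm", user_active=True, run_if_user_active=False))
```

A real wrapper would also track the VM's current state, since resuming a VM that is already running is reported as an error by the hypervisor.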
     24
     25The project (a collaboration with CERN and INRIA)
     26is to implement this "volunteer cloud" model,
     27to optimize it (e.g., to minimize VM image sizes),
     28and to develop tools to facilitate its use by computational scientists.
     29
     30== Analyze and improve adaptive replication ==
     31
     32Because volunteer hosts may be error-prone or malicious,
     33volunteer computing requires result validation.
     34The general way to do this is by replication:
     35run each job on 2 computers and make sure the results agree.
     36
     37To reduce the overhead of two-fold replication, which spends half of all computing on duplicate jobs,
     38BOINC has a mechanism called "adaptive replication"
     39that runs jobs with no replication on hosts with low error rates,
     40while continuing to randomly intersperse replicated jobs.
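
The policy described above can be sketched in a few lines. This is an illustration under stated assumptions, not BOINC's actual implementation: the error-rate threshold `max_error_rate` and the spot-check probability `spot_check_prob` are hypothetical parameters chosen for the example.

```python
import random


def needs_replication(host_error_rate, rng, max_error_rate=0.05, spot_check_prob=0.1):
    """Decide whether a job sent to this host should also run on a second host."""
    if host_error_rate > max_error_rate:
        # Host is not (yet) trusted: always replicate.
        return True
    # Trusted host: replicate only occasionally, at random, as a spot check.
    return rng.random() < spot_check_prob


def expected_instances_per_job(trusted_fraction, spot_check_prob=0.1):
    """Average number of job instances run per job under the policy above.

    trusted_fraction is the share of jobs that go to trusted hosts.
    """
    untrusted_fraction = 1.0 - trusted_fraction
    return 2.0 * untrusted_fraction + (1.0 + spot_check_prob) * trusted_fraction


if __name__ == "__main__":
    rng = random.Random(42)
    print(needs_replication(0.20, rng))      # untrusted host, always replicated
    print(expected_instances_per_job(0.8))   # between 1.0 and 2.0
```

With every job replicated the average is 2 instances per job; as more jobs go to trusted hosts the average approaches `1 + spot_check_prob`.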
     41
     42The project is to identify possible counter-strategies for adaptive replication,
     43to establish bounds on the overall effectiveness of adaptive replication,
     44and to identify refinements that increase the effectiveness.
     45
     46== Extend and refine the BOINC credit system ==
     47
     48The idea of "credit" - a numerical measure of work done -
     49is essential to volunteer computing,
     50as it provides volunteers an incentive and a basis for competition.
     51Currently, BOINC's credit mechanism is based on the
     52number of floating-point operations performed.
     53
     54The project is to design a new credit system where
     55a) credit can be given for resources other than computing
     56(e.g., storage, network bandwidth);
     57b) the credit given per FLOP can depend on factors such
     58as RAM size and job turnaround time.
     59Ideally the system would allow a game-theoretic proof
     60that it leads to an optimal allocation of resources.
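
To make the design space concrete, here is one possible shape for such a credit function. Every constant and functional form below is a hypothetical placeholder (credit per teraFLOP, the RAM and turnaround multipliers, the storage and network rates); none of them comes from BOINC, and choosing them well is the point of the project.

```python
def job_credit(flops, ram_gb, turnaround_hours, storage_gb_days=0.0, network_gb=0.0):
    """Hypothetical credit formula combining computing with other resources."""
    base = flops / 1e12                        # assumed: 1 credit per teraFLOP
    ram_factor = 1.0 + 0.1 * ram_gb            # assumed: more RAM, more credit per FLOP
    turnaround_factor = 1.0 / (1.0 + turnaround_hours / 24.0)  # assumed: faster is better
    compute_credit = base * ram_factor * turnaround_factor
    storage_credit = 0.5 * storage_gb_days     # assumed rate per GB-day stored
    network_credit = 0.25 * network_gb         # assumed rate per GB transferred
    return compute_credit + storage_credit + network_credit


if __name__ == "__main__":
    # A job of 1e12 FLOPs on a 10 GB host, returned after one day.
    print(job_credit(1e12, ram_gb=10, turnaround_hours=24))
```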
     61
     62== Latency-oriented volunteer computing ==
     63
     64The early volunteer computing projects (SETI@home, Climateprediction.net)
     65are "throughput oriented": they want to maximize the number of jobs
     66completed per day, not minimize the turnaround time of individual jobs.
     67BOINC's scheduling mechanisms reflect this; for example, they try to
     68assign multiple jobs at a time so that client/server interactions are minimized.
     69
     70More recent volunteer computing projects are "latency-oriented":
     71they want to minimize the makespan of batches of jobs.
     72The project is to redesign BOINC's scheduling mechanisms so that they
     73can support latency-oriented computation,
     74and to validate the new mechanisms via simulation
     75(using an existing simulator).
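
As a point of reference for what "minimizing makespan" means, the sketch below assigns a batch of jobs to hosts with a simple greedy heuristic (largest job first, each to the host that would finish it earliest). This is a textbook heuristic used only for illustration, not BOINC's scheduler or the simulator mentioned above; job sizes and host speeds are assumed to be known, which is itself an idealization.

```python
def assign_batch(job_sizes, host_speeds):
    """Greedily assign jobs to hosts and return (assignment, makespan).

    job_sizes: work per job, in arbitrary units.
    host_speeds: work units each host completes per second.
    assignment[h] is the list of job indices given to host h.
    """
    if not host_speeds:
        raise ValueError("at least one host is required")
    finish = [0.0] * len(host_speeds)          # time at which each host becomes idle
    assignment = [[] for _ in host_speeds]
    # Place the largest jobs first; ties keep their original order.
    for job, size in sorted(enumerate(job_sizes), key=lambda js: -js[1]):
        # Pick the host on which this job would complete earliest.
        best = min(range(len(host_speeds)),
                   key=lambda h: finish[h] + size / host_speeds[h])
        finish[best] += size / host_speeds[best]
        assignment[best].append(job)
    return assignment, max(finish)


if __name__ == "__main__":
    print(assign_batch([4, 2, 2], [1.0, 1.0]))
```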
     76
     77== Volunteer data archival ==
     78
     79While BOINC is currently used for computation,
     80it also provides primitives for distributed data storage:
     81file transfers, queries, and deletion.
     82The project is to develop a system that uses these primitives
     83to implement a distributed data archival system
     84that uses replication to achieve target levels of
     85reliability and availability.
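
The relation between replication level and availability can be illustrated with a small calculation. It assumes, as a simplification, that hosts are available independently with the same probability `p`; a file stored on `k` hosts is then reachable with probability `1 - (1 - p)**k`. Real volunteer hosts are correlated (time zones, churn), so this is only a starting point.

```python
def archive_availability(host_availability, replicas):
    """Probability that at least one of the replicas is on an available host."""
    return 1.0 - (1.0 - host_availability) ** replicas


def replicas_needed(host_availability, target_availability):
    """Smallest number of replicas that reaches the target availability."""
    if not 0.0 < host_availability < 1.0:
        raise ValueError("host_availability must be strictly between 0 and 1")
    if not 0.0 < target_availability < 1.0:
        raise ValueError("target_availability must be strictly between 0 and 1")
    replicas = 1
    while archive_availability(host_availability, replicas) < target_availability:
        replicas += 1
    return replicas


if __name__ == "__main__":
    # Hosts that are online half the time, target of 99% availability.
    print(replicas_needed(0.5, 0.99))
```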
     86
     87== Invisible GPU computing ==
     88
     89BOINC has recently added support for GPU computing,
     90and several projects now offer applications for NVIDIA and ATI GPUs.
     91One problem with this is that GPU usage is not prioritized,
     92so when a science application is running
     93the performance of user-visible applications is noticeably degraded.
     94As a result, BOINC's default behavior is that science applications
     95are not run while the computer is in use (i.e., while there has been
     96recent mouse or keyboard activity).
     97
     98The project (in collaboration with NVIDIA and possibly AMD/ATI)
     99is to make changes to BOINC and to the GPU drivers
     100so that the GPU can be used as much as possible,
     101even while the computer is in use,
     102without impacting the performance of user-visible applications.
     103
     104== Estimating job completion times in a heterogeneous environment ==
     105
     106Accurate job completion time estimates are essential to BOINC.
     107Underestimates can waste computation,
     108and overestimates can cause resource idleness.
     109BOINC's current mechanisms for estimating job completion times
     110have various shortcomings:
     111a) they require projects to estimate the number of floating-point operations (FLOPs) each job needs in advance,
     112and to estimate the FLOPS performance of applications on particular hosts;
     113most projects don't have the ability or willingness to provide
     114accurate estimates;
     115b) they are based on peak hardware performance
     116(e.g., benchmark values), and actual application performance
     117can be wildly different, especially for multi-threaded and GPU applications.
     118
     119The project is to design, implement, and study a system that automatically
     120estimates job completion times on heterogeneous hosts,
     121and that provides estimates of the actual number of floating-point operations performed
     122by a given job.
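
One simple baseline for such a system is to learn from observed runtimes instead of peak hardware figures. The sketch below keeps an exponentially weighted average of seconds per unit of job size for each (application, host) pair. The class, its `alpha` smoothing parameter, and the notion of a relative `job_size` are assumptions made for this illustration, not part of BOINC.

```python
class RuntimeEstimator:
    """Estimate job runtimes from past completions on the same host and app."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest observation
        self.rate = {}       # (app, host) -> smoothed seconds per unit of job size

    def record(self, app, host, job_size, elapsed_seconds):
        """Fold one completed job into the running average."""
        observed = elapsed_seconds / job_size
        key = (app, host)
        if key not in self.rate:
            self.rate[key] = observed
        else:
            self.rate[key] = (1.0 - self.alpha) * self.rate[key] + self.alpha * observed

    def estimate(self, app, host, job_size):
        """Return the predicted runtime in seconds, or None with no history."""
        rate = self.rate.get((app, host))
        return None if rate is None else rate * job_size


if __name__ == "__main__":
    est = RuntimeEstimator(alpha=0.5)
    est.record("app", "host1", job_size=10, elapsed_seconds=100)
    est.record("app", "host1", job_size=10, elapsed_seconds=200)
    print(est.estimate("app", "host1", job_size=4))
```

A host with no history for an application returns `None`, which is where a fallback (for example, statistics from similar hosts) would be needed.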