wiki:ResearchProjects

Version 10 (modified by davea, 13 years ago) (diff)

--

Research projects involving BOINC

Possible research projects involving BOINC and volunteer computing, appropriate for senior-level class projects or Masters theses. If you're interested, please contact David Anderson.

An out-of-date list of conferences related to volunteer computing is here.

Tools

The following tools may be useful:

  • The BOINC client simulator models a single client attached to one or more projects. It's useful for studying client scheduling policies.
  • The EmBOINC BOINC server emulator models a BOINC server and a population of clients. It's useful for studying server scheduling policies.
  • The scheduler driver can be used to measure the maximum throughput (jobs per second) of a BOINC back end.

Data-intensive volunteer computing

Currently, most BOINC projects work as follows:

  • Data are stored on the server
  • Pieces of data (input files) are sent to client, and jobs are run against them. When done, the files are deleted from the client.
  • Output files are sent back to the server.

This architecture doesn't scale well for data-intensive computing. There are various alternatives:

  • Workflows: DAGs of tasks connected by intermediate temporary files. Schedule them so that temp files remain local to client most of the time.
  • Stream computing: e.g., IBM Infosphere
  • Models that involve computing against a large static dataset: e.g. MapReduce, or Amazon's scheme in which they host common scientific datasets, and you can use EC2 to compute against them.

BOINC has some features that may be useful in these scenarios: e.g., locality scheduling and sticky files. It lacks some features that may be needed: e.g., awareness of client proximity, or the ability to transfer files directly between clients.

Virtualizing volunteer computing

The volunteer computing host population is highly heterogeneous in terms of software environment (operating system type and version, system libraries, installed packages). Projects are faced with the difficult task of building application versions for all these different environments; this is a significant barrier to the usage of volunteer computing.

This problem can be mitigated using virtual machine technology. In this approach, a hypervisor such as VMWare or VirtualBox is installed (manually or automatically) on volunteer hosts. An application consists of a virtual machine image containing the application proper together with the required libraries and packages. A "wrapper" program provides an interface between the BOINC client and the hypervisor, so that, for example, the application can be suspended and resumed in accordance with user preferences.

Some of this has already been done; see VmApps.

A related goal is to minimize VM image sizes; people at CERN are working on this.

A higher level goal is to implement a "volunteer cloud" model, and to develop tools to facilitate its use by computational scientists. A collaboration between CERN and INRIA exists in this area.

Analyze and improve adaptive replication

Because volunteer hosts may be error-prone or malicious, volunteer computing requires result validation. One way to do this is by replication: run each job on 2 computers and make sure the results agree.

To reduce the 50% overhead of two-fold replication, BOINC has a mechanism called "adaptive replication" that runs jobs with no replication on hosts with low error rates, while continuing to randomly intersperse replicated jobs.

The project is to identify possible counter-strategies for adaptive replication, to establish bounds on the overall effectiveness of adaptive replication, and to identify refinements that increase the effectiveness.

A related project is to prove the effectiveness (or ineffectiveness) of BOINC's mechanism to defeat 'cherry picking': completing only short jobs in an effort to get credit unfairly.

Extend and refine the BOINC credit system

The idea of "credit" - a numerical measure of work done - is essential to volunteer computing, as it provides volunteers an incentive and a basis for competition. Currently, BOINC's credit mechanism is based on the number of floating-point operations performed.

The project is to design a new credit system where a) credit can be given for resources other than computing (e.g., storage, network bandwidth); b) the credit given per FLOP can depend on factors such as RAM size and job turnaround time. Ideally the system allows a game-theoretic proof that it leads to an optimal allocation of resources.

Predictable and accelerated batch/DAG completion

Support the notion of batch (or more generally DAG) of jobs. Allow users (i.e. scientists using a project) to get a-priori estimates of batch completion. Develop scheduling policies that optimize batch completion (i.e. makespan).

Latency-oriented volunteer computing

The early volunteer computing projects (SETI@home, Climateprediction.net) are "throughput oriented": they want to maximize the number of jobs completed per day, not minimize the turnaround time of individual jobs. BOINC's scheduling mechanisms reflect this; for example, they try to assign multiple jobs at a time so that client/server interactions are minimized.

More recent volunteer computing projects are "latency-oriented": they want to minimize the makespan of batches of jobs. The project is to redesign BOINC's scheduling mechanisms so that they can support latency-oriented computation, and to validate the new mechanisms via simulation.

Volunteer data archival

While BOINC is currently used for computation, it also provides primitives for distributed data storage: file transfers, queries, and deletion. The project is to develop a system that uses these primitives to implement a distributed data archival system that uses replication to achieve target levels of reliability and availability.

Invisible GPU computing

BOINC has recently added support for GPU computing, and several projects now offer applications for NVIDIA and ATI GPUs. One problem with this is that GPU usage is not prioritized, so when a science application is running the performance of user-visible applications is noticeable degraded. As a result, BOINC's default behavior is that science applications are not run while the computer is in use (i.e., while there has been recent mouse or keyboard activity).

The project (in collaboration with NVIDIA and possibly AMD/ATI) is to make changes to BOINC and to the GPU drivers so that the GPU can be used as much as possible, even while the computer is in use, without impacting the performance of user-visible applications.

Microjobs

Design a library that can be linked to an existing application, providing a lightweight BOINC client. When the application is run, the library contacts projects and gets "microjobs" (small or variable-size jobs). It computes while the app is open. When the app exits, the library reports whatever computation has been done.