Changes between Version 1 and Version 2 of PortalFeatures


Ignore:
Timestamp:
Sep 10, 2010, 3:31:59 PM (14 years ago)
Author:
davea
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • PortalFeatures

    v1 v2  
    6262batches (if they are sufficiently infrequent) to get the full computing resource
    6363and finish as fast as possible.
     64
     65== Batch scheduling ==
     66
     67The second requirement of science portals is support for '''batches''':
     68groups of jobs that have value only when all of the jobs have been completed.
     69The interval between batch submission and completion is called '''makespan'''.
     70
     71=== Minimizing makespan ===
     72
     73BOINC's scheduler should be modified to reduce makespan.
     74The scheduler has control over two things:
     75
     76 * what hosts to send which job instances to (and when to create new instances)
     77 * what deadline to assign to each job
     78
     79A variety of techniques are possible:
     80
     81 * Use slower hosts for the first jobs of a batch,
     82   with the expectation that the host will finish one job
     83   in the duration of the batch
     84   (and by extension, very slow hosts should possibly
     85   not get any jobs for some batches).
     86 * Use faster, more reliable hosts for the last jobs of a batch.
     87 * Use increased replication for the last jobs of a batch
     88   to reduce the expected minimum turnaround.
     89
     90These techniques can be combined into policies.
     91To study the relative performance of these policies,
     92we can develop a simulator that takes as input a description
     93of a real volunteer host population (e.g., SETI@home)
     94and the parameters of a random batch arrival process,
     95and computes the statistics of makespan,
     96and perhaps the difference between predicted and actual makespan (see below).
     97
     98=== Estimating makespan ===
     99
     100Given the description of a possible new batch,
     101the system should be able to provide a reasonably accurate estimate
     102of its makespan (perhaps with error bars).
     103The computation of this estimate must take into account
     104
     105 * The computing resource (i.e. the distributions of speed and availability
     106   of the volunteer hosts, and their number).
     107 * The existing workload (batches currently in progress, and their state)
     108 * The parameters of the batch
     109 * The computing share and debt of the requesting user
     110 * The batch scheduling policy
     111
     112In addition, the system should provide users with a web-based interface
     113for viewing batches in progress,
     114showing their fraction done,
     115an updated estimate of their completion time,
     116information about errors,
     117and a button for aborting the batch.
     118
     119== Job DAGs ==
     120
     121Batches are a special case of Directed Acyclic Graphs (DAGs) of jobs.
     122The nodes in such a graph represent jobs;
     123there is an edge from job A to job B if A must be completed
     124before B can be started.
     125
     126A batch can be viewed as a graph consisting of
     127a "start node" fanning out to N job nodes,
     128then fanning back in to a "finish node".
     129
     130Science portals may offer the ability to submit collections of work
     131that can be described as DAGs but not as batches:
     132for example, a collection of units, each of which consists of
     133running program A and using its output as input to program B.
     134
     135(Note: BOINC's wrapper mechanism, which allows multiple applications
     136to be run in sequence as part of a BOINC job,
     137supports some simple cases, but it not general because it
     138requires dependent jobs to run on a single host).
     139
     140As an extension of the work described above,
     141we could support DAGs rather than just batches.