Changes between Initial Version and Version 1 of WorkShop13/HackfestNotes


Timestamp: Jan 20, 2014, 1:53:16 PM
Author: davea

Welcome to the BOINC'13 hackfest notepad


* Misc
** Tools for hackfest communication all year long
   Two days of hackfest were probably more productive. How could we keep doing this for the whole year? Matt suggested using Jira, and Tristan gave a demo/presentation of it:
     https://www.atlassian.com/software/jira
   Jira is developed by Atlassian, the company behind Bitbucket. Maybe the same kind of thing could be done with GitHub. The community behind the tool is also important to take into account.
** Suggestions and requests
   - Christian: deadline extension for workunits. This requires updating the client.
** Next BOINC workshop
   Hawai'i? More seriously, Budapest may be an option.

* BOINC on Android, making Android apps

  Joachim, Matt, Keith, Uwe
** Expected
   Making an Android app is not a group activity, so maybe not for today, although Uwe is interested in learning how. There is a wiki page about this whose instructions could be tested.

   What we should discuss instead:
   - The UI: what could be improved. An outside eye would help.
   - Registration: Google account setup or OpenID.
** What was done
   - Uwe tried the tool chain using the instructions on the BOINC wiki page. They mostly worked, and he improved them.
   - Google/OpenID registration: a proof of concept was completed, including tests of website integration and Android application integration, and the requirements for making it work with wxWidgets in the BOINC client were reviewed. I could not follow all the technical work that remains, but David and Kevin did, so they should remember it.

* Multi-user projects
  Arnaud, Lionel, Wenjing, Kevin
** Points to take into account
*** Batch prioritization
    - multi-user
    - multi-app/project
    - a user who contributes more volunteer machines may deserve a larger share of the machines (coalitions?)
*** Batch completion
    - task granularity matching (GPU/CPU/Android)
    - machine availability and speed, to accelerate batch completion
    - homogeneous redundancy
    - how to estimate the expected runtime of a batch B given the project's entire resources

    A basic mechanism could be that batches are prioritized according to some fairness
    mechanism, and whenever a volunteer requests work, we try to find a job suitable for
    that volunteer, but with a stricter "SLA": we give the volunteer a job only if we are
    confident it will be processed in time, i.e., if the job is likely to complete before
    the expected deadline of its batch.

    Issue: a wrong estimation of batch completion will lead to starving jobs.
      Possible solution: keep a separate queue of high-priority jobs, with higher
      replication or sent only to reliable, fast machines.

    There is a mechanism in the server that "forces" the client to contact the server
    every so often, to make sure the client reports a job as soon as it is completed
    instead of waiting for the deadline.


**** BOINC Brainstorming
     This is a follow-up of the [[file:journal.org::*Discussions%20with%20David][discussions with David]].
***** Wiki design documents
      http://boinc.berkeley.edu/trac/wiki/PortalFeatures
      http://boinc.berkeley.edu/trac/wiki/MultiUser
      http://boinc.berkeley.edu/trac/wiki/MultiUserPriority
      http://boinc.berkeley.edu/trac/wiki/JobPrioritization
***** Policy in six steps
****** First step: compute batch priorities
******* Goal
        This first step serves two goals:
        1. obtain priorities to order batches;
        2. give a fair share of resources (without necessarily taking into
           account the preferences of the volunteers).
******* Solution
        As a simplification, we do not take volunteers' preferences into
        account when sharing the platform, so we use the whole aggregated
        power of the platform to compute the estimated runtime of the
        batches.

        Based on user shares and runtime estimates of batches, we compute
        the Logical Start Time (LST) of users and batches and the Logical
        End Time (LET) of batches. Then we prioritize batches by increasing
        LST.

        This is described here:
        http://boinc.berkeley.edu/trac/wiki/PortalFeatures#Prioritizingbatches

        This is done at batch submission. Unfortunately, there is a
        problem with this approach, which we discuss now.
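A minimal Python sketch of the submission-time computation follows. It rests on three assumptions, which are our reading and not necessarily the exact rule on the PortalFeatures page. Each user has a share expressed as a fraction of the platform. A batch's logical duration is its whole-platform runtime estimate divided by that share. A batch starts logically at the later of "now" and the user's current LST. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    share: float       # fraction of the platform, 0 < share <= 1 (assumption)
    lst: float = 0.0   # Logical Start Time of the user's next batch


@dataclass
class Batch:
    name: str
    user: User
    runtime: float     # estimated runtime using the whole platform's power
    lst: float = 0.0   # Logical Start Time
    let: float = 0.0   # Logical End Time


def submit_batch(batch: Batch, now: float) -> None:
    """Assign LST and LET to a batch at submission time (assumed rule)."""
    user = batch.user
    # A batch starts logically no earlier than now, nor before the
    # user's previous batches end.
    batch.lst = max(now, user.lst)
    # With only `share` of the platform, the batch logically lasts runtime / share.
    batch.let = batch.lst + batch.runtime / user.share
    # The user's next batch starts logically when this one ends.
    user.lst = batch.let


def prioritized(batches: list[Batch]) -> list[Batch]:
    """Order batches by increasing Logical Start Time."""
    return sorted(batches, key=lambda b: b.lst)
```

Take two users with 50% each. Alice submits two 10-unit batches at time 0, and Bob submits a 4-unit batch at time 5. Alice's second batch gets LST 20, so it is ordered after Bob's batch (LST 5).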
******* What if the LST of some users becomes incredibly large?
        That happens if one or two users are inactive for a long time. Is
        it a problem?

        We discussed how much "burst" we could allow. It would be very
        nice to be able to say that a user gains no additional "advantage"
        by staying inactive for more than some time (say, 1 week). One way
        of doing this is to compute the virtual schedule by assigning
        shares based on active users instead of all users (as done in
        http://rr.liglab.fr/research_report/RR-LIG-033_orig.pdf), and to
        constrain Logical Start Times to be at least "now - 1 week".

        We need to take into account the computing cost of recomputing
        everything each time a user queue switches between empty and
        non-empty (an optimization concern).
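The two remedies above are shares renormalized over active users and LSTs bounded below by "now - 1 week". The sketch below assumes a user is "active" when its queue is non-empty, and that times are in seconds. The function names are hypothetical. With a window of 0, no burst at all is allowed.

```python
ONE_WEEK = 7 * 24 * 3600.0  # maximum "burst" credit, in seconds


def active_shares(shares: dict[str, float], queued: dict[str, int]) -> dict[str, float]:
    """Renormalize shares over users whose queue is non-empty."""
    active = {user: s for user, s in shares.items() if queued.get(user, 0) > 0}
    total = sum(active.values())
    if total == 0:
        return {}
    return {user: s / total for user, s in active.items()}


def bounded_lst(user_lst: float, now: float, window: float = ONE_WEEK) -> float:
    """LST for a new batch: a user idle for longer than `window` gains no extra advantage."""
    return max(user_lst, now - window)
```

Only `active_shares` depends on which queues are non-empty. The expensive recomputation mentioned above would therefore be needed only when that set changes.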
******* What if the LET is badly estimated?
****** Second step: compute batch deadlines
******* Goal
        The objective of this step is to provide an estimate that allows
        "resource selection": these "batch deadlines" will be used to
        prevent slow machines from executing jobs of the batch. Too tight
        deadlines may cause starvation, so we will add a deadline
        extension mechanism. But even with this mechanism, too tight
        deadlines will slow the progress of the batch by excluding
        not-so-slow machines.

        On the other hand, too loose deadlines will slow the progress
        of the batch by accepting slower machines that could/should
        have contributed to a lower-priority batch. This may seem
        strange, but there can be a very good reason for loose
        deadlines: systematically excluding slow workers requires
        knowing about jobs a long time in advance, which may be
        problematic. For example, WCG only loads batches one day in
        advance and cannot load more than this. So this kind of
        project should have much looser deadlines, to make sure every
        volunteer gets a batch it can work on.
******* How to do it
        The server computes an "estimated" schedule (still based on a
        fluid view of the platform).

        *Recheck whether this makes sense or not*
        There are several ways of doing this:
        - Arnaud: a very optimistic one. If we exclude slow hosts and
          perform a little anticipated replication near the end of the
          batch, and if batches are large enough, estimated execution
          times are actually quite close to reality. I think this is
          somewhat illustrated here:
          http://www.cs.technion.ac.il/~dang/conf_papers/SilbersteinSGS09.pdf
          But it is also necessary to take the number of jobs in the
          batch into account. If it has too few jobs, then estimating
          its finishing time based on the power of the whole platform
          is overly optimistic, even if the slow machines actually
          started working on this batch earlier.

          Crude proposition, for batch $i$ (batches sorted by priority):
            $T_i = \left(\sum_{j=0}^{i} C_j\right) / P$, where $C_j$ is the
              estimated computational cost of batch $j$ and $P$ is the total
              platform power;
            $T'_i$ = expected turnaround time of the $\sum_{j=0}^{i} N_j$
              fastest machines, where $N_j$ is the number of jobs in batch $j$;
            estimated finish time of batch $i$ = $\max(T_i, T'_i)$.

          Note that this may be quite inaccurate when the system is not
          in steady state because the project ran out of work: the
          estimation for the first jobs is going to be completely off.
          We will have the same kind of trouble if the batches are
          relatively small, hence the next proposals.
        - Lionel: *supposedly more accurate*. In order of priority, add
          the power of the fastest machines until the whole batch can be
          completed. Then, based on the distribution of turnaround times
          of machines, compute a deadline that would allow sufficiently
          many machines to finish all jobs. I am not really sure how to
          do this without keeping track of the queue of each machine,
          which is something we do NOT want to do. Maybe by splitting
          the machines into a small number of groups with similar
          performance?
        - Kevin: *maybe smarter but complex*. Whenever the system gets
          empty, Arnaud's estimation is way too optimistic. So we could
          try to estimate how the aggregate power of the system evolves
          over time before it reaches steady state, and integrate this
          curve to evaluate the estimated completion time of the batch.
        - Kevin 2: divide the batch cost by the aggregated power of,
          say, the 50% fastest machines. It is somewhat the same as
          option 1 but with some slack; maybe just another way of
          thinking about it.
        - *Kevin 3* (maybe the preferred one?): if batches have far
          fewer jobs than there are available hosts, use the median
          speed of machines and compute how much time it would take to
          complete 1 job on such a machine. Within this time bound, we
          would expect "90-95%" of the jobs to be completed. The others
          will have to be resubmitted, but only to the fastest machines.
          So recompute the same value, but with the 90th-percentile (by
          speed) machine.
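Arnaud's crude proposition can be written as a short function. The sketch makes the following assumptions:
- Batches are given as (cost, number of jobs) pairs already sorted by priority.
- Each host is summarized by an expected turnaround time.
- $T'_i$ is read as the turnaround of the slowest host among the $\sum_{j \le i} N_j$ fastest ones. When there are fewer hosts than jobs, the sketch simply uses the slowest host, which underestimates the finish time.

The function name is hypothetical.

```python
def finish_times(batches: list[tuple[float, int]],
                 host_turnarounds: list[float],
                 platform_power: float) -> list[float]:
    """Estimated finish time max(T_i, T'_i) for each batch, in priority order.

    batches: (estimated cost C_j, number of jobs N_j), sorted by priority.
    host_turnarounds: expected turnaround time of each host.
    platform_power: total platform power P, in cost units per time unit.
    """
    hosts = sorted(host_turnarounds)  # fastest (shortest turnaround) first
    estimates = []
    cum_cost, cum_jobs = 0.0, 0
    for cost, n_jobs in batches:
        cum_cost += cost
        cum_jobs += n_jobs
        t_fluid = cum_cost / platform_power           # T_i
        k = min(cum_jobs, len(hosts))
        t_hosts = hosts[k - 1] if k > 0 else 0.0      # T'_i (assumed reading)
        estimates.append(max(t_fluid, t_hosts))
    return estimates
```

For a small batch, the second term dominates. A 3-job batch whose fluid estimate is 0.2 still cannot finish before the third-fastest host returns its job.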

        In any case, it is important to set a "minimum value" for
        deadlines to avoid starvation of batches.
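Below is one possible reading of Kevin 3 combined with the minimum-deadline rule: one job on the median host, plus one resubmission round on the 90th-percentile (by speed) host, never below a floor. The two-term sum, the nearest-rank percentile and all names are assumptions made for illustration.

```python
import math
import statistics


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile, 0 < q <= 100."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def kevin3_deadline(job_cost: float, host_speeds: list[float],
                    min_deadline: float, resubmit_rounds: int = 1) -> float:
    """Batch deadline: one job on the median host, plus resubmission
    rounds on the 90th-percentile host, never below min_deadline."""
    t_median = job_cost / statistics.median(host_speeds)
    t_90 = job_cost / percentile(host_speeds, 90)
    return max(min_deadline, t_median + resubmit_rounds * t_90)
```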

        But apparently it is not necessary to give a very large slack at
        this step. Furthermore, these batch deadlines are likely to need
        extending over time if we realize they are too tight and exclude
        too many people. This could be done with thresholds, but we do
        not have a nice general solution.

        Yet, after discussing with Kevin, we realized some "bad"
        situations may happen, which may call for some kind of advanced
        control loop between the batch deadline estimation and the
        regulator. Suppose we have several ongoing high-priority batches
        whose deadlines were estimated too optimistically. Then all of
        them would see their deadline set to a minimum value that
        selects only the 10% fastest machines. As these batches do not
        complete and require replicas only on the 10% fastest machines,
        the regulator will see the UNSENT job set filled with urgent
        jobs, which means the deadlines are too tight and need to be
        re-extended. So the regulator may actually be the right
        component to take care of batch deadline management.
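Following the scenario above, the regulator could detect over-tight batch deadlines as sketched here. If most of a batch's UNSENT jobs are "urgent" (replicas that only the fastest hosts may take), it stretches the remaining time to that batch's deadline. The `urgent` flag, the 50% threshold, the stretch factor and the minimum extension are all hypothetical tuning knobs, not existing BOINC settings.

```python
from dataclasses import dataclass


@dataclass
class UnsentJob:
    batch_id: int
    urgent: bool  # hypothetical flag: only the fastest hosts may take this job


def extend_tight_deadlines(unsent: list[UnsentJob], deadlines: dict[int, float],
                           now: float, threshold: float = 0.5,
                           factor: float = 1.5,
                           min_extension: float = 3600.0) -> dict[int, float]:
    """Return new batch deadlines, stretched for batches whose UNSENT jobs
    are mostly urgent (a sign that the deadline is too tight)."""
    counts: dict[int, list[int]] = {}  # batch_id -> [total, urgent]
    for job in unsent:
        tally = counts.setdefault(job.batch_id, [0, 0])
        tally[0] += 1
        tally[1] += int(job.urgent)
    updated = dict(deadlines)
    for batch_id, (total, urgent) in counts.items():
        if batch_id in updated and urgent / total > threshold:
            remaining = max(updated[batch_id] - now, 0.0)
            # Stretch the remaining time, with a floor for already-missed deadlines.
            updated[batch_id] = now + max(remaining * factor, min_extension)
    return updated
```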
****** Third step: populate the shared memory segment (feeder + regulator)
******* Goal
        When a host requests work, it picks from the feeding array
        (shared memory segment) the highest-priority jobs that it can
        finish by the deadline.

        It is thus important that *the feeding array contains enough
        diversity* for the job/volunteer matchmaking to work: filling it
        in order of priority may result in "slow" machines not getting
        any work, because all jobs in the feeding array would have too
        tight deadlines.
******* How to do it
        One way to do this is to analyze the "expected performance" of
        machines (i.e., their speed weighted by their average
        availability), divide it into quantiles, and make sure that there
        is a similar amount of work for each of these quantiles. Or
        (and it may be easier) reserve minimal amounts for each
        quantile and fill the rest with high-priority jobs (e.g.,
        replicated jobs that need to be resent because of an error on a
        client). This is a generalization of the "size matching"
        mechanism already in place in the regulator, and could be
        implemented in (an alternate version of) the regulator as well.
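The "minimal amount per quantile, rest by priority" variant could be sketched as follows. It assumes each UNSENT job has already been tagged with the host-performance quantile it suits; how to tag it is the open part. The names are hypothetical, and real feeder code works on a shared-memory array, not Python lists.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    priority: float  # higher = more urgent
    quantile: int    # host-performance quantile this job suits (assumed precomputed)


def fill_feeder(unsent: list[Job], slots: int, n_quantiles: int,
                min_per_quantile: int) -> list[Job]:
    """Reserve up to min_per_quantile slots per quantile, then fill the
    remaining slots with the highest-priority jobs."""
    by_priority = sorted(unsent, key=lambda j: j.priority, reverse=True)
    chosen: list[Job] = []
    taken: set[int] = set()

    def take(job: Job) -> None:
        chosen.append(job)
        taken.add(job.job_id)

    # Pass 1: guarantee some work for every performance quantile.
    for q in range(n_quantiles):
        for job in [j for j in by_priority if j.quantile == q][:min_per_quantile]:
            if len(chosen) < slots:
                take(job)
    # Pass 2: fill what is left strictly by priority.
    for job in by_priority:
        if len(chosen) >= slots:
            break
        if job.job_id not in taken:
            take(job)
    return chosen
```

In the test data, filling purely by priority returns jobs 1, 2, 3 and 6, leaving nothing for the slower quantiles 1 and 2. Reserving one slot per quantile returns 1, 4, 5 and 2.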

        We also need to ensure that there are some jobs available for
        each user/project, because some volunteers accept only a limited
        subset of users. This would be the job of the regulator, based
        on the current set of "UNSENT" jobs.

        In this case, the best implementation for the feeder would
        probably be to pick at random from the "UNSENT" jobs. This way,
        unless there is an incredibly high diversity and a large
        difference between the size of the UNSENT job set and the size
        of the shared memory array, the array should reflect the
        diversity of the UNSENT set.
****** Fourth step: job/volunteer matchmaking
******* Goal
        Here, we want to enforce the priority values computed in the
        first step.
******* How to do it
        We scan the array and select jobs such that:
        1. the expected completion time on this particular volunteer is
           earlier than the batch deadline;
        2. the job fits the size constraints (i.e., not too large and
           not too small);
        3. the job respects the volunteer's project preferences;
        4. the deadline the task is going to get leaves enough slack,
           since too tight a deadline may make some volunteers unhappy
           (we may use volunteer-provided information on how often they
           reconnect to estimate what slack is unacceptable). Ideally,
           there would be a volunteer-provided value for the minimum
           slack.

        Among these jobs, we select the ones with the highest batch
        priority until we get the desired amount of work, as long as the
        jobs remain feasible.
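The four matchmaking criteria and the priority-ordered selection can be combined in a single pass, as sketched below. The sketch makes three assumptions:
- Jobs assigned in the same request run one after another on the host, so each accepted job pushes back the completion of the next.
- Host speed is already adjusted for availability.
- Volunteers provide a minimum slack.

All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    batch_priority: float   # higher = served first
    batch_deadline: float   # absolute time
    job_deadline: float     # absolute deadline the task would be sent with
    flops: float            # estimated job size
    project: str


@dataclass
class Volunteer:
    speed: float            # effective FLOPS (speed x availability, assumed)
    min_flops: float        # smallest acceptable job
    max_flops: float        # largest acceptable job
    projects: set[str]      # accepted projects/users
    min_slack: float        # hypothetical volunteer-provided minimum slack


def match(jobs: list[Job], vol: Volunteer, now: float,
          work_wanted: float) -> list[Job]:
    """Pick jobs by decreasing batch priority while they stay feasible."""
    chosen: list[Job] = []
    queued = 0.0  # runtime already assigned in this request
    for job in sorted(jobs, key=lambda j: j.batch_priority, reverse=True):
        if queued >= work_wanted:
            break
        runtime = job.flops / vol.speed
        completion = now + queued + runtime  # jobs assumed to run one after another
        if (completion < job.batch_deadline                          # 1. batch deadline
                and vol.min_flops <= job.flops <= vol.max_flops       # 2. size
                and job.project in vol.projects                       # 3. preferences
                and job.job_deadline - completion >= vol.min_slack):  # 4. slack
            chosen.append(job)
            queued += runtime
    return chosen
```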

        Issues:
        - Requiring completion strictly before the batch deadline may
          cause starvation of the batch, so we need a mechanism to
          re-extend batch deadlines, as discussed earlier.
****** Fifth step: assign job deadlines
        Left open: we do not know yet how to do this.

        After a lot of discussion, here is what we finally proposed:

        Let $BD$ be the batch deadline, and let $T_{90}$ be the time a job
        of the batch would take on the 90th-percentile machine. Say we
        allow at most 3 rounds of resubmission. Then we initially set the
        job deadlines to $BD - 3T_{90}$. Whenever this deadline has
        passed, we enter the resubmission mechanism and only replicate
        with a tight deadline of $T_{90}$. Resubmission means creating an
        additional replica of the job because its deadline has passed.
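Assume resubmission happens as soon as a deadline is missed. The rule above then gives a simple deadline ladder: the initial deadline, plus one $T_{90}$ step per resubmission round, with the last round ending exactly at $BD$. A minimal sketch (hypothetical function name):

```python
def deadline_ladder(bd: float, t90: float, rounds: int = 3) -> list[float]:
    """Deadlines of the initial send and of each resubmission round.

    The initial deadline is BD - rounds * T_90. Each resubmission, issued
    when the previous deadline is missed, gets a T_90 deadline, so the
    last one coincides with BD.
    """
    first = bd - rounds * t90
    return [first + k * t90 for k in range(rounds + 1)]
```

Suppose the first entry is already in the past when the batch is submitted. Then the batch deadline leaves no room for 3 rounds, which is one signal that the batch deadline needs re-extending.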
****** Sixth step: trigger replication
        Instead of having only resubmission, we may want to have
        speculative job replication. This means that when deadline
        $BD - 3T_{90}$ has passed, we resubmit not just one copy of the
        missed job, but several copies, to decrease the failure
        probability and hence the need for another resubmission.
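The number of copies to send can be estimated under a strong assumption: replicas fail independently, each with probability $p$ of missing its $T_{90}$ deadline. Then $k$ copies all fail with probability $p^k$, and we take the smallest $k$ that brings $p^k$ down to a target. A sketch with a hypothetical name:

```python
def copies_needed(p_fail: float, target: float) -> int:
    """Smallest number of replicas k with p_fail**k <= target,
    assuming replicas miss their deadline independently."""
    if not (0.0 <= p_fail < 1.0 and 0.0 < target <= 1.0):
        raise ValueError("need 0 <= p_fail < 1 and 0 < target <= 1")
    k, p_all_fail = 1, p_fail
    while p_all_fail > target:
        k += 1
        p_all_fail *= p_fail
    return k
```

Replica failures on the same kind of host are probably correlated in practice, so this count should be read as a lower bound.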
****** Possible Client Mechanism Evolution
       - add a job state that says "report ASAP"
       - add a mechanism that says "run immediately"?
       - allow volunteers to set a minimum slack
***** Other Concerns
****** Too tight job/batch deadlines?
       This requires tuning, I guess.
****** How to do aggressive replication for the straggler jobs?
       It depends on what we mean by aggressive. We can use automatic
       resubmission, speculative replication, or a combination.
****** Does "processor affinity" exist?
       I mean jobs that are much more efficient on GPU, and jobs that
       are more efficient on CPU. The answer is yes, and we may want to
       consider this kind of optimization.
****** Turnaround time estimation vs. throughput estimation
       We currently use the effective turnaround time, not the potential
       turnaround time. In WCG, some very fast workers have a huge cache
       and take a lot of time to complete jobs, hence a poor effective
       turnaround time even though they have a huge throughput. We do
       not actually keep track of this throughput, but we keep track of
       the credit history. That may be a more accurate measure, since
       job duration may vary from one batch to another, whereas credit
       is tied to the amount of work done.
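Little's law makes the cache effect concrete. On average, the time a job spends on a host equals the number of jobs the host holds divided by the rate at which it completes them. A small worked example, with numbers invented for illustration:

```python
def effective_turnaround(jobs_held: float, jobs_per_day: float) -> float:
    """Little's law: mean days a job spends on a host = jobs held / completion rate."""
    return jobs_held / jobs_per_day


# Host A: fast, large cache.  Host B: slow, small cache.
host_a = effective_turnaround(jobs_held=100, jobs_per_day=20)  # 5.0 days
host_b = effective_turnaround(jobs_held=2, jobs_per_day=2)     # 1.0 day
```

Host A completes ten times more work than host B, yet its effective turnaround is five times worse. A filter based only on turnaround would therefore prefer B.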
****** Why do we have both a batch LET and a batch deadline?
       The LET is used to ensure a "fair" sharing of resources. The
       batch deadline is there to exclude slow volunteers. So although
       the two notions have quite different uses, they are closely
       related. Kevin provided a good reason for having two separate
       notions computed in different ways. Suppose there is a batch B1
       with a large number of small jobs and a batch B2 with a small
       number of large jobs. They may both incur the same resource
       usage, but the time they actually take to complete will be quite
       different.

       Note that this means that if we compute batch deadlines using
       the Kevin 3 option, B1 will get a rather tight deadline. It is
       still not clear to me whether this is a good or a bad thing.
       Volunteers expect reasonable slack. One option is to enforce a
       minimum slack for volunteers, so we added this constraint to the
       matchmaking.
****** Can we "predict" availability based on current uptime?
       This is an interesting question, but it seems very difficult.
       Maybe I could discuss it with Jean-Marc and he would convince me
       it is just unfeasible. :)
****** Issues raised by Uwe
       On a project where the number of batches/jobs that are ready to
       run is very large relative to the size of the infrastructure,
       preloading all workunits into BOINC can cause a significant
       degradation of performance. There may need to be a way to load a
       batch so that it can be prioritized and planned, without creating
       the workunit/result records or copying files for download until
       the system knows that they are required. A mechanism would need
       to exist to assess this condition and trigger that creation.
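Below is a minimal sketch of the lazy creation Uwe describes. Batches keep job descriptions in memory, and workunits are created only when the UNSENT pool drops below a low-water mark. The pool is then refilled up to a high-water mark, in priority order. The watermarks and the `create_workunit` callback are hypothetical. In a real project the callback would wrap whatever workunit-creation step the project already uses.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LazyBatch:
    batch_id: int
    pending: deque                               # job descriptions, not yet workunits
    created: list = field(default_factory=list)  # workunits created so far


def top_up(batches: list[LazyBatch], unsent_count: int, low_water: int,
           high_water: int, create_workunit: Callable[[int, str], str]) -> int:
    """Create workunits only when the UNSENT pool is below low_water,
    refilling it up to high_water from batches in priority order.
    Returns the number of workunits created."""
    if unsent_count >= low_water:
        return 0
    created = 0
    for batch in batches:  # assumed already sorted by priority
        while batch.pending and unsent_count + created < high_water:
            spec = batch.pending.popleft()
            batch.created.append(create_workunit(batch.batch_id, spec))
            created += 1
    return created
```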


* Application-specific search space division
  Wenjie, Gerdus

  More or less the same scheme can be applied, maybe with some problem-specific tweaking, but not much. No BOINC server-side change is needed: a few simple scripts can already implement the mechanism quite well. Maybe some kind of priority system can be implemented in the recycling.
  Splitting leftovers is also a way to deal with long-running workunits, but it may be application-specific.
  It is important to keep things simple, even at the cost of accuracy! This is an engineering problem.
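For a search space that can be numbered as integer ranges, splitting leftovers can be as simple as the sketch below. Each unfinished range (for example, from a workunit that ran too long) is cut into smaller pieces that can be resubmitted. The half-open range representation and the chunk size are assumptions; as noted above, the right split is likely application-specific.

```python
def split_leftovers(leftovers: list[tuple[int, int]], chunk: int) -> list[tuple[int, int]]:
    """Split unfinished half-open ranges [start, end) into pieces of at most `chunk` items."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    pieces = []
    for start, end in leftovers:
        for lo in range(start, end, chunk):
            pieces.append((lo, min(lo + chunk, end)))
    return pieces
```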
* TODO Integration with hubs, clouds, grids, and desktop grids
  Joszef and Adam talked about it, but we don't know much. We should ask them.
* Make project web sites translatable
  We should be careful about words with context-sensitive meanings. Having the context would be useful when translating.
  Maybe SETI@home could be a good test case.

* Creating and deploying VM-based app versions
  Carlos, Christian, Uwe, Carlos Val., Wenjing, Francisco, Dario
  The smallest VM size Christian achieved was ~680 MB (uncompressed) and 180 MB (gzip-compressed). Getting lower would require specific kernels, with an increased risk of bugs.
  It would be great if we could get it under 100 MB. At SZTAKI, they simply compress the image and it shrinks from 700 MB to 100 MB, which is sufficient.

  Christian explained to Dario and Carlos how to create VM-based apps and how he deals with this in the RNA World VM image: http://www.rnaworld.de/rnaworld/download/rnaWorld2GB.vdi.gz
  Extract the VDI to your hard disk, create a new virtual machine using this VDI, create a shared folder called 'shared', and place an executable/script named 'boinc_app' inside it. This will be executed by the startup script of the VM. For example:
      #!/bin/bash
      # Minimal test boinc_app: wait, write a file to the shared folder, wait again.
      sleep 10
      echo "Hello World" > ../shared/hello.txt
      sleep 10
  You can also interrupt the startup script with Ctrl-C in the VM and explore the inside.

  We identified the problem of getting serious errors out of the VM in a generic way. One way would be to redirect the console output via a serial port to the vboxwrapper and look for kernel panic or similar messages. One approach is a socat pipe (http://wiki.illumos.org/display/illumos/Serial+Console+in+VirtualBox).
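The detection side of the serial-console idea could start as a simple scan of the console text for fatal kernel messages. This sketch assumes the console output is available as lines, for instance from the socat pipe. The pattern list is illustrative, and the notes present the wrapper side as a new feature still to be built.

```python
import re

# Illustrative patterns; a real list would need tuning for the guest kernel.
FATAL_PATTERNS = [
    re.compile(r"Kernel panic"),
    re.compile(r"\bOops\b"),
    re.compile(r"\bBUG:"),
]


def first_fatal_line(lines):
    """Return the first console line matching a fatal pattern, or None."""
    for line in lines:
        text = line.rstrip("\n")
        if any(p.search(text) for p in FATAL_PATTERNS):
            return text
    return None
```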

  To solve these problems, Christian needs new features in the VirtualBox wrapper, which could be based on socat.

  Nothing much was done on the VM size reduction side.

  Right now, although we have an installer for both BOINC and VirtualBox, the BOINC page mainly advertises the BOINC-only download, as we don't want people to install unnecessary things. David would like to change this to a big "download BOINC + VirtualBox" button and a small "download BOINC" button, explaining that the latter may restrict project participation. Christian raised the fact that to run 64-bit apps in VirtualBox, you need the VT-x feature to be enabled in the BIOS, and there is no way to check whether it is enabled without trying to execute workunits. Ideally, in such a case, the client should post a notification explaining to the volunteer what to do to improve things.

  An alternative suggestion was to enable BOINC to initiate the download and installation of VirtualBox when the user attempts to connect to a project that requires VirtualBox.

* PHP pages with Twitter Bootstrap
  Francisco, Dario, David, Christian

  This only works with Bootstrap 2.3.2 and can be integrated into default BOINC with some effort. I also think it's possible to make it optional, to give projects an easy upgrade path. It will soon be put into BOINC.

* How to automate the end-to-end testing of BOINC?
  Dario, Kevin, Joachim, Augustin, Adam
  - automate the database deployment, compilation, and submission of a client

  Nothing was done here.
* Remote job submission: unification
  Wenjin, Christian, David

  API doc: http://boinc.berkeley.edu/trac/wiki/RemoteJobs
  It would be nice to have a disk quota for each submitter, to limit disk usage on the server.
  A hook on the server side to allow application-specific preparation of batches. Possible implementation: place a file appname.inc in html/project.inc, and submit_rpc_handler.php will look for it and call a specific function (prepare_job or prepare_batch) that returns an XML structure that gets passed back to the submitter. Attention: this preparation could take some time!
  Another hook should make it possible to prepare the output data of a batch. A project might want to zip output files for each submitter on a daily basis, before the whole batch is finished, so that the submitter can download these partial files.

  Brainstormed on how file names should be managed. There is the beginning of a plan, but it's not quite there yet. Maybe it will become more concrete next week.


* Drupal/BOINC tutorial
  Tristan talked with whoever was interested (ClimatePrediction, LHC, CAS@home). He is going to write documentation explaining how to do this, which should make it easy for everyone.

* Abandoned tasks
** Sub-second CPU throttling
** Further simplifying the BOINC install process and GUI
** Prototype a BOINC GUI using HTML5
** Improving the BOINC server documentation
** GPU, multi-thread, VM-based, and Android applications
** Data-intensive applications