Changes between Version 7 and Version 8 of JobSizeMatching
- Timestamp: Apr 19, 2013, 1:15:48 PM
JobSizeMatching
 v7  v8
  7   7    Having a single job size can therefore present problems:
  8   8
  9      - * If the size is too small, hosts with GPUs get huge numbers of jobs.
      9  + * If the size is small, hosts with GPUs get huge numbers of jobs.
 10  10    This causes performance problems on the client
 11  11    and a high DB load on the server.
 12      - * If the size is too large, slow hosts can't get jobs,
     12  + * If the size is large, slow hosts can't get jobs,
 13  13    or they get jobs that take weeks to finish.
 14  14
  …   …
 18  18
 19  19    We'll assume that jobs for a given application can be generated
 20      - in several discrete '''size classes'''
 21      - (the number of size classes is a parameter of the application).
     20  + in several discrete '''size classes''';
     21  + the number of size classes is a parameter of the application.
 22  22
 23  23    BOINC will try to send jobs of size class i
 24  24    to devices whose effective speed is in the ith quantile,
 25      - where 'effective speed' is the product of the
 26      - device speed and the host's on-fraction.
     25  + where 'effective speed' is the product of the device speed and the host's on-fraction.
 27  26
 28  27    This involves 3 new integer DB fields:
  …   …
 32  31
 33  32    The size class of a job is specified in the call to create_work().
     33  +
     34  + Apps with n_size_classes > 1 are called '''multi-size apps'''.
     35  + A project can have both multi-size and non-multi-size apps.
 34  36
 35  37    Notes:
  …   …
 47  49    The order statistics of device effective speed will be computed
 48  50    by a new program '''size_census'''.
 49      - For each app with n_size_classes>1 this does:
     51  + For each multi-size app this does:
 50  52
 51  53    * enumerate host_app_versions for that app
  …   …
 59  61    == Scheduler changes ==
 60  62
 61      - When the scheduler sends jobs of a given app to a given processor,
     63  + When the scheduler sends jobs of a given multi-size app to a given processor,
 62  64    it should preferentially send jobs whose size class matches
 63  65    the quantile of the processor.
  …   …
 79  81    * For each job, compute a "score" that includes various factors.
 80  82    (reliable, beta, previously infeasibly, locality scheduling lite).
 81      - * Include a factor for job size;
     83  + * For multi-size apps, include a factor for job size;
 82  84    decrement the score of jobs that are too small,
 83  85    and decrement more for jobs that are too large.
  …   …
 96  98    and the resource load maintaining a job array of that size.
 97  99    * All other factors being equal, the scheduler will send jobs of other apps
 98      - rather than send a wrong-size job.
 99      - This could potentially lead to starvation issues; we'll have to see.
    100  + rather than send a job of non-optimal size class.
    101  + This could potentially lead to starvation issues; we'll have to see if this is a problem.
100 102
101 103    == Regulating the flow of jobs into shared memory ==
  …   …
114 116
115 117    Instead, we'll do the following:
116      - * when jobs are created (in the transitioner) set their state to
117      - INACTIVE rather than UNSENT.
118      - This is done if app.n_size_classes > 1
    118  + * when jobs are created for a multi-size app (in the transitioner),
    119  + set their state to INACTIVE rather than UNSENT.
119 120    * have a new daemon ('''size_regulator''') that polls for the number of unsent
120 121    jobs of each type, and changes a few jobs from INACTIVE to UNSENT
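
The matching rule in the page text (jobs of size class i go to devices whose effective speed falls in the ith quantile, with a score decrement for wrong-size jobs) could be sketched roughly as follows. This is an illustration in Python, not BOINC's actual implementation. The function names, the 0-based class numbering, the way the quantile cut points are chosen, and the penalty weights (1 per class for jobs that are too small, 2 per class for jobs that are too large) are all assumptions made for the example; the page only says that too-large jobs are decremented more than too-small ones.

```python
from bisect import bisect_right


def effective_speed(device_speed: float, on_fraction: float) -> float:
    """Effective speed as defined on the page: device speed times host on-fraction."""
    return device_speed * on_fraction


def quantile_boundaries(speeds: list[float], n_size_classes: int) -> list[float]:
    """Cut points that split the observed speeds into n_size_classes groups.

    Hypothetical stand-in for the order statistics that size_census computes;
    returns n_size_classes - 1 boundaries, or an empty list if no speeds are known.
    """
    ordered = sorted(speeds)
    n = len(ordered)
    if n == 0:
        return []
    return [ordered[(k * n) // n_size_classes] for k in range(1, n_size_classes)]


def size_class_for(speed: float, boundaries: list[float]) -> int:
    """Quantile index (0-based, an assumption) of a device with this effective speed."""
    return bisect_right(boundaries, speed)


def size_penalty(job_class: int, device_class: int) -> float:
    """Assumed score decrement: too-large jobs cost twice as much as too-small ones."""
    diff = job_class - device_class
    if diff < 0:
        return 1.0 * (-diff)  # job is too small for this device
    return 2.0 * diff         # job is too large (0.0 when the classes match)


if __name__ == "__main__":
    census = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    cuts = quantile_boundaries(census, n_size_classes=4)
    device_class = size_class_for(effective_speed(9.0, 0.5), cuts)
    for job_class in range(4):
        print(job_class, size_penalty(job_class, device_class))
```

With a penalty of this shape, a scheduler that subtracts it from each job's score would prefer exact matches, then slightly small jobs, and only then larger ones, which is the ordering the page describes.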