Changes between Version 2 and Version 3 of LocalityNew
Timestamp: Aug 14, 2012, 2:18:04 PM
LocalityNew
* A given file may be used by many jobs.
* The density of jobs in the file sequence may be variable.
* Several batches may be in progress concurrently, for the same or different applications.

== Goals ==

* To complete batches quickly.
* To minimize the amount of data transfer to hosts.

…

The ideal policy would start each host at a different point in
the job space, separated according to their speeds.
This would potentially send each file only to a single host.
However, it's impractical for various reasons:
replication, unreliability of hosts, and unpredictability of their speeds.

Instead, we use a policy in which the set of hosts is divided into '''teams''',
and each team works on a different area of the job space.
Teams should have these properties:

…

* Subject to the above, teams should be as small as possible. A good size might be 10 or 20.
* The hosts in a team should belong to different users (for validation purposes).

Because of host churn, team membership is dynamic; …

A '''cursor''' consists of:
* a team of hosts
* a range of jobs
* status information (see below)

… allowing cursors to move from one job range to another,
and allowing job ranges to be subdivided.
I think this is needlessly complex.
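As an illustrative sketch (not BOINC source code; the class and field names below are assumptions chosen to mirror the description), a cursor pairs a team of hosts with a contiguous job range and some status:

```python
from dataclasses import dataclass, field

@dataclass
class Cursor:
    """A cursor: a team of hosts plus a range of jobs.

    Illustrative only; in the design this state lives in the
    locality_cursor database table.
    """
    cursor_id: int
    batch_id: int
    team: set = field(default_factory=set)   # IDs of hosts in the team
    first_job_num: int = 0                   # start of this cursor's job range
    last_job_num: int = 0                    # end of the range (exclusive)
    expavg_credit: float = 0.0               # sum of expavg_credit over team hosts

def add_host(cursor: Cursor, host_id: int, host_expavg: float) -> None:
    """Add a host to the cursor's team, keeping the aggregate credit current."""
    cursor.team.add(host_id)
    cursor.expavg_credit += host_expavg
```

Keeping `expavg_credit` as a maintained aggregate (rather than recomputing it per request) is what makes "pick the lowest-credit cursor" cheap for the scheduler.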
=== Database ===

…

{{{
batch
    // this table already exists; we may need to add fields to it

batch_host      // batch/host association table
    host_id integer
    batch_id integer
…

locality_cursor
    batch_id integer
    expavg_credit double
        // sum of expavg_credit of hosts in the team
…
        // all jobs before this have been completed
    first_ungenerated_job_num integer
        // we've generated workunit records for all jobs before this
    index on (batch_id, expavg_credit)

workunit (new fields)
    cursor_id integer
    job_num integer
}}}

…

create locality_cursor records

=== Scheduler ===

==== Assign host to cursors ====

…

If this is a new host (i.e. no batch_host record) then:
* assign the host to the cursor for this batch with the least expavg_credit
* create a batch_host record
* add the host's expavg_credit to the cursor's expavg_credit

…

Let C = the host's cursor.
If C.expavg_credit > 2 * the lowest expavg_credit among cursors,
then move this host to that cursor.
(This policy may need to be refined a bit.)

… tell the client to delete it.

Note: names of sticky files should encode the batch and file number.

=== Work generator ===

Loop over batches and cursors.
Try to maintain a cushion of N unsent jobs per cursor.
Start generating jobs at cursor.first_ungenerated_job_num.
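The work-generator loop above might be sketched as follows (a simplification assuming in-memory dicts rather than database rows; `make_job` and the dict keys are hypothetical names, not BOINC APIs):

```python
def replenish_cursor(cursor, unsent_count, cushion, make_job):
    """Top up one cursor's supply of unsent jobs.

    cursor: dict with 'first_ungenerated_job_num' and 'last_job_num'
    unsent_count: current number of unsent jobs for this cursor
    cushion: target number of unsent jobs per cursor (N in the text)
    make_job: callback that creates the workunit record for one job number
    """
    job_num = cursor["first_ungenerated_job_num"]
    while unsent_count < cushion and job_num < cursor["last_job_num"]:
        make_job(job_num)        # would set workunit.cursor_id and job_num
        job_num += 1
        unsent_count += 1
    # all jobs before this point now have workunit records
    cursor["first_ungenerated_job_num"] = job_num

def work_generator_pass(batches, unsent_counts, cushion, make_job):
    """One pass of the generator: loop over batches and their cursors."""
    for batch in batches:
        for cursor in batch["cursors"]:
            n = unsent_counts.get(cursor["id"], 0)
            replenish_cursor(cursor, n, cushion, make_job)
```

Advancing `first_ungenerated_job_num` after each top-up means a later pass resumes exactly where the previous one stopped, which is why the field is persisted per cursor.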