= Locality scheduling redesign =

== Assumptions ==

* A '''batch''' consists of a sequence of files F1 ... Fn and a set of jobs J1 ... Jm operating on these files.
* Each job uses a contiguous set of files.
* A given file may be used by many jobs.
* The density of jobs in the file sequence may be variable.
* Several batches may be in progress concurrently.

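To make these assumptions concrete, here is a minimal C++ sketch of the batch structure. It is purely illustrative: the type and field names are not part of the design, and nothing below depends on them.

{{{
#include <string>
#include <vector>

// Illustrative model: a batch is a sequence of files plus a set of jobs,
// each job using a contiguous range of files (indices into the sequence).
struct LOCALITY_JOB {
    int first_file;    // index of the first file this job uses
    int last_file;     // index of the last file this job uses (inclusive);
                       // a given file may lie in the range of many jobs
};

struct LOCALITY_BATCH {
    std::vector<std::string> files;    // F1 ... Fn, in sequence order
    std::vector<LOCALITY_JOB> jobs;    // J1 ... Jm; job density over the
                                       // file sequence may vary
};
}}}
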
== Goals ==

* To complete the batch quickly.
* To minimize the amount of data transferred to hosts.

== Scheduling policy ==

The naive policy of dispatching jobs in order essentially sends
every file to every host, so it fails to achieve the second goal.

The ideal policy would start each host at a different point in
the job space, with the starting points spaced according to host speed.
This would potentially send each file to only a single host.
However, it's impractical for various reasons:
replication, the unreliability of hosts, and the unpredictability of their speeds.

Instead, we use a policy in which the set of hosts is divided into '''teams''',
and each team starts at a different point in the job space.
Teams should have these properties:

* The fastest host in a team should be no faster than the combined speed of the other hosts; otherwise it could get unboundedly far ahead of them.
* Subject to the above, teams should be as small as possible. A good size might be 10 or 20 hosts.
* The hosts in a team should belong to different users.

Because of host churn, team membership is dynamic;
e.g. a team may be empty for a period.

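These properties amount to a simple predicate on a candidate team. The following C++ sketch is illustrative only; it approximates host speed by expavg_credit and hard-codes the suggested size limit, both of which are assumptions rather than part of the design.

{{{
#include <set>
#include <vector>

// Illustrative view of a team member: its owner and a speed estimate.
struct TEAM_HOST {
    int user_id;
    double speed;    // e.g. recent average credit (expavg_credit)
};

// Check the team properties listed above:
// - the fastest host is no faster than the rest of the team combined
// - the team is not too large (10-20 is suggested above)
// - all hosts belong to different users
bool team_ok(const std::vector<TEAM_HOST>& team, int max_size = 20) {
    if (team.empty() || (int)team.size() > max_size) return false;
    double total = 0, fastest = 0;
    std::set<int> users;
    for (const auto& h: team) {
        total += h.speed;
        if (h.speed > fastest) fastest = h.speed;
        users.insert(h.user_id);
    }
    if ((int)users.size() != (int)team.size()) return false;  // duplicate user
    return fastest <= total - fastest;    // fastest <= sum of the others
}
}}}
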
== Design ==

A '''cursor''' consists of
* a dynamic team of hosts
* a range of jobs
* status information (see below)

Note: we discussed having a separate notion of "job range",
allowing cursors to move from one job range to another,
and allowing job ranges to be subdivided.
I think this is needless complexity.

=== Database ===

New tables:

{{{
batch_host
    host_id                     integer
    batch_id                    integer
    cursor_id                   integer

locality_cursor
    expavg_credit               double
        // sum of expavg_credit of hosts in the team
    first_job_num               integer
    last_job_num                integer
    first_unfinished_job_num    integer
        // all jobs before this have been completed
    first_ungenerated_job_num   integer
        // all jobs before this have workunit records

workunit (new fields)
    cursor_id                   integer

result (new fields)
    cursor_id                   integer
}}}
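
For reference, these records might be mirrored in the server code by structs along the following lines. This is only a sketch: the struct names, the assumed integer key on locality_cursor (the value that batch_host, workunit, and result reference as cursor_id), and the omitted DB-layer glue are assumptions, not part of the schema above.

{{{
// Illustrative in-memory counterparts of the new tables (no DB code).
struct BATCH_HOST {
    int host_id;
    int batch_id;
    int cursor_id;
};

struct LOCALITY_CURSOR {
    int id;                          // assumed key, referenced as cursor_id
    double expavg_credit;            // sum of expavg_credit of hosts in the team
    int first_job_num;
    int last_job_num;
    int first_unfinished_job_num;    // all jobs before this have been completed
    int first_ungenerated_job_num;   // all jobs before this have workunit records
};
}}}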

=== Initialization ===

To initialize a batch:

* create the batch record
* based on the number of hosts, the number of jobs, and the job density, create the locality_cursor records, each covering a contiguous range of jobs

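A minimal sketch of the second step, under two assumptions not specified above: teams contain roughly team_size hosts (the 10-20 suggested earlier), and, lacking better density information, the job space is split evenly among cursors. A real implementation could weight the ranges by job density.

{{{
#include <algorithm>
#include <vector>

struct LOCALITY_CURSOR {
    int first_job_num;
    int last_job_num;
    int first_unfinished_job_num;
    int first_ungenerated_job_num;
    double expavg_credit;
};

// Hypothetical helper: split the batch's job space [0, njobs) into one
// contiguous, roughly equal range per cursor.  team_size (say 10-20) is
// an assumption; job density could be used to make the split uneven.
std::vector<LOCALITY_CURSOR> make_cursors(int njobs, int nhosts, int team_size) {
    int ncursors = std::max(1, nhosts / team_size);
    ncursors = std::min(ncursors, njobs);    // no empty ranges
    std::vector<LOCALITY_CURSOR> cursors(ncursors);
    for (int i = 0; i < ncursors; i++) {
        int first = (int)((long long)njobs * i / ncursors);
        int last  = (int)((long long)njobs * (i + 1) / ncursors) - 1;
        cursors[i].first_job_num = first;
        cursors[i].last_job_num = last;
        cursors[i].first_unfinished_job_num = first;   // nothing completed yet
        cursors[i].first_ungenerated_job_num = first;  // no workunits yet
        cursors[i].expavg_credit = 0;                  // no hosts assigned yet
    }
    return cursors;
}
}}}
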
=== Feeder/scheduler ===

==== Assign host to cursors ====

For each batch:

If this is a new host (i.e. it has no batch_host record for this batch):
* assign the host to the cursor with the least expavg_credit
* create a batch_host record
* add the host's expavg_credit to the cursor's expavg_credit

Otherwise, consider moving the host to a different cursor.
Let C be the host's cursor.
If C.expavg_credit is more than twice the lowest expavg_credit among the batch's cursors,
move the host to that lowest-expavg_credit cursor.
(This policy may need to be refined a bit.)
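
The following C++ sketch illustrates this logic. The struct and function names are illustrative and database reads and writes are omitted; only the selection and rebalancing rules described above are shown. It would be invoked once per scheduler request for each batch the host participates in.

{{{
#include <vector>

// Illustrative in-memory stand-ins for the DB records.
struct LOCALITY_CURSOR {
    int id;
    double expavg_credit;    // sum over the hosts currently on this cursor
};

struct BATCH_HOST {
    int host_id;
    int batch_id;
    int cursor_id;
};

// Index of the cursor with the lowest total expavg_credit.
static int lowest_cursor(const std::vector<LOCALITY_CURSOR>& cursors) {
    int best = 0;
    for (int i = 1; i < (int)cursors.size(); i++) {
        if (cursors[i].expavg_credit < cursors[best].expavg_credit) best = i;
    }
    return best;
}

// Assign a host to a cursor for one batch.
// bh == nullptr means "no batch_host record yet", i.e. a new host.
void assign_host_to_cursor(
    int host_id, int batch_id, double host_expavg,
    std::vector<LOCALITY_CURSOR>& cursors,
    BATCH_HOST* bh, std::vector<BATCH_HOST>& batch_hosts
) {
    int low = lowest_cursor(cursors);
    if (!bh) {
        // new host: put it on the lightest cursor and record the assignment
        batch_hosts.push_back({host_id, batch_id, cursors[low].id});
        cursors[low].expavg_credit += host_expavg;
        return;
    }
    // existing host: move it if its cursor is more than twice the lightest
    LOCALITY_CURSOR* c = nullptr;
    for (auto& cur: cursors) if (cur.id == bh->cursor_id) c = &cur;
    if (!c || c->id == cursors[low].id) return;
    if (c->expavg_credit > 2 * cursors[low].expavg_credit) {
        c->expavg_credit -= host_expavg;
        cursors[low].expavg_credit += host_expavg;
        bh->cursor_id = cursors[low].id;
    }
}
}}}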

==== Assigning jobs ====

==== Deleting files ====

=== Work generator ===

=== Validator ===

=== Transitioner ===

=== locality_daemon ===