Changes between Version 82 and Version 83 of ProjectOptions

Timestamp: Feb 19, 2009, 3:08:16 PM
Author: davea
Comment: --

ProjectOptions

-                      v82
+                      v83
 The size of the feeder's enumeration query.  Default is 200.
-{{{
-<reliable_max_avg_turnaround>secs</reliable_max_avg_turnaround>
-<reliable_max_error_rate>X</reliable_max_error_rate>
-}}}
-Hosts whose average turnaround is at most reliable_max_avg_turnaround
-and whose error rate is at most reliable_max_error_rate
-are considered 'reliable'.
-{{{
-<reliable_reduced_delay_bound>X</reliable_reduced_delay_bound>
-}}}
-When a result is sent to a reliable host, multiply the delay bound by reliable_reduced_delay_bound (typically 0.5 or so).
-{{{
-<reliable_on_priority>X</reliable_on_priority>
-<reliable_priority_on_over>X</reliable_priority_on_over>
-<reliable_priority_on_over_except_error>X</reliable_priority_on_over_except_error>
-}}}
-Results with priority at least '''reliable_on_priority''' will be sent only to reliable hosts.
-Increase priority of duplicate results by '''reliable_priority_on_over''';
-increase priority of duplicates caused by timeout (not error) by '''reliable_priority_on_over_except_error'''.
 == Scheduling: matchmaker scheduling ==
 …
 to maintain statistics on the distribution of host speeds.
+== Scheduling: accelerating retries ==
+The goal of this mechanism (which works with job-cache and matchmaker scheduling,
+but not locality scheduling) is to send timeout-generated retries to
+hosts that are likely to finish them quickly.
+Here's how it works:
+ * Hosts are deemed "reliable" (a slight misnomer) if they satisfy turnaround time and error rate criteria.
+ * A job instance is deemed "need-reliable" if its priority is at or above a threshold ('''reliable_on_priority''').
+ * The scheduler tries to send need-reliable jobs to reliable hosts.  When it does, it reduces the delay bound of the job.
+ * When job replicas are created in response to errors or timeouts, their priority is raised relative to the job's base priority.
+The configurable parameters are:
+{{{
+<reliable_on_priority>X</reliable_on_priority>
+}}}
+Results with priority at least '''reliable_on_priority''' are treated as "need-reliable".
+With matchmaker scheduling, they'll be sent preferentially to reliable hosts;
+with job-cache scheduling, they'll be sent ONLY to reliable hosts.
+{{{
+<reliable_max_avg_turnaround>secs</reliable_max_avg_turnaround>
+<reliable_max_error_rate>X</reliable_max_error_rate>
+}}}
+Hosts whose average turnaround is at most reliable_max_avg_turnaround
+and whose error rate is at most reliable_max_error_rate are considered 'reliable'.
+Make sure these limits are loose enough that a significant fraction (e.g. 25%) of your hosts qualify.
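As a rough way to check whether a significant fraction of hosts would qualify, the two limits can be tried against a sample of host statistics. The sketch below is illustrative only: the `Host` class, its field names, and the sample values are assumptions made for this example and are not part of BOINC; only the "at most" comparisons come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical record; field names are assumptions, not BOINC's schema.
    avg_turnaround: float  # average turnaround, in seconds
    error_rate: float      # error rate as a fraction between 0 and 1


def is_reliable(host, max_avg_turnaround, max_error_rate):
    """A host qualifies if both values are at most the configured limits."""
    return (host.avg_turnaround <= max_avg_turnaround
            and host.error_rate <= max_error_rate)


def reliable_fraction(hosts, max_avg_turnaround, max_error_rate):
    """Fraction of the given hosts that would be considered reliable."""
    if not hosts:
        return 0.0
    count = sum(is_reliable(h, max_avg_turnaround, max_error_rate)
                for h in hosts)
    return count / len(hosts)


# Made-up sample data, for illustration only.
hosts = [
    Host(avg_turnaround=3600, error_rate=0.01),
    Host(avg_turnaround=86400, error_rate=0.0),
    Host(avg_turnaround=7200, error_rate=0.2),
    Host(avg_turnaround=1800, error_rate=0.02),
]
fraction = reliable_fraction(hosts, max_avg_turnaround=7200,
                             max_error_rate=0.05)
print(f"{fraction:.0%} of sampled hosts qualify as reliable")
```

If the fraction comes out very small, raising either limit admits more hosts.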
+{{{
+<reliable_reduced_delay_bound>X</reliable_reduced_delay_bound>
+}}}
+When a need-reliable result is sent to a reliable host,
+its delay bound is multiplied by '''reliable_reduced_delay_bound''' (typically 0.5 or so).
+{{{
+<reliable_priority_on_over>X</reliable_priority_on_over>
+<reliable_priority_on_over_except_error>X</reliable_priority_on_over_except_error>
+}}}
+If '''reliable_priority_on_over''' is nonzero,
+increase the priority of duplicate jobs by that amount over the job's base priority.
+Otherwise, if '''reliable_priority_on_over_except_error''' is nonzero,
+increase the priority of duplicates caused by timeout (not error) by that amount.
+(Typically only one of these is nonzero, and is equal to '''reliable_on_priority'''.)
+NOTE: this mechanism can be used to preferentially send ANY job,
+not just retries, to fast/reliable hosts.
+To do so, set the workunit's priority to '''reliable_on_priority''' or greater.
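The way these parameters interact can be sketched in a few lines of Python. This is a minimal model of the rules as described above, not the scheduler's actual code: the `Config` class, the function names, and the sample values are assumptions made for illustration, and the sketch does not model what happens when '''reliable_on_priority''' is left unset.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Field names mirror the option names; the class itself is hypothetical.
    reliable_on_priority: int
    reliable_reduced_delay_bound: float
    reliable_priority_on_over: int = 0
    reliable_priority_on_over_except_error: int = 0


def needs_reliable(priority, cfg):
    """A result is 'need-reliable' if its priority is at least the threshold."""
    return priority >= cfg.reliable_on_priority


def effective_delay_bound(delay_bound, priority, host_is_reliable, cfg):
    """Shorten the delay bound only for need-reliable results on reliable hosts."""
    if needs_reliable(priority, cfg) and host_is_reliable:
        return delay_bound * cfg.reliable_reduced_delay_bound
    return delay_bound


def retry_priority(base_priority, caused_by_timeout, cfg):
    """Priority of a duplicate, relative to the job's base priority."""
    if cfg.reliable_priority_on_over:
        return base_priority + cfg.reliable_priority_on_over
    if cfg.reliable_priority_on_over_except_error and caused_by_timeout:
        return base_priority + cfg.reliable_priority_on_over_except_error
    return base_priority


# Illustrative values only: boost timeout retries up to the threshold.
cfg = Config(reliable_on_priority=10,
             reliable_reduced_delay_bound=0.5,
             reliable_priority_on_over_except_error=10)

timeout_retry = retry_priority(0, caused_by_timeout=True, cfg=cfg)
error_retry = retry_priority(0, caused_by_timeout=False, cfg=cfg)
print(timeout_retry, needs_reliable(timeout_retry, cfg))
print(error_retry, needs_reliable(error_retry, cfg))
print(effective_delay_bound(604800, timeout_retry, True, cfg))
```

With these values a timeout-generated retry reaches the threshold and becomes need-reliable, while an error-generated retry keeps its base priority.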
 == Scheduling: locality scheduling ==
 {{{
 …
 <ignore_upload_certificates/>
 }}}
-If upload certificates are not generated, this option must be enabled to force file upload handler accept files being uploaded.
+If upload certificates are not generated, this option must be enabled to force the file upload handler to accept files.
 == Default preferences ==