Opened 17 years ago
Closed 12 years ago
#588 closed Defect (fixed)
indefinite suspension of computing when changing system clock
Reported by: | Richard Haselgrove | Owned by: | davea |
---|---|---|---|
Priority: | Minor | Milestone: | Undetermined |
Component: | Client - Daemon | Version: | 5.10.45 |
Keywords: | benchmark clock | Cc: | costamagnagianfranco@… |
Description
If you make a user error with the system clock in Windows XP, you can cause BOINC to stop processing indefinitely (or for a longer period than I have patience to wait).
To verify:
Set system clock 1 month forward. Note that BOINC immediately runs a benchmark.
Set system clock 1 month back (i.e. to correct time). Wait until the next checkpoint for the current app. BOINC suspends computation for a benchmark but, according to <benchmark_debug>, doesn't actually start running the benchmark code.
Full message-log posted at Benchmarking bug - indefinite suspension of computing
Change History (7)
comment:1 follow-up: 3 Changed 17 years ago by
Keywords: | clock added |
---|---|
Summary: | Benchmarking - indefinite suspension of computing → indefinite suspension of computing when changing system clock |
comment:2 Changed 17 years ago by
Milestone: | 5.10 → Undetermined |
---|
I think there are three ways to mitigate this:
- Check for time conflicts during every server interaction. This would at least log a relevant message.
- Check every stored timestamp against the current time, looking for time-travel errors.
- Subscribe to time-change events from the operating system.
Sadly, there is no quick fix. None of these methods (and really we need all of them, not just one) are particularly simple to implement.
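The second mitigation (looking for time-travel errors) could be sketched roughly as follows. This is only an illustration, not BOINC code: `ClockSample`, `sample_clocks`, `clock_jump`, `is_time_travel` and the 10-second default tolerance are hypothetical. The idea is to compare how far the wall clock moved against how far a monotonic clock moved over the same interval; a large difference suggests that somebody changed the system clock.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical pair of readings taken at the same moment (not a BOINC type).
struct ClockSample {
    double wall;  // wall-clock time, seconds since the epoch
    double mono;  // monotonic time, seconds since an arbitrary start point
};

// Read both clocks now.
ClockSample sample_clocks() {
    using namespace std::chrono;
    ClockSample s;
    s.wall = duration<double>(system_clock::now().time_since_epoch()).count();
    s.mono = duration<double>(steady_clock::now().time_since_epoch()).count();
    return s;
}

// Seconds by which the wall clock moved more (positive) or less (negative)
// than the monotonic clock between the two samples.
double clock_jump(const ClockSample& prev, const ClockSample& now) {
    return (now.wall - prev.wall) - (now.mono - prev.mono);
}

// True if the wall clock appears to have been set forward or back.
// The tolerance is an assumed value and would need tuning.
bool is_time_travel(const ClockSample& prev, const ClockSample& now,
                    double tolerance = 10.0) {
    assert(tolerance >= 0);
    return std::fabs(clock_jump(prev, now)) > tolerance;
}
```

A client could call `sample_clocks()` once per polling pass, compare it with the previous sample, and log a message or re-validate stored timestamps when `is_time_travel` returns true.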
comment:3 Changed 17 years ago by
Replying to Nicolas:
Wow, I really thought there was a ticket for this already.
Well, I searched both trac and the message boards before posting, and I couldn't find it.
Many problems appear when the system clock is changed. Most are impossible to solve, or so hard that it's not worth it.
For example, if your clock is set 1 month ahead of the correct date and you contact a scheduler, the deferral time is stored as an absolute timestamp: when to contact the server again. If you then set your clock 1 month back (i.e. to the correct time), communication with that project will be deferred for a month and a bit.
I agree there are lots of problems, but this particular one seems to cause significant loss of scientific work (by halting computing) at one specific and clearly-defined point: the two or three seconds between
Running CPU benchmarks
and
[benchmark_debug] Starting floating-point benchmark
That would seem to be worth solving on its own, and it shouldn't be too difficult to track down what it's waiting for.
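The scheduler deferral example quoted earlier in this comment can be worked through numerically. The sketch below uses hypothetical names (`next_contact_time`, `remaining_wait`, `clamped_wait`) and is not taken from the BOINC source. The last function shows one possible sanity check, which is an assumption and not something proposed in this ticket: never wait longer than the delay that was originally requested.

```cpp
#include <algorithm>
#include <cassert>

const double kSecondsPerDay = 86400.0;

// Absolute timestamp at which the next scheduler contact is allowed,
// computed from whatever the system clock said at the time.
double next_contact_time(double now, double delay) {
    assert(delay >= 0);
    return now + delay;
}

// Seconds still to wait, as seen by a clock that currently reads `now`.
double remaining_wait(double next_contact, double now) {
    return std::max(0.0, next_contact - now);
}

// Assumed sanity check: the wait can never exceed the requested delay.
double clamped_wait(double next_contact, double now, double requested_delay) {
    return std::min(remaining_wait(next_contact, now), requested_delay);
}
```

With the clock 30 days fast and a one-hour deferral, `remaining_wait` after correcting the clock is 30 days plus one hour (the "month and a bit"), whereas `clamped_wait` returns one hour.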
comment:4 Changed 17 years ago by
I think I've got it:
File: cs_benchmark.C Routine: cpu_benchmarks_poll Line 309:
static double last_time = 0;
If benchmarks have been run in the future (as envisioned by changeset [12128], lines 247-248), this static variable will already hold some time in the indefinite future. The test at line 312 will always be satisfied, and the application hangs in an indefinite loop.
Solution: discard variable last_time (or set it to zero) at all possible valid exit points from the benchmarking process.
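To illustrate the failure mode and the proposed fix, here is a minimal sketch. The ticket does not quote the test at line 312, so the sketch assumes it is a throttle of the form `now < last_time + period`; `PollThrottle`, `due` and `reset` are hypothetical names, not the actual code in cs_benchmark.C.

```cpp
#include <cassert>

// Hypothetical stand-in for the polling throttle inside cpu_benchmarks_poll.
struct PollThrottle {
    double period;     // minimum seconds between polls
    double last_time;  // plays the role of the static variable quoted above

    explicit PollThrottle(double p) : period(p), last_time(0) {
        assert(p > 0);
    }

    // Returns true when a poll should run now.
    // If last_time lies in the future, this keeps returning false until the
    // clock catches up, which looks like an indefinite hang.
    bool due(double now) {
        if (now < last_time + period) return false;
        last_time = now;
        return true;
    }

    // The fix suggested in this comment: forget the previous timestamp
    // whenever the benchmarking process exits.
    void reset() { last_time = 0; }
};
```

If a benchmark ran while the clock was a month ahead, `due` stays false for a month after the clock is corrected. Calling `reset()` at each exit point from benchmarking lets the next run start immediately.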
comment:5 Changed 13 years ago by
The Linux kernel recently grew an interface for apps to be notified of clock changes.
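The comment does not name the interface. One that fits the description is the `TFD_TIMER_CANCEL_ON_SET` flag of `timerfd_settime` (Linux 3.0 and later): a read on an absolute `CLOCK_REALTIME` timer armed with that flag fails with `ECANCELED` when the clock is changed discontinuously. The sketch below is Linux-only; the helper names `open_clock_change_fd` and `clock_was_changed` and the ten-year expiry are hypothetical choices, and error handling is minimal.

```cpp
#include <sys/timerfd.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <ctime>

// Older C library headers may not expose the flag; the value matches the kernel's.
#ifndef TFD_TIMER_CANCEL_ON_SET
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)
#endif

// Returns a non-blocking fd that reports clock changes, or -1 on failure.
int open_clock_change_fd() {
    int fd = timerfd_create(CLOCK_REALTIME, TFD_NONBLOCK | TFD_CLOEXEC);
    if (fd < 0) return -1;

    // Arm an absolute timer far in the future; only the cancellation matters.
    itimerspec spec{};
    spec.it_value.tv_sec = std::time(nullptr) + 10L * 365 * 86400;
    if (timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
                        &spec, nullptr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Polls the fd once; true if the kernel reported a discontinuous clock change.
bool clock_was_changed(int fd) {
    assert(fd >= 0);
    std::uint64_t expirations = 0;
    ssize_t n = read(fd, &expirations, sizeof expirations);
    return n < 0 && errno == ECANCELED;
}
```

A client could add the descriptor to its existing `select` or `poll` loop and, when `clock_was_changed` returns true, re-validate any stored absolute timestamps. Re-arming or reopening the timer after each notification is probably the safest approach.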
comment:6 Changed 12 years ago by
Cc: | costamagnagianfranco@… added |
---|