Opened 17 years ago
Closed 17 years ago
#430 closed Defect (fixed)
Not sending work - last request too recent (dont follow deferral-orders)
Reported by: | Rattledagger | Owned by: | davea |
---|---|---|---|
Priority: | Major | Milestone: | Undetermined |
Component: | Client - Daemon | Version: | |
Keywords: | Cc: |
Description
Client doesn't follow orders of waiting 121 seconds, or 24 hours respectively but re-asks too early than allowed, wasting everyone's time:
06.10.2007 14:38:59|Climateprediction.net Beta|Sending scheduler request: To fetch work 06.10.2007 14:38:59|Climateprediction.net Beta|Requesting 864 seconds of new work 06.10.2007 14:39:04|Climateprediction.net Beta|Scheduler RPC succeeded [server version 509] 06.10.2007 14:39:04|Climateprediction.net Beta|Message from server: platform 'windows_x86_64' not found 06.10.2007 14:39:04|Climateprediction.net Beta|Deferring communication for 1 min 0 sec 06.10.2007 14:39:04|Climateprediction.net Beta|Reason: no work from project 06.10.2007 14:39:09|World Community Grid|Sending scheduler request: To fetch work 06.10.2007 14:39:09|World Community Grid|Requesting 864 seconds of new work 06.10.2007 14:39:14|World Community Grid|Scheduler RPC succeeded [server version 509] 06.10.2007 14:39:14|World Community Grid|Message from server: No work could be sent. 06.10.2007 14:39:14|World Community Grid|Message from server: No work is available for AfricanClimate@Home 06.10.2007 14:39:14|World Community Grid|Deferring communication for 1 min 0 sec 06.10.2007 14:39:14|World Community Grid|Reason: no work from project 06.10.2007 14:40:11|Climateprediction.net Beta|Sending scheduler request: To fetch work 06.10.2007 14:40:11|Climateprediction.net Beta|Requesting 864 seconds of new work 06.10.2007 14:40:16|Climateprediction.net Beta|Scheduler RPC succeeded [server version 509] 06.10.2007 14:40:16|Climateprediction.net Beta|Message from server: platform 'windows_x86_64' not found 06.10.2007 14:40:16|Climateprediction.net Beta|Deferring communication for 1 min 0 sec 06.10.2007 14:40:16|Climateprediction.net Beta|Reason: no work from project 06.10.2007 14:40:21|World Community Grid|Sending scheduler request: To fetch work 06.10.2007 14:40:21|World Community Grid|Requesting 864 seconds of new work 06.10.2007 14:40:26|World Community Grid|Scheduler RPC succeeded [server version 509] 06.10.2007 14:40:26|World Community Grid|Message from server: Not sending work - last request too recent: 73 sec 06.10.2007 14:40:26|World Community Grid|Deferring communication for 1 min 0 sec 06.10.2007 14:40:26|World Community Grid|Reason: no work from project
Attachments (2)
Change History (15)
comment:1 Changed 17 years ago by
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:3 Changed 17 years ago by
Resolution: | invalid |
---|---|
Status: | closed → reopened |
The deferral-message is part of the scheduler-reply:
<scheduler_reply> <scheduler_version>509</scheduler_version> <master_url>http://www.worldcommunitygrid.org/</master_url> <request_delay>121.200000</request_delay> <message priority="high">No work could be sent.</message> <message priority="high">No work is available for AfricanClimate@Home</message> <project_name>World Community Grid</project_name> (snip)}}} Followed by: {{{ <scheduler_reply> <scheduler_version>509</scheduler_version> <master_url>http://www.worldcommunitygrid.org/</master_url> <request_delay>121.200000</request_delay> <message priority="low">Not sending work - last request too recent: 67 sec</message> <project_name>World Community Grid</project_name> }}} But, the BOINC-client keeps on re-asking again and again, too early, this from BOINC-client, clearly showing it's using 60 seconds deferral instead of the 121.2 as it should have used: {{{ 10.10.2007 21:05:41|World Community Grid|Sending scheduler request: To fetch work 10.10.2007 21:05:41|World Community Grid|Requesting 1728 seconds of new work 10.10.2007 21:05:48|World Community Grid|Scheduler RPC succeeded [server version 509] 10.10.2007 21:05:48|World Community Grid|Message from server: No work could be sent. 10.10.2007 21:05:48|World Community Grid|Message from server: No work is available for AfricanClimate@Home 10.10.2007 21:05:48|World Community Grid|Deferring communication for 1 min 0 sec 10.10.2007 21:05:48|World Community Grid|Reason: no work from project 10.10.2007 21:06:49|World Community Grid|Sending scheduler request: To fetch work 10.10.2007 21:06:49|World Community Grid|Requesting 1728 seconds of new work 10.10.2007 21:06:54|World Community Grid|Scheduler RPC succeeded [server version 509] 10.10.2007 21:06:54|World Community Grid|Message from server: Not sending work - last request too recent: 67 sec 10.10.2007 21:06:54|World Community Grid|Deferring communication for 1 min 0 sec 10.10.2007 21:06:54|World Community Grid|Reason: no work from project 10.10.2007 21:07:55|World Community Grid|Sending scheduler request: To fetch work 10.10.2007 21:07:55|World Community Grid|Requesting 1728 seconds of new work 10.10.2007 21:08:00|World Community Grid|Scheduler RPC succeeded [server version 509] 10.10.2007 21:08:00|World Community Grid|Message from server: Not sending work - last request too recent: 65 sec 10.10.2007 21:08:00|World Community Grid|Deferring communication for 1 min 0 sec 10.10.2007 21:08:00|World Community Grid|Reason: no work from project 10.10.2007 21:09:01|World Community Grid|Sending scheduler request: To fetch work 10.10.2007 21:09:01|World Community Grid|Requesting 1728 seconds of new work 10.10.2007 21:09:06|World Community Grid|Scheduler RPC succeeded [server version 509] 10.10.2007 21:09:06|World Community Grid|Message from server: Not sending work - last request too recent: 67 sec 10.10.2007 21:09:06|World Community Grid|Deferring communication for 1 min 0 sec 10.10.2007 21:09:06|World Community Grid|Reason: no work from project }}} As for CPDN beta, it gives: {{{ <?xml version="1.0" encoding="ISO-8859-1" ?> - <scheduler_reply> <scheduler_version>509</scheduler_version> <master_url>http://climateapps1.oucs.ox.ac.uk/beta/</master_url> <request_delay>86400.000000</request_delay> <message priority="low">platform 'windows_x86_64' not found</message> <project_name>Climateprediction.net Beta</project_name> }}} But, BOINC-client gladly overlooks this 24-hour deferral, and keeps banging on the scheduling-server: {{{ 10.10.2007 21:13:26|Climateprediction.net Beta|Sending scheduler request: To fetch work 10.10.2007 21:13:26|Climateprediction.net Beta|Requesting 1728 seconds of new work 10.10.2007 21:13:31|Climateprediction.net Beta|Scheduler RPC succeeded [server version 509] 10.10.2007 21:13:31|Climateprediction.net Beta|Message from server: platform 'windows_x86_64' not found 10.10.2007 21:13:31|Climateprediction.net Beta|Deferring communication for 1 min 0 sec 10.10.2007 21:13:31|Climateprediction.net Beta|Reason: no work from project 10.10.2007 21:14:34|Climateprediction.net Beta|Sending scheduler request: To fetch work 10.10.2007 21:14:34|Climateprediction.net Beta|Requesting 1728 seconds of new work 10.10.2007 21:14:39|Climateprediction.net Beta|Scheduler RPC succeeded [server version 509] 10.10.2007 21:14:39|Climateprediction.net Beta|Message from server: platform 'windows_x86_64' not found 10.10.2007 21:14:39|Climateprediction.net Beta|Deferring communication for 1 min 0 sec 10.10.2007 21:14:39|Climateprediction.net Beta|Reason: no work from project }}}
comment:4 Changed 17 years ago by
Because of the '-' you left on one of the <scheduler_reply>
tags, I suspect you're opening the sched_reply file with a web-browser that formats the XML. Can you open it with a normal text editor? I want to see where there are newlines.
Changed 17 years ago by
Attachment: | sched_reply_climateapps1.oucs.ox.ac.uk_beta.txt added |
---|
comment:5 follow-up: 6 Changed 17 years ago by
No idea then. But shouldn't there be a </scheduler_reply>
at the end? That looks strange...
comment:6 Changed 17 years ago by
Replying to Nicolas:
No idea then. But shouldn't there be a
</scheduler_reply>
at the end? That looks strange...
Each Scheduler-reply is of course much longer, and includes the closing-tag, but didn't expect it was important to include everything. Also, did include "snip" in the 1st. scheduler-reply, even formatting didn't work as expected...
But anyway, just to show yet another instance, this one due to "not enough free disk space", a full Einstein@home scheduler-reply...
Oh, and this last one is with win32 - BOINC v5.10.21, the other logs was with win64.
The BOINC client-log is:
11.10.2007 16:24:02|Einstein@Home|Sending scheduler request: To fetch work 11.10.2007 16:24:02|Einstein@Home|Requesting 1728 seconds of new work 11.10.2007 16:24:13|Einstein@Home|Scheduler RPC succeeded [server version 601] 11.10.2007 16:24:13|Einstein@Home|Message from server: No work sent 11.10.2007 16:24:13|Einstein@Home|Message from server: There was work but you don't have enough disk space allocated. 11.10.2007 16:24:13|Einstein@Home|Message from server: An additional 3635 MB is needed. 11.10.2007 16:24:13|Einstein@Home|Message from server: BOINC will delete file h1_0452.40_S5R2 when no longer needed 11.10.2007 16:24:13|Einstein@Home|Got server request to delete file h1_0452.40_S5R2 11.10.2007 16:24:13|Einstein@Home|Deferring communication for 1 min 0 sec 11.10.2007 16:24:13|Einstein@Home|Reason: no work from project 11.10.2007 16:25:13|Einstein@Home|Sending scheduler request: To fetch work 11.10.2007 16:25:13|Einstein@Home|Requesting 1728 seconds of new work 11.10.2007 16:25:18|Einstein@Home|Scheduler RPC succeeded [server version 601] 11.10.2007 16:25:18|Einstein@Home|Message from server: No work sent 11.10.2007 16:25:18|Einstein@Home|Message from server: There was work but you don't have enough disk space allocated. 11.10.2007 16:25:18|Einstein@Home|Message from server: An additional 3632 MB is needed. 11.10.2007 16:25:18|Einstein@Home|Message from server: BOINC will delete file h1_0452.50_S5R2 when no longer needed 11.10.2007 16:25:18|Einstein@Home|Got server request to delete file h1_0452.50_S5R2 11.10.2007 16:25:18|Einstein@Home|Deferring communication for 1 min 0 sec 11.10.2007 16:25:18|Einstein@Home|Reason: no work from project 11.10.2007 16:26:18|Einstein@Home|Sending scheduler request: To fetch work 11.10.2007 16:26:18|Einstein@Home|Requesting 1728 seconds of new work 11.10.2007 16:26:28|Einstein@Home|Scheduler RPC succeeded [server version 601] 11.10.2007 16:26:28|Einstein@Home|Message from server: No work sent 11.10.2007 16:26:28|Einstein@Home|Message from server: There was work but you don't have enough disk space allocated. 11.10.2007 16:26:28|Einstein@Home|Message from server: An additional 3629 MB is needed. 11.10.2007 16:26:28|Einstein@Home|Message from server: BOINC will delete file l1_0452.55_S5R2 when no longer needed 11.10.2007 16:26:28|Einstein@Home|Got server request to delete file l1_0452.55_S5R2 11.10.2007 16:26:28|Einstein@Home|Deferring communication for 1 min 0 sec 11.10.2007 16:26:28|Einstein@Home|Reason: no work from project
Changed 17 years ago by
Attachment: | sched_reply_einstein.phys.uwm.edu.xml added |
---|
comment:7 Changed 17 years ago by
I'm now seeing this with LHC. LHC set the deferral to 1 hour, but the client is deferring communication for whatever time it wants to, which gets me a 'last request too recent' message. Also, I think the LHC server is resetting the last request timestamp on every request, so if the client requests twice an hour, it will never succeed.
comment:8 follow-up: 9 Changed 17 years ago by
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
comment:9 follow-up: 10 Changed 17 years ago by
Replying to davea:
Also, made the "Deferred" messages conditional on sched_op_debug
So far three out of three people want that reverted, see LHC forum.
comment:10 Changed 17 years ago by
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Replying to Nicolas:
Replying to davea:
Also, made the "Deferred" messages conditional on sched_op_debug
So far three out of three people want that reverted, see LHC forum.
Make that five people against the change. Reopening so this is noticed (there's no notifications on comments).
comment:11 Changed 17 years ago by
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
comment:12 Changed 17 years ago by
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Eep... Why don't you use a message like "merged changes into branch"? I reopened this for another reason.
comment:13 Changed 17 years ago by
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
I'm going to leave the deferral messages under <sched_op_debug>. They're too low-level to show by default.
I don't see where the 121 secs and 24 hours are in the log.