Opened 17 years ago

Closed 17 years ago

#430 closed Defect (fixed)

Not sending work - last request too recent (dont follow deferral-orders)

Reported by: Rattledagger Owned by: davea
Priority: Major Milestone: Undetermined
Component: Client - Daemon Version:
Keywords: Cc:

Description

Client doesn't follow orders of waiting 121 seconds, or 24 hours respectively but re-asks too early than allowed, wasting everyone's time:

06.10.2007 14:38:59|Climateprediction.net Beta|Sending scheduler request: To fetch work
06.10.2007 14:38:59|Climateprediction.net Beta|Requesting 864 seconds of new work
06.10.2007 14:39:04|Climateprediction.net Beta|Scheduler RPC succeeded [server version 509]
06.10.2007 14:39:04|Climateprediction.net Beta|Message from server: platform 'windows_x86_64' not found
06.10.2007 14:39:04|Climateprediction.net Beta|Deferring communication for 1 min 0 sec
06.10.2007 14:39:04|Climateprediction.net Beta|Reason: no work from project
06.10.2007 14:39:09|World Community Grid|Sending scheduler request: To fetch work
06.10.2007 14:39:09|World Community Grid|Requesting 864 seconds of new work
06.10.2007 14:39:14|World Community Grid|Scheduler RPC succeeded [server version 509]
06.10.2007 14:39:14|World Community Grid|Message from server: No work could be sent.
06.10.2007 14:39:14|World Community Grid|Message from server: No work is available for AfricanClimate@Home
06.10.2007 14:39:14|World Community Grid|Deferring communication for 1 min 0 sec
06.10.2007 14:39:14|World Community Grid|Reason: no work from project
06.10.2007 14:40:11|Climateprediction.net Beta|Sending scheduler request: To fetch work
06.10.2007 14:40:11|Climateprediction.net Beta|Requesting 864 seconds of new work
06.10.2007 14:40:16|Climateprediction.net Beta|Scheduler RPC succeeded [server version 509]
06.10.2007 14:40:16|Climateprediction.net Beta|Message from server: platform 'windows_x86_64' not found
06.10.2007 14:40:16|Climateprediction.net Beta|Deferring communication for 1 min 0 sec
06.10.2007 14:40:16|Climateprediction.net Beta|Reason: no work from project
06.10.2007 14:40:21|World Community Grid|Sending scheduler request: To fetch work
06.10.2007 14:40:21|World Community Grid|Requesting 864 seconds of new work
06.10.2007 14:40:26|World Community Grid|Scheduler RPC succeeded [server version 509]
06.10.2007 14:40:26|World Community Grid|Message from server: Not sending work - last request too recent: 73 sec
06.10.2007 14:40:26|World Community Grid|Deferring communication for 1 min 0 sec
06.10.2007 14:40:26|World Community Grid|Reason: no work from project

Attachments (2)

sched_reply_climateapps1.oucs.ox.ac.uk_beta.txt (342 bytes) - added by Rattledagger 17 years ago.
sched_reply_einstein.phys.uwm.edu.xml (4.2 KB) - added by Rattledagger 17 years ago.

Download all attachments as: .zip

Change History (15)

comment:1 Changed 17 years ago by davea

Resolution: invalid
Status: newclosed

I don't see where the 121 secs and 24 hours are in the log.

comment:2 Changed 17 years ago by Nicolas

Shouldn't a "platform not found" wait 24 hours?

comment:3 Changed 17 years ago by Rattledagger

Resolution: invalid
Status: closedreopened

The deferral-message is part of the scheduler-reply:

<scheduler_reply>
  <scheduler_version>509</scheduler_version> 
  <master_url>http://www.worldcommunitygrid.org/</master_url> 
  <request_delay>121.200000</request_delay> 
  <message priority="high">No work could be sent.</message> 
  <message priority="high">No work is available for AfricanClimate@Home</message> 
  <project_name>World Community Grid</project_name> 

(snip)}}}
Followed by:
{{{
<scheduler_reply>
  <scheduler_version>509</scheduler_version> 
  <master_url>http://www.worldcommunitygrid.org/</master_url> 
  <request_delay>121.200000</request_delay> 
  <message priority="low">Not sending work - last request too recent: 67 sec</message> 
  <project_name>World Community Grid</project_name> 
}}}

But, the BOINC-client keeps on re-asking again and again, too early, this from BOINC-client, clearly showing it's using 60 seconds deferral instead of the 121.2 as it should have used:
{{{
10.10.2007 21:05:41|World Community Grid|Sending scheduler request: To fetch work
10.10.2007 21:05:41|World Community Grid|Requesting 1728 seconds of new work
10.10.2007 21:05:48|World Community Grid|Scheduler RPC succeeded [server version 509]
10.10.2007 21:05:48|World Community Grid|Message from server: No work could be sent.
10.10.2007 21:05:48|World Community Grid|Message from server: No work is available for AfricanClimate@Home
10.10.2007 21:05:48|World Community Grid|Deferring communication for 1 min 0 sec
10.10.2007 21:05:48|World Community Grid|Reason: no work from project
10.10.2007 21:06:49|World Community Grid|Sending scheduler request: To fetch work
10.10.2007 21:06:49|World Community Grid|Requesting 1728 seconds of new work
10.10.2007 21:06:54|World Community Grid|Scheduler RPC succeeded [server version 509]
10.10.2007 21:06:54|World Community Grid|Message from server: Not sending work - last request too recent: 67 sec
10.10.2007 21:06:54|World Community Grid|Deferring communication for 1 min 0 sec
10.10.2007 21:06:54|World Community Grid|Reason: no work from project
10.10.2007 21:07:55|World Community Grid|Sending scheduler request: To fetch work
10.10.2007 21:07:55|World Community Grid|Requesting 1728 seconds of new work
10.10.2007 21:08:00|World Community Grid|Scheduler RPC succeeded [server version 509]
10.10.2007 21:08:00|World Community Grid|Message from server: Not sending work - last request too recent: 65 sec
10.10.2007 21:08:00|World Community Grid|Deferring communication for 1 min 0 sec
10.10.2007 21:08:00|World Community Grid|Reason: no work from project
10.10.2007 21:09:01|World Community Grid|Sending scheduler request: To fetch work
10.10.2007 21:09:01|World Community Grid|Requesting 1728 seconds of new work
10.10.2007 21:09:06|World Community Grid|Scheduler RPC succeeded [server version 509]
10.10.2007 21:09:06|World Community Grid|Message from server: Not sending work - last request too recent: 67 sec
10.10.2007 21:09:06|World Community Grid|Deferring communication for 1 min 0 sec
10.10.2007 21:09:06|World Community Grid|Reason: no work from project
}}}

As for CPDN beta, it gives:
{{{
<?xml version="1.0" encoding="ISO-8859-1" ?> 
- <scheduler_reply>
  <scheduler_version>509</scheduler_version> 
  <master_url>http://climateapps1.oucs.ox.ac.uk/beta/</master_url> 
  <request_delay>86400.000000</request_delay> 
  <message priority="low">platform 'windows_x86_64' not found</message> 
  <project_name>Climateprediction.net Beta</project_name>
}}}

But, BOINC-client gladly overlooks this 24-hour deferral, and keeps banging on the scheduling-server:

{{{
10.10.2007 21:13:26|Climateprediction.net Beta|Sending scheduler request: To fetch work
10.10.2007 21:13:26|Climateprediction.net Beta|Requesting 1728 seconds of new work
10.10.2007 21:13:31|Climateprediction.net Beta|Scheduler RPC succeeded [server version 509]
10.10.2007 21:13:31|Climateprediction.net Beta|Message from server: platform 'windows_x86_64' not found
10.10.2007 21:13:31|Climateprediction.net Beta|Deferring communication for 1 min 0 sec
10.10.2007 21:13:31|Climateprediction.net Beta|Reason: no work from project
10.10.2007 21:14:34|Climateprediction.net Beta|Sending scheduler request: To fetch work
10.10.2007 21:14:34|Climateprediction.net Beta|Requesting 1728 seconds of new work
10.10.2007 21:14:39|Climateprediction.net Beta|Scheduler RPC succeeded [server version 509]
10.10.2007 21:14:39|Climateprediction.net Beta|Message from server: platform 'windows_x86_64' not found
10.10.2007 21:14:39|Climateprediction.net Beta|Deferring communication for 1 min 0 sec
10.10.2007 21:14:39|Climateprediction.net Beta|Reason: no work from project
}}}

comment:4 Changed 17 years ago by Nicolas

Because of the '-' you left on one of the <scheduler_reply> tags, I suspect you're opening the sched_reply file with a web-browser that formats the XML. Can you open it with a normal text editor? I want to see where there are newlines.

Changed 17 years ago by Rattledagger

comment:5 Changed 17 years ago by Nicolas

No idea then. But shouldn't there be a </scheduler_reply> at the end? That looks strange...

comment:6 in reply to:  5 Changed 17 years ago by Rattledagger

Replying to Nicolas:

No idea then. But shouldn't there be a </scheduler_reply> at the end? That looks strange...

Each Scheduler-reply is of course much longer, and includes the closing-tag, but didn't expect it was important to include everything. Also, did include "snip" in the 1st. scheduler-reply, even formatting didn't work as expected...

But anyway, just to show yet another instance, this one due to "not enough free disk space", a full Einstein@home scheduler-reply...

Oh, and this last one is with win32 - BOINC v5.10.21, the other logs was with win64.

The BOINC client-log is:

11.10.2007 16:24:02|Einstein@Home|Sending scheduler request: To fetch work
11.10.2007 16:24:02|Einstein@Home|Requesting 1728 seconds of new work
11.10.2007 16:24:13|Einstein@Home|Scheduler RPC succeeded [server version 601]
11.10.2007 16:24:13|Einstein@Home|Message from server: No work sent
11.10.2007 16:24:13|Einstein@Home|Message from server: There was work but you don't have enough disk space allocated.
11.10.2007 16:24:13|Einstein@Home|Message from server: An additional 3635 MB is needed.
11.10.2007 16:24:13|Einstein@Home|Message from server: BOINC will delete file h1_0452.40_S5R2 when no longer needed
11.10.2007 16:24:13|Einstein@Home|Got server request to delete file h1_0452.40_S5R2
11.10.2007 16:24:13|Einstein@Home|Deferring communication for 1 min 0 sec
11.10.2007 16:24:13|Einstein@Home|Reason: no work from project
11.10.2007 16:25:13|Einstein@Home|Sending scheduler request: To fetch work
11.10.2007 16:25:13|Einstein@Home|Requesting 1728 seconds of new work
11.10.2007 16:25:18|Einstein@Home|Scheduler RPC succeeded [server version 601]
11.10.2007 16:25:18|Einstein@Home|Message from server: No work sent
11.10.2007 16:25:18|Einstein@Home|Message from server: There was work but you don't have enough disk space allocated.
11.10.2007 16:25:18|Einstein@Home|Message from server: An additional 3632 MB is needed.
11.10.2007 16:25:18|Einstein@Home|Message from server: BOINC will delete file h1_0452.50_S5R2 when no longer needed
11.10.2007 16:25:18|Einstein@Home|Got server request to delete file h1_0452.50_S5R2
11.10.2007 16:25:18|Einstein@Home|Deferring communication for 1 min 0 sec
11.10.2007 16:25:18|Einstein@Home|Reason: no work from project
11.10.2007 16:26:18|Einstein@Home|Sending scheduler request: To fetch work
11.10.2007 16:26:18|Einstein@Home|Requesting 1728 seconds of new work
11.10.2007 16:26:28|Einstein@Home|Scheduler RPC succeeded [server version 601]
11.10.2007 16:26:28|Einstein@Home|Message from server: No work sent
11.10.2007 16:26:28|Einstein@Home|Message from server: There was work but you don't have enough disk space allocated.
11.10.2007 16:26:28|Einstein@Home|Message from server: An additional 3629 MB is needed.
11.10.2007 16:26:28|Einstein@Home|Message from server: BOINC will delete file l1_0452.55_S5R2 when no longer needed
11.10.2007 16:26:28|Einstein@Home|Got server request to delete file l1_0452.55_S5R2
11.10.2007 16:26:28|Einstein@Home|Deferring communication for 1 min 0 sec
11.10.2007 16:26:28|Einstein@Home|Reason: no work from project

Changed 17 years ago by Rattledagger

comment:7 Changed 17 years ago by Nicolas

I'm now seeing this with LHC. LHC set the deferral to 1 hour, but the client is deferring communication for whatever time it wants to, which gets me a 'last request too recent' message. Also, I think the LHC server is resetting the last request timestamp on every request, so if the client requests twice an hour, it will never succeed.

comment:8 Changed 17 years ago by davea

Resolution: fixed
Status: reopenedclosed

(In [13849]) - client: fix bug where delay request from project is ignored

if no results are returned. Also, made the "Deferred" messages conditional on sched_op_debug Fixes #430 (from Rattledagger)

comment:9 in reply to:  8 ; Changed 17 years ago by Nicolas

Replying to davea:

Also, made the "Deferred" messages conditional on sched_op_debug

So far three out of three people want that reverted, see LHC forum.

comment:10 in reply to:  9 Changed 17 years ago by Nicolas

Resolution: fixed
Status: closedreopened

Replying to Nicolas:

Replying to davea:

Also, made the "Deferred" messages conditional on sched_op_debug

So far three out of three people want that reverted, see LHC forum.

Make that five people against the change. Reopening so this is noticed (there's no notifications on comments).

comment:11 Changed 17 years ago by romw

Resolution: fixed
Status: reopenedclosed

(In [13857]) - client: fix bug where delay request from project is ignored

if no results are returned. Also, made the "Deferred" messages conditional on sched_op_debug Fixes #430 (from Rattledagger)

client/

cs_scheduler.C scheduler_op.C work_fetch.C

comment:12 Changed 17 years ago by Nicolas

Resolution: fixed
Status: closedreopened

Eep... Why don't you use a message like "merged changes into branch"? I reopened this for another reason.

comment:13 Changed 17 years ago by davea

Resolution: fixed
Status: reopenedclosed

I'm going to leave the deferral messages under <sched_op_debug>. They're too low-level to show by default.

Note: See TracTickets for help on using tickets.