Opened 17 years ago

Closed 14 years ago

#347 closed Defect (fixed)

large amount of tasks = boinc and boincmgr hogging all CPU time

Reported by: Aaron Finney
Owned by: romw
Priority: Major
Milestone: 6.12
Component: Manager
Version:
Keywords: patch
Cc:

Description

For some reason, the more tasks you have in your queue, the more CPU time boinc.exe and boincmgr.exe need, presumably to update the grid in the BOINC Manager. At some point the CPU time required to parse the list of tasks in the queue reaches a 'critical mass', and boinc.exe and boincmgr.exe use up to 100% of the CPU just updating themselves. No scientific work can get through, because boinc.exe and boincmgr.exe have higher priority levels.

Attachments (4)

boinc50%.JPG (181.3 KB) - added by Aaron Finney 17 years ago.
boinc taking up 50% of resource share
boinc&boincmgrcpuhog.JPG (170.3 KB) - added by Aaron Finney 17 years ago.
Boinc manager with list of large tasks taking up 50% of CPU time.
boincwithsmalltasks.JPG (183.9 KB) - added by Aaron Finney 17 years ago.
Task manager after removing 90% of tasks.
boincmgr_347.patch (5.6 KB) - added by Der Meister 16 years ago.
Patch implementing the dynamic refresh rate


Change History (23)

Changed 17 years ago by Aaron Finney

Attachment: boinc50%.JPG added

boinc taking up 50% of resource share

Changed 17 years ago by Aaron Finney

Attachment: boinc&boincmgrcpuhog.JPG added

Boinc manager with list of large tasks taking up 50% of CPU time.

Changed 17 years ago by Aaron Finney

Attachment: boincwithsmalltasks.JPG added

Task manager after removing 90% of tasks.

comment:1 Changed 17 years ago by Aaron Finney

This also can cause BOINC and the BOINC manager to become completely unresponsive.

comment:2 Changed 17 years ago by KSMarksPsych

Component: changed from Undetermined to Client - Daemon
Milestone: changed from Undetermined to 5.10
Owner: set to davea

comment:3 Changed 17 years ago by Nicolas

Well, that's a huge list of tasks! It would need some profiling to know which piece of code is the bottleneck.

comment:4 Changed 17 years ago by davea

Owner: changed from davea to romw

comment:5 Changed 17 years ago by Nicolas

An idea:

  • Add a GUI RPC to get details of a single result (or better: a specific list of results).
  • Make the manager update every second only the results in "Running" state, and refresh the whole list using get_results on longer periods, like every 10 seconds.

If a new manager connects to an old client that does not support the RPC, it should notice the RPC is missing and avoid it. After connection, the first time the manager refreshes the result list, try the new RPC. If it fails with "RPC unrecognized", fall back to the current method of getting all results every second, and don't retry the new RPC during the rest of that connection.
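
A minimal sketch of the probe-once-then-fall-back behaviour described above, written in Python for brevity (BOINC itself is C++). Every name here (`ResultPoller`, `RpcUnrecognized`, `get_results_subset`, `get_all_results`, `OldClient`) is a hypothetical stand-in for illustration, not part of the BOINC GUI RPC API.

```python
class RpcUnrecognized(Exception):
    """Hypothetical error: the client does not know the requested RPC."""


class ResultPoller:
    """Tries the new per-result RPC once per connection, then falls back."""

    def __init__(self, client):
        self.client = client          # hypothetical RPC client object
        self.subset_supported = None  # None means "not probed yet"

    def refresh(self, running_names):
        if self.subset_supported is not False:
            try:
                results = self.client.get_results_subset(running_names)
                self.subset_supported = True
                return results
            except RpcUnrecognized:
                # Old client: remember this for the rest of the connection.
                self.subset_supported = False
        # Current behaviour: fetch the full result list.
        return self.client.get_all_results()


class OldClient:
    """Stand-in for a client that predates the new RPC."""

    def __init__(self):
        self.subset_calls = 0

    def get_results_subset(self, names):
        self.subset_calls += 1
        raise RpcUnrecognized()

    def get_all_results(self):
        return ["task_a", "task_b"]
```

Against `OldClient`, the poller attempts the new RPC exactly once and then uses the full-list call for every later refresh. A new `ResultPoller` would be created for each connection so that the probe is repeated after reconnecting.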

comment:6 Changed 17 years ago by Nicolas

An addition to my previous suggestion: because of the way the <get_results> tag is parsed, we don't really need a new RPC. Just more parameters to the existing RPC. It would be even better from a backwards compatibility standpoint. Sending an RPC like the following:

<boinc_gui_rpc_request>
<get_results>
<filter>scheduler_state eq 2</filter>
</get_results>
</boinc_gui_rpc_request>

currently sends back the whole list of results without any error. A future version of the client could actually parse that tag to send only results that match the expression, and a future manager could take advantage of it, without any compatibility problem. If the manager sends a filter but gets the full list anyway, no problem: it just updates the whole list on the GUI as it does now.

Probably such a generic <filter> expression is too complex to implement. That's just an example; any filter syntax would work.
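
As a rough illustration of why this stays backwards compatible, the sketch below builds the request with an optional `<filter>` child and merges whatever comes back, filtered or not, into the manager's view. It is written in Python rather than BOINC's C++; the helper names `build_get_results_request` and `merge_results` and the name-to-state dictionary are assumptions made for the example, and the filter syntax is only the example syntax from this comment.

```python
import xml.etree.ElementTree as ET


def build_get_results_request(filter_expr=None):
    """Build a <get_results> request, optionally carrying a <filter> child."""
    root = ET.Element("boinc_gui_rpc_request")
    get_results = ET.SubElement(root, "get_results")
    if filter_expr is not None:
        ET.SubElement(get_results, "filter").text = filter_expr
    return ET.tostring(root, encoding="unicode")


def merge_results(known, reply):
    """Update the known results (name -> state) with whatever was returned.

    Works the same whether the client honoured the filter (partial reply)
    or ignored it and sent the full list.
    """
    merged = dict(known)
    merged.update(reply)
    return merged
```

With a partial reply only the matching entries change; with a full reply every entry is refreshed, which is what the manager already does today. Removing results that no longer exist would still need the periodic full refresh.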

comment:7 Changed 16 years ago by Der Meister

I just did some profiling of the manager. Unfortunately I don't have access to a client with more than 5 results at the moment, therefore I did not profile in the same situation described here.

However, my first results show that a lot of CPU time is used for memory management and string manipulation. Most of this seems to come from internal wxWidgets functions. A particular function of the manager that seems to be expensive is RESULT::parse in gui_rpc_client_ops.C. This would be consistent with the observation described in this ticket. Unfortunately I don't see any easy way to optimise this function, other than Nicolas' suggestion. But that seems to be quite a big change to the manager and the core client.

I think I will run another profiling session as soon as I am able to persuade my core client to fetch a larger amount of work.

comment:8 Changed 16 years ago by Der Meister

Another solution came to my mind: What about turning the static refresh frequency into a dynamic one? The delay between two refresh-cycles could be proportional to the number of results (or more abstract: proportional to the number of elements in the current view). That should help to keep the CPU usage of the core client and the manager on a constant low level.

If there are no objections against this proposal I would like to write a patch for it.
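
A minimal sketch of the idea, in Python for brevity: the delay grows linearly with the number of elements in the view, starting from the current one-second refresh and capped at a maximum. The function name and all constants are assumptions chosen for illustration; they are not taken from the attached patch.

```python
def refresh_interval(num_items, base_seconds=1.0,
                     items_per_extra_second=100, max_seconds=30.0):
    """Return the delay between refresh cycles for a view of num_items rows.

    The delay grows in proportion to the number of items, so the refresh
    work per unit of time stays roughly constant as the list grows.
    """
    extra = num_items / items_per_extra_second
    return min(base_seconds + extra, max_seconds)
```

With these assumed constants a view of 500 results would refresh every 6 seconds instead of every second, while a nearly empty view keeps the current one-second rate.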

Changed 16 years ago by Der Meister

Attachment: boincmgr_347.patch added

Patch implementing the dynamic refresh rate

comment:9 Changed 16 years ago by Der Meister

Keywords: patch added

comment:10 Changed 16 years ago by Didactylos

Milestone: changed from 6.4 to 6.2

This is a blocker originally for 5.10! There is a patch provided.

Resetting milestone to 6.2. Major bugs should not be version-bumped without explanation.

comment:11 Changed 15 years ago by romw

Milestone: changed from 6.2 to 6.6

Bumping to 6.6

comment:12 Changed 15 years ago by romw

Component: changed from Client - Daemon to Manager
Milestone: changed from 6.6 to 6.8

comment:13 Changed 15 years ago by romw

Priority: changed from Blocker to Major

comment:14 Changed 15 years ago by romw

Milestone: changed from 6.8 to 6.10

Moving this to 6.10.

The patch cannot be applied as we are now using the Async RPC mechanism.

The decided solution to the problem is to spin up a new thread to monitor the async thread and throttle it back so that it cannot use more than 5% of the CPU. The async thread may need to be broken up into two different threads so that on-demand RPCs are not subject to being throttled.
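
For illustration only, the arithmetic behind a 5% cap can be sketched as a duty cycle: after the thread has been busy for some time, it sleeps long enough that the busy time is at most 5% of the total. Note that this is a simpler self-throttling variant, not the separate monitor thread proposed above, and the function name and default are assumptions.

```python
def throttle_sleep(busy_seconds, max_cpu_fraction=0.05):
    """Return how long to sleep after busy_seconds of work.

    Chosen so that busy / (busy + sleep) equals max_cpu_fraction.
    """
    if not 0 < max_cpu_fraction <= 1:
        raise ValueError("max_cpu_fraction must be in (0, 1]")
    return busy_seconds * (1.0 - max_cpu_fraction) / max_cpu_fraction
```

For example, an RPC cycle that took 10 ms of CPU time would be followed by roughly 190 ms of sleep. On-demand RPCs would have to bypass this delay, which is the reason given above for possibly splitting the async thread in two.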

comment:15 Changed 15 years ago by davea

No more threads.

1) show only active tasks; button to show all

2) reduce refresh rate if CPU usage is above threshold
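
Point 2 could look something like the following sketch, in Python for brevity: the refresh interval backs off while measured CPU usage is above a threshold and recovers once it drops. The function name, the doubling and halving policy, and every constant are assumptions for illustration; the ticket does not say how the fix was actually implemented.

```python
def next_interval(current, cpu_fraction, threshold=0.05,
                  minimum=1.0, maximum=60.0):
    """Return the next refresh interval in seconds.

    Doubles the interval while CPU usage exceeds the threshold,
    halves it otherwise, and clamps the result to [minimum, maximum].
    """
    if cpu_fraction > threshold:
        return min(current * 2, maximum)
    return max(current / 2, minimum)
```

How `cpu_fraction` is measured is left open here; it would be the manager's own recent CPU usage as a fraction of one core.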

comment:16 Changed 15 years ago by Nicolas

So it wasn't much of a "decided solution"...

comment:17 Changed 15 years ago by Nicolas

If you plan to add a new RPC to get only active tasks, remember what I mentioned before in this ticket: adding tags inside <get_results> is fully backwards compatible. Older clients will return the full result list with no problems. So don't make a new RPC, use something like:

<get_results><only_active/></get_results>

comment:18 Changed 14 years ago by romw

This issue should be resolved in the latest beta releases.

comment:19 Changed 14 years ago by romw

Resolution: fixed
Status: changed from new to closed