Opened 17 years ago

Last modified 15 years ago

#266 closed Task

Handling of multithreaded work units — at Version 1

Reported by: MikeMarsUK
Owned by:
Priority: Major
Milestone: Undetermined
Component: BOINC - API
Version:
Keywords: multithreaded 64bit
Cc: Pepo

Description (last modified by Nicolas)

Hope this isn't a duplicate:

Carl is working on a 64-bit climate model (HadGEM) which uses 4.7GB of RAM, and would be multithreaded. Probably will not be live for at least a year.

BOINC will need to be able to cope with huge multithreaded work units in the future (in version 7.0 or 8.0, I guess).

Quote:

sending out multicore-ready models (via MPI) is something I've been toying with, esp for the higher-res models. although I'm not sure how BOINC will react to that, I suppose it couldn't stop our monitor job from launching a 2 or 4-CPU-usage model, but it would probably schedule other work to be done on these CPUs so we'd all be "fighting" for the same cores etc.

http://boinc.berkeley.edu/dev/forum_thread.php?id=1863&nowrap=true#10839

The following is just speculation since I'm not familiar with the internals of BOINC, so please forgive me if I go off on an irrelevant tangent:

I think there is a strong push towards multithreaded software at the moment, and it would have significant advantages for something which uses a huge amount of RAM.

A giant work unit such as the one being discussed here would likely run on a PC with many cores (a server or a very-high-end enthusiast's machine), but with a maximum of 1 or 2GB of RAM per core (since core counts are rising more quickly than memory). If it ran with a single thread, it would be RAM-bound and most of the CPUs would be idle. On the other hand, if it were allowed to use all available cores for one work unit, it would run much more quickly (e.g., on an 8GB box with 4 cores, you could run one model using one core and have smaller jobs on the other cores, or run the same model using all the cores in ~1/4 the time).

Several ways to handle multithreaded jobs come to mind, all with different problems...

  • No limit on threads, exclusive access to the machine for its timeslot (and debt would be charged per physical core available). But if the work unit's thread usage fluctuates over time, that'd lead to wasted CPU time.
  • or reserve a number of cores but allow other jobs to run at the same time on the remaining cores (e.g., 1 job exclusively using 2 threads, + 2 jobs using 1 thread each on a 4 core box)
  • or let multiple work units fight for CPU time (i.e., allow more threads of work than there are physical cores, and let the operating system sort it all out. 4 jobs using 4 cores, any number of active threads). Risk of CPU starvation?

The work-request process might need to consider memory size (e.g., don't simultaneously issue four 5GB work units to a PC with 8GB and 4 cores, since only one can practically run at a time).
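To make the memory check concrete, here is a rough sketch of the kind of limit the work-request process could apply. This is not BOINC scheduler code; the function name and parameters are hypothetical, and it assumes every work unit of a given kind needs the same amount of RAM and the same number of threads.

```python
def max_concurrent_units(host_ram_gb: float, host_cores: int,
                         unit_ram_gb: float, unit_threads: int = 1) -> int:
    """Hypothetical helper: how many identical work units can run at once.

    The answer is limited by whichever runs out first, RAM or cores.
    """
    if unit_ram_gb <= 0 or unit_threads <= 0:
        raise ValueError("unit_ram_gb and unit_threads must be positive")
    by_memory = int(host_ram_gb // unit_ram_gb)   # units that fit in RAM
    by_cores = host_cores // unit_threads         # units that fit on the cores
    return max(0, min(by_memory, by_cores))


# The example from the ticket: 5GB work units on an 8GB, 4-core PC.
print(max_concurrent_units(host_ram_gb=8, host_cores=4, unit_ram_gb=5))
```

With the numbers from the example, only one such work unit fits, so issuing four of them to run simultaneously would make no sense.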

The round-robin simulator would also need to take this into account. In addition it'd need to be able to simulate a task taking up many cores rather than assuming each task is one core.
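As a very simplified illustration of what "a task taking up many cores" could mean for a simulator, here is a toy round-robin loop. It is not the actual BOINC round-robin simulator; the task tuple layout, the `simulate` name and the fixed time slice are all assumptions made for the sketch, and finish times are only accurate to one slice.

```python
from collections import deque


def simulate(tasks, cores, slice_s=1.0):
    """Toy round-robin simulation with multi-core tasks.

    tasks: list of (name, threads, wall_seconds_needed) tuples (hypothetical
    format). Returns {name: finish_time}, rounded up to whole slices.
    """
    for name, threads, _ in tasks:
        if threads < 1 or threads > cores:
            raise ValueError(f"task {name!r} cannot run on {cores} cores")
    queue = deque([name, threads, float(left)] for name, threads, left in tasks)
    finish = {}
    now = 0.0
    while queue:
        free = cores
        running = []
        # Walk the queue in its current order and start whatever still fits.
        for task in queue:
            if task[1] <= free:
                running.append(task)
                free -= task[1]
        now += slice_s
        for task in running:
            task[2] -= slice_s
            if task[2] <= 0:
                finish[task[0]] = now
        # Drop finished tasks, then rotate so the next task gets first pick.
        queue = deque(t for t in queue if t[2] > 0)
        queue.rotate(-1)
    return finish


# One 4-thread task and one 1-thread task sharing a 4-core box.
print(simulate([("big", 4, 2), ("small", 1, 2)], cores=4))
```

In this example the two tasks alternate slices, and three cores sit idle whenever the single-threaded task has its turn, which is the kind of effect the simulator would have to account for.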

The daemon would need to only start sufficient tasks to use the PC's core count (e.g., 2 tasks with one thread, 1 with 2 threads). On the other hand, if a work unit is not keeping all its threads busy 100% of the time, perhaps a few extra running tasks would be OK as long as CPU starvation doesn't occur.
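The selection step described above might look something like the following sketch. It is only an illustration under my own assumptions, not how the BOINC client actually works: `pick_tasks_to_run` and the `overcommit` parameter are hypothetical, and the runnable list is assumed to already be in priority order.

```python
def pick_tasks_to_run(runnable, cores, overcommit=0):
    """Greedily choose tasks whose thread counts fit into the available cores.

    runnable:   list of (name, threads) in priority order (hypothetical format)
    overcommit: extra threads allowed beyond the core count, for work units
                that do not keep all their threads busy all the time
    """
    free = cores + overcommit
    chosen = []
    for name, threads in runnable:
        if 0 < threads <= free:
            chosen.append(name)
            free -= threads
        if free == 0:
            break
    return chosen


# 4-core box: two 1-thread tasks plus one 2-thread task fill it exactly.
print(pick_tasks_to_run([("a", 1), ("b", 1), ("c", 2), ("d", 1)], cores=4))
```

A task that needs more threads than are still free is simply skipped here, so a lower-priority task can jump ahead of it; whether that is acceptable is one of the policy questions this ticket raises.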

Change History (1)

comment:1 Changed 17 years ago by Nicolas

Description: modified (diff)
Keywords: workunit added; work units removed
Type: Enhancement → Task

Here's another advantage of multithreaded apps: Imagine an app that uses 300MB of RAM. With current BOINC, somebody with 8 cores/CPUs (two quad-cores?) would run 8 instances of the app (different workunits) in order to take advantage of all the computing power. That means BOINC apps in total would use 2.3GB of RAM! If, instead, that app was multithreaded, it could be a single instance using, say, 350MB of RAM and with 8 threads. That is, doing a single workunit 8 times as fast instead of 8 workunits at the same time*. Keeps all the hungry CPUs happy, with 1/8 of the needed RAM. The final throughput (workunits per day) would be around the same.

  * Actually, I have seen a SETI user who had just installed BOINC for the first time (on a dual-core) and was surprised that it was "two WUs at a time" instead of "single multithreaded WU twice as fast", but I managed to explain the reasons to him.

This feature would definitely be required when Intel releases that 80-core processor commercially (although it will be many years until that happens, if it happens). The question is "How much RAM would you need to run eighty workunits at the same time?" I think with almost any current BOINC project, the amount of memory needed would reach "insane" values...
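As a rough check on the numbers in this comment, the arithmetic can be written out as below. The 300MB and 350MB figures are the ones imagined above; the helper name is made up, and it assumes 1GB = 1024MB and that every instance uses the same amount of RAM.

```python
def total_ram_gb(instances: int, mb_per_instance: float) -> float:
    """Total RAM in GB (1GB = 1024MB) for a number of identical app instances."""
    return instances * mb_per_instance / 1024


# 8 single-threaded instances of a 300MB app (one per core).
print(round(total_ram_gb(8, 300), 1))
# One multithreaded instance using 350MB with 8 threads.
print(round(total_ram_gb(1, 350), 2))
# The same 300MB app on a hypothetical 80-core machine.
print(round(total_ram_gb(80, 300), 1))
```

So the eight separate instances need roughly seven times the RAM of the single multithreaded one, and the eighty-instance case would land in the tens of gigabytes even for this modest app.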

There could be issues with credits using multithreaded apps, though... Can't think exactly what, but I'm sure there's something :)

(PS: I fixed the bulleted list on the description)
