Opened 17 years ago
Last modified 15 years ago
#266 closed Task
Handling of multithreaded work units — at Initial Version
| Reported by: | MikeMarsUK | Owned by: | |
|---|---|---|---|
| Priority: | Major | Milestone: | Undetermined |
| Component: | BOINC - API | Version: | |
| Keywords: | multithreaded 64bit | Cc: | Pepo |
Description
Hope this isn't a duplicate:
Carl is working on a 64-bit climate model (HadGEM) which uses 4.7 GB of RAM and would be multithreaded. It probably won't go live for at least a year.
BOINC will need to be able to cope with huge multithreaded work units in the future (in version 7.0 or 8.0, I guess).
Quote :
sending out multicore-ready models (via MPI) is something I've been toying with, esp for the higher-res models. although I'm not sure how BOINC will react to that, I suppose it couldn't stop our monitor job from launching a 2 or 4-CPU-usage model, but it would probably schedule other work to be done on these CPUs so we'd all be "fighting" for the same cores etc.
http://boinc.berkeley.edu/dev/forum_thread.php?id=1863&nowrap=true#10839
The following is just speculation, since I'm not familiar with the internals of BOINC, so please forgive me if I go off on an irrelevant tangent:
I think there is a strong push towards multithreaded software at the moment, and it would have significant advantages for something which uses a huge amount of RAM.
A giant work unit such as the one being discussed here would likely run on a PC with many cores (a server or a very-high-end enthusiast's machine), but with a maximum of 1 or 2 GB of RAM per core (since core counts are rising faster than memory). If it ran with a single thread, it would be RAM-bound and most of the CPUs would be idle. On the other hand, if it were allowed to use all available cores for one work unit, it would run much quicker (e.g., on an 8 GB box with 4 cores, you could run one model on one core and smaller jobs on the other cores, or run the same model on all the cores in roughly a quarter of the time).
Several ways to handle multithreaded jobs come to mind, all with different problems...
- No limit on threads, with exclusive access to the machine for its timeslot (and debt charged per physical core available). But if the work unit's thread usage fluctuates over time, that would lead to wasted CPU time.
- Or reserve a number of cores but allow other jobs to run at the same time on the remaining cores (e.g., 1 job exclusively using 2 threads, plus 2 jobs using 1 thread each, on a 4-core box).
- Or let multiple work units fight for CPU time (i.e., allow more threads of work than there are physical cores and let the operating system sort it all out: 4 jobs using 4 cores, any number of active threads). Risk of CPU starvation?
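The second option above (reserve cores, run other jobs on the remainder) could be sketched as a simple greedy packing. This is purely illustrative pseudocode, not actual BOINC scheduler internals, and all the names are made up:

```python
def pick_runnable(tasks, total_cores):
    """Greedily pick tasks to run, reserving each task's declared
    thread count so the total never exceeds the physical core count.
    `tasks` is a list of (name, nthreads) pairs, highest priority first.
    (Illustrative sketch only; not real BOINC code.)"""
    free = total_cores
    running = []
    for name, nthreads in tasks:
        if nthreads <= free:
            running.append(name)
            free -= nthreads
    return running

# On a 4-core box: one 2-thread model plus two 1-thread jobs fit exactly;
# the fourth job waits for the next scheduling pass.
print(pick_runnable([("model", 2), ("a", 1), ("b", 1), ("c", 1)], 4))
```

The fluctuating-usage problem from the first option still applies: if the model only keeps one of its two reserved threads busy, a reserved core sits idle.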
The work-request process might need to consider memory size (i.e., don't simultaneously issue four 5 GB work units to a PC with 8 GB of RAM and 4 cores, since only one can practically run at a time).
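That constraint amounts to capping the number of issued units at whichever resource runs out first, RAM or cores. A hedged sketch, with hypothetical function and parameter names:

```python
def max_concurrent_units(host_ram_gb, host_cores, wu_ram_gb, wu_threads=1):
    """How many copies of a work unit can actually run at once on a host,
    limited by whichever is scarcer: memory or cores.
    (Illustrative only; not the real BOINC work-request logic.)"""
    by_ram = int(host_ram_gb // wu_ram_gb)
    by_cpu = host_cores // wu_threads
    return min(by_ram, by_cpu)

# 8 GB / 4-core host, 5 GB single-threaded units: only 1 fits in RAM,
# so issuing 4 of them would leave 3 stalled waiting on memory.
print(max_concurrent_units(8, 4, 5))  # -> 1
```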
The round-robin simulator would also need to take this into account. In addition, it would need to be able to simulate a task occupying many cores, rather than assuming each task uses one core.
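To make the point concrete, here is a crude round-robin simulation where each task occupies its declared core count for a whole timeslice. It is a toy model I made up to show the effect, not the actual simulator:

```python
def simulate_round_robin(tasks, total_cores, slice_s=60):
    """Toy round-robin simulation: in each timeslice, tasks claim
    `ncpus` cores (not just one) until the cores run out.
    `tasks`: dict name -> {"ncpus": n, "remaining": seconds of work}.
    Returns wall-clock seconds until everything finishes.
    (Illustrative sketch only.)"""
    assert all(v["ncpus"] <= total_cores for v in tasks.values())
    t = 0
    while any(v["remaining"] > 0 for v in tasks.values()):
        free = total_cores
        for v in tasks.values():
            if v["remaining"] > 0 and v["ncpus"] <= free:
                free -= v["ncpus"]
                v["remaining"] -= slice_s
        t += slice_s
    return t

# A 4-core job on a 4-core box blocks a 1-core job entirely,
# so the two serialize: 60 s + 60 s = 120 s of wall-clock time.
jobs = {"model": {"ncpus": 4, "remaining": 60},
        "small": {"ncpus": 1, "remaining": 60}}
print(simulate_round_robin(jobs, 4))
```

A one-core-per-task simulator would predict both finishing in 60 s and could therefore miscalculate deadlines.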
The daemon would need to start only enough tasks to use up the PC's core count (e.g., 2 tasks with one thread each plus 1 task with 2 threads). On the other hand, if a work unit is not keeping all its threads busy 100% of the time, perhaps a few extra running tasks would be OK, as long as CPU starvation doesn't occur.
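The "a few extra tasks" idea could be expressed as a mild overcommit factor on the committed thread count. Again just a sketch with an arbitrary example factor, not a proposal for specific BOINC behaviour:

```python
def tasks_to_start(queued, total_cores, overcommit=1.25):
    """Start queued tasks until the committed thread count reaches the
    core count, optionally allowing mild overcommit for jobs that don't
    keep every thread busy. `queued`: list of (name, nthreads) pairs.
    (Illustrative; the 1.25 factor is an arbitrary made-up example.)"""
    limit = total_cores * overcommit
    committed = 0
    started = []
    for name, nthreads in queued:
        if committed + nthreads <= limit:
            started.append(name)
            committed += nthreads
    return started

# 4-core box with 25% overcommit: a 4-thread model and a 1-thread job
# both start (5 committed threads <= 5 allowed).
print(tasks_to_start([("mt", 4), ("st", 1)], 4))
```

With `overcommit=1.0` the same call would start only the 4-thread model, which is the strict behaviour described above.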