API for multi-thread apps
(The following is a design document, not implemented yet.)
Why write a multi-threaded app?
The average number of cores per PC will increase over the next few years, possibly at a faster rate than the average amount of available RAM.
Depending on your application and project, it may be desirable to develop a multi-threaded application. Possible reasons to do this:
- Your application's memory footprint is large enough that some PCs don't have enough RAM to run a separate copy of the app on each CPU.
- You want to reduce the turnaround time of your jobs (either for human-factors reasons, or to reduce server occupancy).
Writing and debugging a multi-threaded app is often hard. You may be able to use existing libraries of numerical "kernels" that are already multi-threaded.
Assumptions
A multi-threaded app A uses Nthreads(A) threads. The average number of CPUs it actually keeps busy, Ncpus(A), may be smaller, because its threads wait on I/O or synchronization.
Ideally, on a host with N CPUs, we want the sum of Ncpus(A) over all running apps to be about N. If it's less, CPU time goes unused. If it's more:
- we increase latency without increasing throughput
- we use more RAM than needed
- we incur higher synchronization overhead
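To make these quantities concrete: Ncpus(A) can be measured as the CPU time consumed by all of A's threads divided by the wall-clock time it ran, and the host is well matched when the sum over running apps is close to N. The sketch below assumes exactly that; the function names and the tolerance parameter are hypothetical, not part of BOINC. Note that a measured sum can never exceed N, so the "over-committed" case applies to planned or estimated demand.

```c
#include <assert.h>
#include <stddef.h>

/* Average number of CPUs an app kept busy over an interval:
 * Ncpus(A) = CPU time used by all its threads / wall-clock time. */
double ncpus_from_times(double cpu_seconds, double wall_seconds) {
    assert(wall_seconds > 0);
    return cpu_seconds / wall_seconds;
}

/* Compare the summed (estimated) Ncpus of the running apps with the
 * host's N CPUs. Returns -1 if CPU time is being left unused, +1 if the
 * host is over-committed, 0 if the sum is within tolerance * N of N.
 * (Hypothetical helper; a measured sum can't exceed N, so this is meant
 * for planned demand.) */
int load_status(const double *ncpus_est, size_t napps,
                int n_host_cpus, double tolerance) {
    double total = 0.0;
    for (size_t i = 0; i < napps; i++) {
        total += ncpus_est[i];
    }
    double slack = tolerance * n_host_cpus;
    if (total < n_host_cpus - slack) return -1;
    if (total > n_host_cpus + slack) return 1;
    return 0;
}
```

For example, an app that used 30 CPU-seconds in 10 wall-clock seconds has Ncpus(A) = 3.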
We assume that applications may be able to change Nthreads(A) dynamically in response to hints from BOINC. Nthreads(A) need not be equal to the hint.
Example: suppose
- we have an 80-core CPU
- app A can use 1,2,4,8,16,32 threads
- app B can use 1,2,4,8,16,32,64 threads
Then, most of the time, we want (A, B) to run with either (16, 64) or (32, 32) threads.
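How an app follows the hint is up to the app. For instance, app A above supports only power-of-two thread counts up to 32, so it cannot honor an arbitrary hint; one simple policy is to pick the largest supported count that does not exceed the hint, which is one way Nthreads(A) ends up differing from the hint. A minimal sketch of that policy (the function name and the round-down rule are assumptions, not part of the proposal):

```c
#include <assert.h>
#include <stddef.h>

/* Pick the largest supported thread count that does not exceed the hint.
 * `supported` must be non-empty and sorted ascending. If the hint is below
 * the smallest supported count, fall back to that smallest count. */
int choose_nthreads(const int *supported, size_t n, int hint) {
    assert(n > 0);
    int best = supported[0];
    for (size_t i = 0; i < n; i++) {
        if (supported[i] <= hint) {
            best = supported[i];
        }
    }
    return best;
}
```

With A's counts {1, 2, 4, 8, 16, 32}, a hint of 20 yields 16 threads, and any hint of 32 or more yields 32.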
Proposal
API functions:
int boinc_target_nthreads();
void boinc_actual_nthreads(int);
An application calls boinc_target_nthreads() periodically, at points where it is able to change its number of threads. It calls boinc_actual_nthreads() to report its actual number of threads.
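A sketch of this calling pattern in C. Because the API is only proposed, the two functions are defined here as stubs backed by plain variables; the stub bodies, the chunked structure of the application, and `run_chunks()` are all hypothetical. The app checks the target only between chunks, where no worker threads are running, and reports a new count only when it actually changes.

```c
#include <assert.h>

/* Stand-ins for the proposed API, which does not exist yet. A real
 * implementation would get the hint from the client over shared memory;
 * here it is just a variable. */
static int hypothetical_hint = 4;
static int reported_nthreads = 0;

int boinc_target_nthreads(void) { return hypothetical_hint; }
void boinc_actual_nthreads(int n) { reported_nthreads = n; }

/* Hypothetical app: processes work in chunks and may resize its thread
 * pool between chunks. Returns the thread count in use at the end. */
int run_chunks(int nchunks, int max_threads) {
    assert(max_threads >= 1);
    int nthreads = 0;
    for (int chunk = 0; chunk < nchunks; chunk++) {
        /* Safe point: ask how many threads BOINC would like us to use. */
        int target = boinc_target_nthreads();
        int wanted = target < 1 ? 1 : (target > max_threads ? max_threads : target);
        if (wanted != nthreads) {
            nthreads = wanted;              /* resize the thread pool here */
            boinc_actual_nthreads(nthreads); /* report the actual count */
        }
        /* ... process one chunk of work using nthreads threads ... */
    }
    return nthreads;
}
```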
A workunit (WU) database record can specify "max average ncpus": an estimate of Ncpus(A) on a host with arbitrarily many CPUs. The client and scheduler use it to estimate completion time.
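The proposal doesn't say how "max average ncpus" enters the completion-time estimate. One plausible rule, shown here purely as an assumption: a job can keep busy no more CPUs than the smaller of its max average ncpus and the host's CPU count, so its runtime is its estimated operation count divided by that many CPUs' worth of speed. The function and parameter names are hypothetical.

```c
#include <assert.h>

/* Hypothetical completion-time estimate for a multi-threaded job.
 * fpops_est:     estimated floating-point operations in the job
 * flops_per_cpu: per-CPU speed of the host
 * max_avg_ncpus: the WU's "max average ncpus"
 * host_ncpus:    number of CPUs on the host */
double est_completion_seconds(double fpops_est, double flops_per_cpu,
                              double max_avg_ncpus, int host_ncpus) {
    assert(flops_per_cpu > 0 && max_avg_ncpus > 0 && host_ncpus > 0);
    /* The job can't use more CPUs than it can exploit or the host has. */
    double usable = max_avg_ncpus < host_ncpus ? max_avg_ncpus : host_ncpus;
    return fpops_est / (flops_per_cpu * usable);
}
```

For example, a 10^12-operation job with max average ncpus 4 on an 8-CPU host at 10^9 operations per second per CPU would be estimated at 250 seconds; on a host where it is limited by CPU count instead, the host's N takes its place.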
Implementation
Shared-memory messages:
- core->app (process control channel): <target_nthreads>
- app->core (process control channel): <actual_nthreads>
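The proposal names the two messages but not their exact encoding. The tag names suggest short XML-style strings, so one plausible encoding puts the thread count inside the tag, as in `<target_nthreads>8</target_nthreads>`. Treat that format and both helper functions below as assumptions; they only show how either side might build and read such a message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<tag>n</tag>" in buf. Returns 0 on success, -1 if buf is too small. */
int format_nthreads_msg(char *buf, size_t len, const char *tag, int n) {
    int r = snprintf(buf, len, "<%s>%d</%s>", tag, n, tag);
    return (r < 0 || (size_t)r >= len) ? -1 : 0;
}

/* Find "<tag>" in msg and read the integer after it.
 * Returns 0 on success, -1 if the tag or the number is missing. */
int parse_nthreads_msg(const char *msg, const char *tag, int *n) {
    assert(msg != NULL && tag != NULL && n != NULL);
    char open[64];
    int r = snprintf(open, sizeof open, "<%s>", tag);
    if (r < 0 || r >= (int)sizeof open) return -1;
    const char *p = strstr(msg, open);
    if (p == NULL) return -1;
    return sscanf(p + strlen(open), "%d", n) == 1 ? 0 : -1;
}
```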
The client maintains an estimate of each job's CPU efficiency and uses it to scale target_nthreads.
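The proposal defines neither CPU efficiency nor the scaling rule. One reading, sketched below with hypothetical names: efficiency is the fraction of each thread's wall-clock time spent running on a CPU, smoothed with an exponential moving average, and the target is the number of threads needed to keep the free CPUs busy at that efficiency. Under this assumption, a job running at 50% efficiency is offered twice as many threads as there are free CPUs.

```c
#include <assert.h>

/* Hypothetical per-job efficiency estimate, in (0, 1]. */
struct job_eff {
    double efficiency;
};

/* Blend a new measurement into the running estimate with weight alpha. */
void update_efficiency(struct job_eff *j, double cpu_seconds,
                       double wall_seconds, int actual_nthreads, double alpha) {
    assert(wall_seconds > 0 && actual_nthreads > 0);
    /* Fraction of the threads' wall-clock time actually spent on a CPU. */
    double sample = cpu_seconds / (wall_seconds * actual_nthreads);
    if (sample > 1.0) sample = 1.0;
    j->efficiency = alpha * sample + (1.0 - alpha) * j->efficiency;
}

/* Threads needed to keep free_cpus CPUs busy, clamped to [1, max_threads]. */
int scaled_target_nthreads(const struct job_eff *j, int free_cpus, int max_threads) {
    assert(j->efficiency > 0);
    int t = (int)(free_cpus / j->efficiency); /* truncation = floor here */
    if (t < 1) t = 1;
    if (t > max_threads) t = max_threads;
    return t;
}
```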
Implementation (enforce_schedule()): as each job is scheduled, decrement the available CPU count by the job's actual_nthreads, scaled by its efficiency estimate. rr_simulation() must be modified in the same way.
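The accounting change can be illustrated in isolation. The sketch below is not BOINC's enforce_schedule(), which does considerably more; it only shows the proposed bookkeeping, assuming "scaled" means actual_nthreads multiplied by the job's efficiency estimate. The struct, the function, and the policy of skipping a job that doesn't fit and trying the next one are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a runnable job. */
struct job {
    int actual_nthreads; /* as reported via boinc_actual_nthreads() */
    double efficiency;   /* client's estimate, in (0, 1] */
    int scheduled;       /* output: 1 if the job was scheduled */
};

/* Walk jobs in priority order, charging each its scaled thread count
 * (actual_nthreads * efficiency) against the host's CPUs. A job that
 * doesn't fit is skipped. Returns the number of jobs scheduled. */
int schedule_jobs(struct job *jobs, size_t njobs, int host_ncpus) {
    assert(host_ncpus > 0);
    double cpus_left = host_ncpus;
    int count = 0;
    for (size_t i = 0; i < njobs; i++) {
        double need = jobs[i].actual_nthreads * jobs[i].efficiency;
        jobs[i].scheduled = 0;
        if (need <= cpus_left) {
            jobs[i].scheduled = 1;
            cpus_left -= need;
            count++;
        }
    }
    return count;
}
```

On the 80-core host from the example, a 32-thread job at full efficiency followed by a 64-thread job at 75% efficiency together charge exactly 80 CPUs, so a third job would not be scheduled.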