#925 closed Defect (fixed)
BOINC 6.6.36 non-CUDA checkpoint interval scaled by host.ncpus
Reported by: | Thyme Lawn | Owned by: | davea |
---|---|---|---|
Priority: | Minor | Milestone: | Undetermined |
Component: | Client - Daemon | Version: | 6.6.36 |
Keywords: | checkpoint | Cc: | |
Description
BOINC 6.6.36 has a checkpoint multiplier problem for non-CUDA tasks on multi-core systems.
I have my checkpoint interval set to 10 minutes, but checkpoints for malariacontrol.net and WCG are happening every 20 minutes on a dual-core system and every 40 minutes on a quad-core system. Applications are kept in memory and WCG usually checkpoints soon after being scheduled, meaning a task can stay scheduled for >80 minutes on the quad core instead of the 60 minutes set in preferences.
The problem is in ACTIVE_TASK::write_app_init_file() which contains the following pair of lines:
int nprocs = (result->avp->ncudas)?coproc_cuda->count:gstate.ncpus;
aid.checkpoint_period = nprocs*gstate.global_prefs.disk_interval;
This means the checkpoint interval for non-CUDA tasks will always be scaled up by the host's number of CPUs (gstate.ncpus) instead of the average number of CPUs requested for the task (result->avp->avg_ncpus).
Sure enough, app_init.xml on the dual core system has
<checkpoint_period>1200.000000</checkpoint_period>
and on the quad core it has
<checkpoint_period>2400.000000</checkpoint_period>
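The arithmetic behind these two values can be reproduced outside the client. The following is a minimal standalone sketch, not BOINC code: `TaskInfo`, `reported_period` and `per_task_period` are hypothetical names introduced for illustration, the 10-minute preference is assumed to be stored as 600 seconds, and the task is assumed to request an average of one CPU. `per_task_period` shows one possible alternative scaling (by `avg_ncpus`) and should not be read as the committed fix.

```cpp
#include <cassert>

// Hypothetical stand-in for the fields the ticket refers to
// (result->avp->ncudas and result->avp->avg_ncpus); not BOINC's real type.
struct TaskInfo {
    int ncudas;        // CUDA devices used by the app version (0 for CPU-only)
    double avg_ncpus;  // average number of CPUs requested by the task
};

// Mirrors the behaviour described above: a non-CUDA task is scaled by the
// host's CPU count, regardless of how many CPUs the task itself uses.
double reported_period(const TaskInfo& task, int cuda_count,
                       int host_ncpus, double disk_interval) {
    int nprocs = task.ncudas ? cuda_count : host_ncpus;
    return nprocs * disk_interval;
}

// Assumed alternative (illustration only): scale a non-CUDA task by the
// average number of CPUs it requested instead of the host total.
double per_task_period(const TaskInfo& task, int cuda_count,
                       double disk_interval) {
    double nprocs = task.ncudas ? cuda_count : task.avg_ncpus;
    return nprocs * disk_interval;
}
```

With a single-threaded non-CUDA task (`TaskInfo{0, 1.0}`) and a 600-second interval, `reported_period` gives 1200 for two host CPUs and 2400 for four, matching the app_init.xml values above, while `per_task_period` gives 600.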
Patch