Opened 15 years ago

Closed 15 years ago

Last modified 15 years ago

#925 closed Defect (fixed)

BOINC 6.6.36 non-CUDA checkpoint interval scaled by host.ncpus

Reported by: Thyme Lawn
Owned by: davea
Priority: Minor
Milestone: Undetermined
Component: Client - Daemon
Version: 6.6.36
Keywords: checkpoint
Cc:

Description

BOINC 6.6.36 has a checkpoint multiplier problem for non-CUDA tasks on multi-core systems.

I have my checkpoint interval set to 10 minutes, but checkpoints for malariacontrol.net and WCG are happening every 20 minutes on a dual-core system and every 40 minutes on a quad-core system. Applications are kept in memory and WCG usually checkpoints soon after being scheduled, meaning a task can stay scheduled for >80 minutes on the quad core instead of the 60 minutes set in preferences.

The problem is in ACTIVE_TASK::write_app_init_file() which contains the following pair of lines:

int nprocs = (result->avp->ncudas)?coproc_cuda->count:gstate.ncpus;
aid.checkpoint_period = nprocs*gstate.global_prefs.disk_interval;

This means the checkpoint interval for non-CUDA tasks will always be scaled up by the host's number of CPUs (gstate.ncpus) instead of the average number of CPUs requested for the task (result->avp->avg_ncpus).

Sure enough, app_init.xml on the dual core system has

<checkpoint_period>1200.000000</checkpoint_period>

and on the quad core it has

<checkpoint_period>2400.000000</checkpoint_period>
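
The scaling described above can be illustrated with a minimal, self-contained sketch. It is not the attached patch and not the change committed to the client: the AppVersion and Host structs and both function names are hypothetical stand-ins, and only the field names ncudas, avg_ncpus, ncpus and disk_interval are taken from this report. The second function assumes the fix is simply to scale by avg_ncpus, as suggested above.

```cpp
// Hypothetical stand-ins for the client's data structures.
struct AppVersion {
    int ncudas;        // CUDA devices requested by the app version
    double avg_ncpus;  // average number of CPUs requested for the task
};

struct Host {
    int ncpus;             // number of CPUs on the host
    int cuda_count;        // number of CUDA devices on the host
    double disk_interval;  // checkpoint interval preference, in seconds
};

// Behaviour described in this report: a non-CUDA task is scaled by the
// host's CPU count, regardless of how many CPUs the task itself uses.
double checkpoint_period_reported(const AppVersion& avp, const Host& h) {
    int nprocs = avp.ncudas ? h.cuda_count : h.ncpus;
    return nprocs * h.disk_interval;
}

// Assumed alternative: scale a non-CUDA task by its own avg_ncpus.
double checkpoint_period_suggested(const AppVersion& avp, const Host& h) {
    double nprocs = avp.ncudas ? h.cuda_count : avp.avg_ncpus;
    return nprocs * h.disk_interval;
}
```

For a single-CPU task (avg_ncpus of 1.0) on a quad core with a 600-second preference, the first function returns 2400 seconds, matching the app_init.xml value above, while the second returns 600 seconds.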

Attachments (1)

app_start_cpp.patch (637 bytes) - added by Thyme Lawn 15 years ago.
Patch

Change History (3)

Changed 15 years ago by Thyme Lawn

Attachment: app_start_cpp.patch added

Patch

comment:1 Changed 15 years ago by romw

Resolution: fixed
Status: new → closed

Now fixed in 6.10.

comment:2 Changed 15 years ago by Nicolas

Fixed in r19293, backported to 6.10 in r19321.
