Opened 14 years ago
#1048 new Defect
Race condition when suspending tasks
Reported by: | Martin Suchan | Owned by: | davea |
---|---|---|---|
Priority: | Minor | Milestone: | Undetermined |
Component: | Client - Scheduler Policy | Version: | 6.12.16 |
Keywords: | Cc: |
Description
I just noticed this issue when suspending tasks manually in BOINC Manager. The situation:
Win7 x86, BM 6.12.15, only the WCG project, Core2Duo (2 cores). I had about 10 downloaded tasks: one was completed and reported, two were running, and the rest had not started yet but were allowed to start once another task finished.
I selected all not-started tasks PLUS one running task and clicked the Suspend button in the left command bar.
I expected that all selected tasks would be marked as Suspended at once and that the running one would stop as well.
What actually happened? One not-yet-started task was started, ran for about 1 second, and was then suspended. I guess the "change status to suspended" operation is not done in a transactional way. My guess at the sequence: some function got the list of tasks to suspend and suspended them one at a time. First it suspended the one running task. At that moment some other thread noticed a free slot, found a ready task, and started it (a typical race condition). In the meantime the first thread finished suspending the other tasks, including the one just started by the other thread.
This should be fixed, in my opinion. It could lead to bigger problems on systems with 8+ cores and a lot of projects.
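The sequence guessed in the description can be sketched with a small toy model. This is not BOINC code: `Task`, `ToyClient`, `schedule`, `suspend_one_by_one`, `suspend_batch` and the task names are all hypothetical, and the model assumes the worst case, where a scheduler pass runs after every single suspend request. It only illustrates why applying the suspend requests one by one can let a selected task start briefly, while applying the whole selection before the next scheduler pass cannot.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    state: str = "ready"  # "ready", "running", or "suspended"


@dataclass
class ToyClient:
    """Hypothetical stand-in for the client; not the real BOINC scheduler."""
    tasks: list
    slots: int = 2
    log: list = field(default_factory=list)

    def schedule(self):
        # One scheduler pass: fill free slots with ready tasks.
        running = sum(t.state == "running" for t in self.tasks)
        for t in self.tasks:
            if running >= self.slots:
                break
            if t.state == "ready":
                t.state = "running"
                running += 1
                self.log.append(f"Starting {t.name}")

    def _suspend(self, task):
        task.state = "suspended"
        self.log.append(f"task {task.name} suspended by user")

    def suspend_one_by_one(self, selected):
        # Assumed worst case: the scheduler runs between every two requests.
        for t in selected:
            self._suspend(t)
            self.schedule()

    def suspend_batch(self, selected):
        # Apply the whole selection first, then let the scheduler run once.
        for t in selected:
            self._suspend(t)
        self.schedule()


def make_client():
    # Two running tasks and three ready ones on a 2-slot (2-core) host.
    tasks = [Task("task_a", "running"), Task("task_b", "running"),
             Task("task_c"), Task("task_d"), Task("task_e")]
    # Selection: one running task plus all not-started tasks.
    selected = [tasks[0], tasks[2], tasks[3], tasks[4]]
    return ToyClient(tasks), selected


def count_starts(client):
    return sum(line.startswith("Starting") for line in client.log)


if __name__ == "__main__":
    for method in ("suspend_one_by_one", "suspend_batch"):
        client, selected = make_client()
        getattr(client, method)(selected)
        print(method, "->", count_starts(client), "unwanted start(s)")
        for line in client.log:
            print("   ", line)
```

In the real client the equivalent of `suspend_batch` might be a lock held across the whole request, or a scheduler pass deferred until all suspend requests from one Manager action have been applied. Which of these fits the actual code is not known from this report.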
Event log:
task faah19421_ZINC17130909_xmdEq_1TW7_02_0 is the running task
task HFCC_L4_01202033_L4_0001_0 is in the group selected for suspending, but it is started for 1 second
9.3.2011 9:50:46 | | Suspending computation - user request
9.3.2011 9:50:50 | | Resuming computation
9.3.2011 9:50:54 | World Community Grid | task faah19421_ZINC17130909_xmdEq_1TW7_02_0 suspended by user
9.3.2011 9:50:55 | World Community Grid | task oe781_00061_9 suspended by user
9.3.2011 9:50:55 | World Community Grid | task X0000065610008200603171636_1 suspended by user
9.3.2011 9:50:55 | World Community Grid | task X0000065621388200603241639_0 suspended by user
9.3.2011 9:50:55 | World Community Grid | Starting HFCC_L4_01202033_L4_0001_0
9.3.2011 9:50:55 | World Community Grid | Starting task HFCC_L4_01202033_L4_0001_0 using hfcc version 640
9.3.2011 9:50:56 | World Community Grid | task HFCC_L4_01202033_L4_0001_0 suspended by user
9.3.2011 9:50:56 | World Community Grid | task X0000065671034200603171856_1 suspended by user