Opened 17 years ago
Closed 15 years ago
#276 closed Defect (fixed)
Server reissues task more than "total" specified
Reported by: | KSMarksPsych | Owned by: | davea |
---|---|---|---|
Priority: | Major | Milestone: | Undetermined |
Component: | Server - Transitioner | Version: | |
Keywords: | Cc: | Pepo |
Description
From this thread at boinc_dev.
Rosetta is seeing cases where 3 results get sent out for the same WUID, even though the project is set up to have a total of at most 2. This seems to be possible when the first result is returned with errors and the second misses the deadline. In that case, a third result is issued for the WUID and this exceeds the total allowed.
Rosetta thread referenced above.
Change History (6)
comment:1 Changed 17 years ago by
comment:2 Changed 17 years ago by
The configuration field in question is called:
max_total_results
I don't believe any other combination of returned results has demonstrated the problem; it always seems to be an error, then a missed deadline, in that order. But that may be coincidence. The net result is that the second user returns the task late and receives credit anyway (due to tolerance on the server). In the meantime the third result is sent out (often within 15 minutes of the deadline expiring), and when it is returned it receives no credit because the maximum number of valid results has already been received.
comment:3 Changed 17 years ago by
Oh man, this bug is long-lived; it's over 3 years since it was first reported...
There's probably more than one way to fix it, but my guess is that at least one way would be to add a check in transitioner.C, immediately after line 348:
    if (n > 0) {
If n > 0, the transitioner will generate n more results.
Add a new check, something like:
    if ((n + (int)items.size()) > wu_item.max_total_results) {
        // Guessing (int)items.size() is the number of results generated so far.
        // No point making more results if that puts you over max_total_results.
        log_messages.printf(
            SCHED_MSG_LOG::NORMAL,
            "[WU#%d %s] Generating more results for WU will give too many total results (%d)\n",
            wu_item.id, wu_item.name, (int)items.size()
        );
        wu_item.error_mask |= WU_ERROR_TOO_MANY_TOTAL_RESULTS;
    }
Fix any other code so you don't generate more results... Trigger Assimilator?
Well, a better method would be to take into account cases where min_quorum != target_nresults, so that while there are still "active" results that can lead to validation, the WU isn't aborted too early.
Also, no idea how many bugs in my code-snippet...
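To make the "don't go over max_total_results, but don't abort too early either" idea concrete, here is a minimal standalone sketch. It is not the transitioner's actual code: WuState, Decision, and plan_new_results are hypothetical names invented for illustration, and the counters are assumed to be already computed from the WU's results.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified view of one workunit (not BOINC's real structs).
struct WuState {
    int target_nresults;    // results the project wants live at once
    int max_total_results;  // hard cap on results ever created for this WU
    int n_created;          // results created so far, in any state
    int n_in_progress;      // sent out, not yet returned or timed out
    int n_success;          // returned successfully
};

// Hypothetical outcome of one planning pass.
struct Decision {
    int n_to_create;      // how many new results to generate now
    bool too_many_total;  // true if the WU should be flagged as an error
};

Decision plan_new_results(const WuState& wu) {
    assert(wu.max_total_results >= 0 && wu.n_created >= 0);

    // Results that can still contribute to validation.
    int live = wu.n_in_progress + wu.n_success;
    // How many more we would like to have.
    int wanted = std::max(0, wu.target_nresults - live);
    // How many more the cap allows; checked BEFORE creating anything.
    int room = std::max(0, wu.max_total_results - wu.n_created);

    Decision d;
    d.n_to_create = std::min(wanted, room);
    // Only give up when more results are needed, none may be created,
    // and nothing outstanding could still come back.
    d.too_many_total = (wanted > 0 && room == 0 && wu.n_in_progress == 0);
    return d;
}
```

With target_nresults = 1 and max_total_results = 2, the reported sequence (first result errors, second misses its deadline) ends with n_created = 2 and nothing in progress, so this sketch creates no third result and flags the WU instead.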
comment:4 Changed 16 years ago by
Cc: | Pepo added |
---|
comment:5 Changed 16 years ago by
Owner: | changed from Bruce Allen to davea |
---|
I have noticed this off and on for some time. I keep thinking it is fixed, but I guess we are not consistently using enough tasks to see it. It appears to be a symptom of one of two common logic errors: not starting the count at the same number (i.e. 0 or 1), or checking a condition at the wrong time (i.e. sending the task and only then checking whether we are over the max).
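The second kind of error (send first, check afterwards) can be shown with a toy example. This is only an illustration of the pattern, not code from the server; send_then_check and check_then_send are hypothetical names, and "sending" is reduced to incrementing a counter.

```cpp
#include <cassert>

// Buggy pattern: the task is sent, and only then is the limit checked.
// Returns how many tasks end up being sent for a given max_total.
int send_then_check(int max_total) {
    assert(max_total >= 0);
    int sent = 0;
    while (true) {
        ++sent;                       // task goes out first
        if (sent > max_total) break;  // limit noticed one task too late
    }
    return sent;
}

// Intended pattern: check the limit before each send.
int check_then_send(int max_total) {
    assert(max_total >= 0);
    int sent = 0;
    while (sent < max_total) {  // only send while still under the cap
        ++sent;
    }
    return sent;
}
```

With a maximum of 2, the first version sends 3 tasks and the second sends 2, which matches the "one extra result" symptom described in this ticket.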