Opened 17 years ago
Last modified 15 years ago
#512 closed Defect
CPU Time not updated under linux — at Version 3
Reported by: | tstrunk | Owned by: | Bruce Allen |
---|---|---|---|
Priority: | Minor | Milestone: | Undetermined |
Component: | BOINC - API | Version: | |
Keywords: | cpu_time | Cc: | |
Description
Since the beginning of November (when I first noticed it), the CPU time update in the client no longer works under Linux with our app. When we downgrade the BOINC libs of our client app to, for example, rev. 13231, it works again.
I added this line to boinc_checkpoint_completed (boinc_api.C:1010), just before the call to update_app_progress(last_checkpoint_cpu_time, last_checkpoint_cpu_time):
fprintf(stderr,"in Checkpoint complete: cur_cpu = %g, last_wu = %g, last checkp = %g\n",cur_cpu,last_wu_cpu_time,last_checkpoint_cpu_time);
and this one to timer_handler (line 852), also just before the call to update_app_progress(last_wu_cpu_time, last_checkpoint_cpu_time):
fprintf(stderr,"cur cpu = %g, initial_wu_cpu_time=%g , last_wu = %g , last_checkpoint = %g\n",cur_cpu,initial_wu_cpu_time,last_wu_cpu_time,last_checkpoint_cpu_time);
From this I got the output:
cur cpu = 0, initial_wu_cpu_time=0 , last_wu = 0 , last_checkpoint = 0
in Checkpoint complete: cur_cpu = 58.4437, last_wu = 58.4437, last checkp = 58.4437
cur cpu = 0, initial_wu_cpu_time=0 , last_wu = 0 , last_checkpoint = 58.4437
So to me this looks like boinc_worker_thread_cpu_time() sometimes works and sometimes doesn't: every reading taken in timer_handler is 0, while the reading taken in boinc_checkpoint_completed is correct. I think this is because boinc_checkpoint_completed is called from the real worker thread, while timer_handler is called from "Somewhere Else (TM)".
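To illustrate the suspicion above, here is a minimal sketch (not BOINC code) of how a per-thread CPU reading depends on which thread takes it. It uses the POSIX clock CLOCK_THREAD_CPUTIME_ID, which is an assumption on my side; I am not claiming this is what boinc_worker_thread_cpu_time() does internally. The names own_thread_cpu_seconds, busy_worker and compare_readings are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* CPU seconds consumed by the calling thread only, or -1.0 on error. */
static double own_thread_cpu_seconds(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Written by the worker before it exits, read only after pthread_join(). */
static double worker_reading = -1.0;

/* Hypothetical worker: spins until it has used about 0.2 s of CPU time. */
static void *busy_worker(void *arg) {
    volatile unsigned long sink = 0;
    double t;
    (void)arg;
    do {
        for (unsigned long i = 0; i < 100000UL; i++)
            sink += i;
        t = own_thread_cpu_seconds();
    } while (t >= 0.0 && t < 0.2);   /* stop on error or once enough CPU is used */
    worker_reading = t;
    return NULL;
}

/* Runs the worker to completion, then takes the same kind of reading from
 * the calling thread. Returns 0 on success, -1 if the thread calls fail. */
static int compare_readings(double *from_worker, double *from_caller) {
    pthread_t tid;
    if (pthread_create(&tid, NULL, busy_worker, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    *from_worker = worker_reading;
    *from_caller = own_thread_cpu_seconds();
    return 0;
}
```

Calling compare_readings() from main and printing both values should show the worker's own reading at 0.2 s or more, while the calling thread, which only waited, reports far less. That is the same pattern as the 58.4437 versus 0 in the output above.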
A slight guess at what could have produced this behaviour is this changeset:
[13880/trunk/boinc/api/boinc_api.C]
Change History (3)
comment:1 Changed 17 years ago by
comment:2 Changed 17 years ago by
I haven't built yet, but I found something interesting here:
http://nptl.bullopensource.org/ml_nptl/nptl-200410/msg00005.html
"With NPTL, the timing is thread-wise only. That is, the timing is given only for the thread that calls getrusage(), resp. times(). This deviates from SUSv3."
Also here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0406.1/0929.html
"getrusage dosn't work (and didn't do so in pre-NPTL-times) as the time spent in threads is not taken into account."
I think with this I have found the culprit: NPTL. Under NPTL, getrusage only returns the CPU time used by the calling thread. I did a test build on my laptop with glibc 2.7 and it still didn't update the CPU time correctly. So basically this is not a BOINC bug and this bug report can be closed, as building with LinuxThreads might fix it. A workaround for NPTL would be to go back to the behaviour before changeset 13855.
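For what it's worth, here is a sketch of one way a non-worker thread could still read the worker's CPU time when the per-thread accounting only covers the caller. It relies on pthread_getcpuclockid() and clock_gettime(), both standard POSIX calls. This is not what BOINC does before or after changeset 13855, only an illustration. cpu_seconds_of, spin_worker and watch_worker are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* CPU seconds consumed so far by `thread`, or -1.0 on error.
 * Unlike a "calling thread only" reading, this may be called from any thread. */
static double cpu_seconds_of(pthread_t thread) {
    clockid_t cid;
    struct timespec ts;
    if (pthread_getcpuclockid(thread, &cid) != 0)
        return -1.0;
    if (clock_gettime(cid, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static atomic_int stop_worker;

/* Hypothetical worker: burns CPU until told to stop. */
static void *spin_worker(void *arg) {
    volatile unsigned long sink = 0;
    (void)arg;
    while (!atomic_load(&stop_worker))
        sink++;
    return NULL;
}

/* Hypothetical "timer" side: polls the worker's CPU clock from the calling
 * thread until the worker has used at least min_cpu seconds (or about 5 s of
 * wall time have passed). Returns the last reading, or -1.0 on error. */
static double watch_worker(double min_cpu) {
    pthread_t tid;
    double seen = -1.0;
    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, spin_worker, NULL) != 0)
        return -1.0;
    for (int i = 0; i < 5000; i++) {
        struct timespec nap = {0, 1000000L};   /* 1 ms */
        nanosleep(&nap, NULL);
        seen = cpu_seconds_of(tid);            /* must happen before join */
        if (seen < 0.0 || seen >= min_cpu)
            break;
    }
    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    return seen;
}
```

The point of the design is that the reading is tied to the worker's thread handle instead of to whoever happens to call the function, so it does not matter from which thread the timer fires.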
So, a bit more information: I build on an i386 machine (Debian 4.0).
Two BOINC worker processes each show up as six different processes, looking like this in ps axjf:
getconf GNU_LIBPTHREAD_VERSION gives: NPTL 2.3.6
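The same check can be done from inside a program, which might be handy for logging which threading library a build actually runs against. A minimal sketch, assuming glibc, where confstr() accepts _CS_GNU_LIBPTHREAD_VERSION; pthread_impl_version is a made-up helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Copies the threading library name and version (for example "NPTL 2.3.6")
 * into buf. Returns the number of bytes needed including the terminating
 * NUL, or 0 if the value is not available on this C library. */
static size_t pthread_impl_version(char *buf, size_t len) {
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    return confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
#else
    (void)buf;
    (void)len;
    return 0;   /* not glibc, or a libc without this confstr name */
#endif
}
```

With a 64-byte buffer, a non-zero return value means buf holds the same string that getconf prints.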
And now that I think of it, changeset 13855 seems to fit my problem better:
http://boinc.berkeley.edu/trac/changeset/13855/trunk/boinc/api/boinc_api.C
I will now try building with revisions 13854 and 13855 to see whether that changeset causes my problem.