Opened 16 years ago

Last modified 15 years ago

#512 closed Defect

CPU Time not updated under linux — at Version 3

Reported by: tstrunk
Owned by: Bruce Allen
Priority: Minor
Milestone: Undetermined
Component: BOINC - API
Version:
Keywords: cpu_time
Cc:

Description (last modified by Nicolas)

Since the beginning of November (when I first noticed it), the CPU time update in the client no longer works under Linux with our app. When we downgrade the BOINC libraries of our client app to, for example, rev. 13231, it works again.

I added this line to boinc_checkpoint_completed (boinc_api.C:1010), just before the call to update_app_progress(last_checkpoint_cpu_time, last_checkpoint_cpu_time):

fprintf(stderr,"in Checkpoint complete: cur_cpu = %g, last_wu = %g, last checkp = %g\n",cur_cpu,last_wu_cpu_time,last_checkpoint_cpu_time);

and this one to timer_handler (line 852), also just before the call to update_app_progress(last_wu_cpu_time, last_checkpoint_cpu_time):

fprintf(stderr,"cur cpu = %g, initial_wu_cpu_time=%g , last_wu = %g , last_checkpoint = %g\n",cur_cpu,initial_wu_cpu_time,last_wu_cpu_time,last_checkpoint_cpu_time);

From this I got the output:

cur cpu = 0, initial_wu_cpu_time=0 , last_wu = 0 , last_checkpoint = 0
in Checkpoint complete: cur_cpu = 58.4437, last_wu = 58.4437, last checkp = 58.4437
cur cpu = 0, initial_wu_cpu_time=0 , last_wu = 0 , last_checkpoint = 58.4437

To me this suggests that boinc_worker_thread_cpu_time() returns a sensible value from some call sites and zero from others. I think this is because boinc_checkpoint_completed is called from the real worker thread, while timer_handler is called from "Somewhere Else (TM)" (presumably a different thread).
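The suspected effect can be reproduced outside BOINC with a small sketch. It assumes a Linux system where CPU time is accounted per calling thread, which is imitated here with getrusage(RUSAGE_THREAD), a Linux-specific extension that needs _GNU_SOURCE. The names thread_cpu_seconds, burn_cpu and measure are made up for this illustration and are not part of the BOINC API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/resource.h>
#include <sys/time.h>

/* CPU time (user + system) charged to the CALLING thread, in seconds.
   Returns -1.0 on error. RUSAGE_THREAD is Linux-specific (assumption:
   kernel 2.6.26 or later). */
static double thread_cpu_seconds(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_THREAD, &ru) != 0) return -1.0;
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Hypothetical worker: burns CPU until about 0.2 s are charged to it,
   then stores its own reading through the pointer passed in arg. */
static void *burn_cpu(void *arg) {
    double *out = arg;
    volatile unsigned long sink = 0;
    while (thread_cpu_seconds() < 0.2) {
        for (int i = 0; i < 100000; i++) sink += i;
    }
    *out = thread_cpu_seconds();
    return NULL;
}

/* Runs one worker to completion, then takes the same measurement from
   the calling thread, the way a timer handler running elsewhere would. */
static void measure(double *worker_cpu, double *caller_cpu) {
    pthread_t t;
    int rc = pthread_create(&t, NULL, burn_cpu, worker_cpu);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    *caller_cpu = thread_cpu_seconds();
}
```

Calling measure() from main and printing both values should show the worker's reading at or above 0.2 s, while the caller's reading stays close to zero even though the process as a whole did the work. That mirrors the "cur cpu = 0" lines in the output above.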

My tentative guess is that this changeset introduced the behaviour:
[13880/trunk/boinc/api/boinc_api.C]

Change History (3)

comment:1 Changed 16 years ago by tstrunk

Some more information: I build on an i386 machine (Debian 4.0).
Two BOINC worker processes show up as six processes in total, each worker looking like this in ps axjf:

3194 3195 3193 2291 pts/3 3193 RN+ 1005 0:37 | \_ poem_0.8_i686-pc-linux-gnu
3195 3199 3193 2291 pts/3 3193 SN+ 1005 0:00 | | \_ poem_0.8_i686-pc-linux-gnu
3199 3200 3193 2291 pts/3 3193 SN+ 1005 0:00 | | \_ poem_0.8_i686-pc-linux-gnu

getconf GNU_LIBPTHREAD_VERSION gives: NPTL 2.3.6

Now that I think of it, changeset 13855 seems to fit my problem better:
http://boinc.berkeley.edu/trac/changeset/13855/trunk/boinc/api/boinc_api.C

I will now try to build with revisions 13854 and 13855 to see whether 13855 causes my problem.

comment:2 Changed 16 years ago by tstrunk

I haven't built yet, but I found something interesting here:

http://nptl.bullopensource.org/ml_nptl/nptl-200410/msg00005.html

"With NPTL, the timing is thread-wise only. That is, the timing is given only for the thread that calls getrusage(), resp. times(). This deviates from SUSv3."

Also here:

http://www.ussg.iu.edu/hypermail/linux/kernel/0406.1/0929.html

"getrusage dosn't work (and didn't do so in pre-NPTL-times) as the time spent in threads is not taken into account."

I think this identifies the culprit: under NPTL, getrusage only returns the CPU time used by the calling thread. I did a test build on my laptop with glibc 2.7 and the CPU time still was not updated correctly. So this is not really a BOINC bug and this report can be closed, since building with LinuxThreads might fix it. A workaround for NPTL would be to restore the behaviour from before changeset 13855.
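Another possible approach, not taken from the BOINC sources and only a sketch, would be to let the timer side read the worker's CPU clock directly through pthread_getcpuclockid(), which returns a clock ID that can be passed to clock_gettime() from any thread. The names cpu_seconds_of, spin and poll_worker_until are hypothetical, and the sketch assumes a glibc/NPTL system that supports per-thread CPU-time clocks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* CPU time consumed so far by thread t, in seconds, or -1.0 on error.
   Unlike a per-thread getrusage, this may be called from any thread. */
static double cpu_seconds_of(pthread_t t) {
    clockid_t cid;
    struct timespec ts;
    if (pthread_getcpuclockid(t, &cid) != 0) return -1.0;
    if (clock_gettime(cid, &ts) != 0) return -1.0;
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static atomic_int stop_requested;

/* Hypothetical worker: spins until asked to stop. */
static void *spin(void *arg) {
    (void)arg;
    volatile unsigned long sink = 0;
    while (!atomic_load(&stop_requested)) sink++;
    return NULL;
}

/* Plays the role of the timer: polls the worker's CPU clock from
   outside the worker until it reaches target_seconds (or an error
   occurs), then stops the worker and returns the last reading. */
static double poll_worker_until(double target_seconds) {
    pthread_t worker;
    struct timespec pause = {0, 10 * 1000 * 1000}; /* 10 ms */
    double seen = 0.0;

    atomic_store(&stop_requested, 0);
    int rc = pthread_create(&worker, NULL, spin, NULL);
    assert(rc == 0);
    (void)rc;
    while (seen >= 0.0 && seen < target_seconds) {
        nanosleep(&pause, NULL);
        seen = cpu_seconds_of(worker);
    }
    atomic_store(&stop_requested, 1);
    pthread_join(worker, NULL);
    return seen;
}
```

The worker is kept alive while it is being polled because querying the clock of a thread that has already terminated is not something this sketch relies on. Whether such an approach would fit into boinc_api.C is left open here.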

comment:3 Changed 16 years ago by Nicolas

Description: modified (diff)

Fix some formatting.
