Opened 16 years ago
Last modified 15 years ago
#868 reopened Defect
Consistent crash using astropulse_v5 version 503 [boinc v6.6.20 x86_64-pc-linux-gnu]
Reported by: | tomchiverton | Owned by: | davea |
---|---|---|---|
Priority: | Minor | Milestone: | Undetermined |
Component: | Client - Daemon | Version: | 6.4.7 |
Keywords: | Cc: |
Description (last modified by )
The same issue occurs with the current stable release too. The timestamp at the end is from my console, so you can see some work is done before the crash.
08-Apr-2009 07:45:05 [---] Preferences limit memory usage when active to 1006.59MB
08-Apr-2009 07:45:05 [---] Preferences limit memory usage when idle to 1811.87MB
08-Apr-2009 07:45:05 [---] Preferences limit disk usage to 47.96GB
08-Apr-2009 07:45:05 [---] Preferences limit # CPUs to 1
08-Apr-2009 07:45:05 [SETI@home] Restarting task ap_09fe09ac_B0_P0_00197_20090318_16242.wu_2 using astropulse_v5 version 503
SIGSEGV: segmentation violation
Stack trace (14 frames):
./boinc(boinc_catch_signal+0x43)[0x452f63]
/lib64/libpthread.so.0[0x2b8f5fc63fb0]
/lib64/libc.so.6[0x2b8f6010547a]
/lib64/libc.so.6[0x2b8f6010bb6a]
/lib64/libc.so.6(__printf_fp+0xb51)[0x2b8f6010c9c1]
/lib64/libc.so.6(_IO_vfprintf+0x4f8)[0x2b8f60107588]
/lib64/libc.so.6(vsprintf+0x79)[0x2b8f60126a39]
/lib64/libc.so.6(sprintf+0x88)[0x2b8f60110638]
./boinc[0x40e3ac]
./boinc[0x40b69d]
./boinc[0x416b48]
./boinc[0x43e8f6]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b8f600e2b54]
./boinc(__gxx_personality_v0+0x1c9)[0x405bb9]
Exiting...
Wed Apr 8 08:07:39 BST 2009
Change History (13)
comment:1 Changed 16 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 16 years ago by
Nothing else on the box is upset. I've now built the client from source, including the required libcurl update, and copied that binary over the top of the current released one. *So far* it's stayed up and running.
comment:3 Changed 16 years ago by
no such luck even with the latest build:
Not running, starting:
boinc: no process killed
08-Apr-2009 15:34:07 [---] Starting BOINC client version 6.7.4 for x86_64-pc-linux-gnu
08-Apr-2009 15:34:07 [---] This a development version of BOINC and may not function properly
08-Apr-2009 15:34:07 [---] log flags: task, file_xfer, sched_ops
08-Apr-2009 15:34:07 [---] Libraries: libcurl/7.19.4 OpenSSL/0.9.8e zlib/1.2.3 libidn/1.0
08-Apr-2009 15:34:07 [---] Data directory: /home/chivertont/bin/boinc
08-Apr-2009 15:34:07 [---] Processor: 2 GenuineIntel Intel(R) Pentium(R) D CPU 3.00GHz [Family 15 Model 4 Stepping 7]
08-Apr-2009 15:34:07 [---] Processor: 1.00 MB cache
08-Apr-2009 15:34:07 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
08-Apr-2009 15:34:07 [---] OS: Linux: 2.6.22.19-0.2-default
08-Apr-2009 15:34:07 [---] Memory: 1.97 GB physical, 1.00 GB virtual
08-Apr-2009 15:34:07 [---] Disk: 148.00 GB total, 48.71 GB free
08-Apr-2009 15:34:07 [---] Local time is UTC +1 hours
08-Apr-2009 15:34:07 [---] No CUDA devices found
08-Apr-2009 15:34:07 [---] No coprocessors
08-Apr-2009 15:34:07 [---] Not using a proxy
08-Apr-2009 15:34:08 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4454414; location: work; project prefs: default
08-Apr-2009 15:34:08 [SETI@home] General prefs: from SETI@home (last modified 02-Oct-2008 09:23:33)
08-Apr-2009 15:34:08 [SETI@home] Computer location: work
08-Apr-2009 15:34:08 [SETI@home] General prefs: no separate prefs for work; using your defaults
08-Apr-2009 15:34:08 [---] Preferences limit memory usage when active to 1006.59MB
08-Apr-2009 15:34:08 [---] Preferences limit memory usage when idle to 1811.87MB
08-Apr-2009 15:34:08 [---] Preferences limit disk usage to 47.97GB
08-Apr-2009 15:34:08 [---] Preferences limit # CPUs to 1
08-Apr-2009 15:34:08 [SETI@home] Restarting task ap_09fe09ac_B0_P0_00197_20090318_16242.wu_2 using astropulse_v5 version 503
SIGSEGV: segmentation violation
Stack trace (14 frames):
./boinc(boinc_catch_signal+0x43)[0x451b23]
/lib64/libpthread.so.0[0x2b6432eb8fb0]
/lib64/libc.so.6[0x2b643387047a]
/lib64/libc.so.6[0x2b6433876b6a]
/lib64/libc.so.6(__printf_fp+0xb51)[0x2b64338779c1]
/lib64/libc.so.6(_IO_vfprintf+0x4f8)[0x2b6433872588]
/lib64/libc.so.6(vsprintf+0x79)[0x2b6433891a39]
/lib64/libc.so.6(sprintf+0x88)[0x2b643387b638]
./boinc[0x40dea9]
./boinc[0x40b2ad]
./boinc[0x416358]
./boinc[0x43db86]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b643384db54]
./boinc(__gxx_personality_v0+0x1e9)[0x4058b9]
Exiting...
Wed Apr 8 16:15:49 BST 2009
comment:4 Changed 16 years ago by
Component: | Undetermined → Client - Daemon |
---|---|
Owner: | set to davea |
Priority: | Undetermined → Minor |
Your stack trace lacks debugging symbols, making it of little use.
Run gdb boinc, then at the (gdb) prompt type run. Wait for it to crash, then type bt and post the resulting backtrace here.
comment:6 Changed 16 years ago by
Here we go:
15-Apr-2009 20:06:37 [---] Processor: 1.00 MB cache
15-Apr-2009 20:06:37 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
15-Apr-2009 20:06:37 [---] OS: Linux: 2.6.22.19-0.2-default
15-Apr-2009 20:06:37 [---] Memory: 1.97 GB physical, 1.00 GB virtual
15-Apr-2009 20:06:37 [---] Disk: 148.00 GB total, 50.86 GB free
15-Apr-2009 20:06:37 [---] Local time is UTC +1 hours
15-Apr-2009 20:06:37 [---] No CUDA devices found
15-Apr-2009 20:06:37 [---] No coprocessors
15-Apr-2009 20:06:37 [---] Not using a proxy
15-Apr-2009 20:06:37 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4454414; location: work; project prefs: default
15-Apr-2009 20:06:37 [SETI@home] General prefs: from SETI@home (last modified 02-Oct-2008 09:23:33)
15-Apr-2009 20:06:37 [SETI@home] Computer location: work
15-Apr-2009 20:06:37 [SETI@home] General prefs: no separate prefs for work; using your defaults
15-Apr-2009 20:06:37 [---] Preferences limit memory usage when active to 1006.59MB
15-Apr-2009 20:06:37 [---] Preferences limit memory usage when idle to 1811.87MB
15-Apr-2009 20:06:37 [---] Preferences limit disk usage to 50.89GB
15-Apr-2009 20:06:37 [---] Preferences limit # CPUs to 1
15-Apr-2009 20:06:37 [SETI@home] Restarting task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 using astropulse_v5 version 503
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b911e2bc1b0 (LWP 13557)]
0x00002b911dfb647a in ?? () from /lib64/libc.so.6
(gdb)
(gdb) bt
#0 0x00002b911dfb647a in ?? () from /lib64/libc.so.6
#1 0x00002b911dfbcb6a in ?? () from /lib64/libc.so.6
#2 0x00002b911dfbd9c1 in __printf_fp () from /lib64/libc.so.6
#3 0x00002b911dfb8588 in vfprintf () from /lib64/libc.so.6
#4 0x00002b911dfd7a39 in vsprintf () from /lib64/libc.so.6
#5 0x00002b911dfc1638 in sprintf () from /lib64/libc.so.6
#6 0x000000000040dea9 in ACTIVE_TASK_SET::send_heartbeats (this=0x680da0) at app_control.cpp:420
#7 0x000000000040b2ad in ACTIVE_TASK_SET::poll (this=0x680da0) at app.cpp:383
#8 0x0000000000416358 in CLIENT_STATE::poll_slow_events (this=0x680ce0) at client_state.cpp:630
#9 0x000000000043db86 in boinc_main_loop () at main.cpp:557
#10 0x00002b911df93b54 in __libc_start_main () from /lib64/libc.so.6
#11 0x00000000004058b9 in _start ()
comment:7 Changed 16 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
I fixed this in [17831]. The problem was that there's a printf using %f into a char[256], and the number (a working-set size) must have been something huge like 1e304.
I didn't figure out why the WSS was huge. This must be something specific to Linux64. I'll look into this, but if anyone wants to run with <mem_usage_debug> with Linux64, let me know if you see anything suspicious.
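To illustrate the overflow hypothesis above, here is a minimal sketch of how many characters a %f conversion needs for a value near 1e304 compared with an ordinary working-set size. It is written in Python rather than the client's C++, relying on Python's % operator following the C printf convention for %f. BUF_SIZE, the helper names, and the 50 MB sample value are assumptions made for illustration; this is not the code from app_control.cpp or the change in [17831].

```python
# Illustrative only: names and the sample value below are hypothetical.
BUF_SIZE = 256  # stands in for the char[256] buffer described in comment:7


def formatted_length(value: float) -> int:
    """Number of characters a printf-style "%f" conversion produces."""
    return len("%f" % value)


def fits_in_buffer(value: float, buf_size: int = BUF_SIZE) -> bool:
    """True if the text plus a terminating NUL byte fits in buf_size bytes."""
    return formatted_length(value) + 1 <= buf_size


def bounded_format(value: float, buf_size: int = BUF_SIZE) -> str:
    """Mimic snprintf-style truncation: keep at most buf_size - 1 characters."""
    return ("%f" % value)[: buf_size - 1]


if __name__ == "__main__":
    for v in (50e6, 1e304):  # an ordinary size in bytes vs. a huge bogus value
        print(v, formatted_length(v), fits_in_buffer(v))
```

With an unbounded sprintf the long case would write past the end of the buffer, whereas a bounded call such as snprintf truncates the output the way bounded_format does here.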
comment:8 Changed 16 years ago by
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Doesn't seem to have helped:
[New Thread 0x2b68ea1f71b0 (LWP 10054)]
16-Apr-2009 19:56:26 [---] Starting BOINC client version 6.7.4 for x86_64-pc-linux-gnu
16-Apr-2009 19:56:26 [---] This a development version of BOINC and may not function properly
16-Apr-2009 19:56:26 [---] log flags: task, file_xfer, sched_ops
16-Apr-2009 19:56:26 [---] Libraries: libcurl/7.19.4 OpenSSL/0.9.8e zlib/1.2.3 libidn/1.0
16-Apr-2009 19:56:26 [---] Data directory: /home/chivertont/bin/boinc
16-Apr-2009 19:56:26 [---] Processor: 2 GenuineIntel Intel(R) Pentium(R) D CPU 3.00GHz [Family 15 Model 4 Stepping 7]
16-Apr-2009 19:56:26 [---] Processor: 1.00 MB cache
16-Apr-2009 19:56:26 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
16-Apr-2009 19:56:26 [---] OS: Linux: 2.6.22.19-0.2-default
16-Apr-2009 19:56:26 [---] Memory: 1.97 GB physical, 1.00 GB virtual
16-Apr-2009 19:56:26 [---] Disk: 148.00 GB total, 50.86 GB free
16-Apr-2009 19:56:26 [---] Local time is UTC +1 hours
16-Apr-2009 19:56:26 [---] No CUDA devices found
16-Apr-2009 19:56:26 [---] No coprocessors
16-Apr-2009 19:56:26 [---] Not using a proxy
16-Apr-2009 19:56:26 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4454414; location: work; project prefs: default
16-Apr-2009 19:56:26 [SETI@home] General prefs: from SETI@home (last modified 02-Oct-2008 09:23:33)
16-Apr-2009 19:56:26 [SETI@home] Computer location: work
16-Apr-2009 19:56:26 [SETI@home] General prefs: no separate prefs for work; using your defaults
16-Apr-2009 19:56:26 [---] Preferences limit memory usage when active to 1006.59MB
16-Apr-2009 19:56:26 [---] Preferences limit memory usage when idle to 1811.87MB
16-Apr-2009 19:56:26 [---] Preferences limit disk usage to 50.89GB
16-Apr-2009 19:56:26 [---] Preferences limit # CPUs to 1
16-Apr-2009 19:56:26 [SETI@home] Restarting task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 using astropulse_v5 version 503
16-Apr-2009 19:57:01 [SETI@home] Task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 exited with zero status but no 'finished' file
16-Apr-2009 19:57:01 [SETI@home] If this happens repeatedly you may need to reset the project.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b68ea1f71b0 (LWP 10054)]
0x00002b68e9ef147a in ?? () from /lib64/libc.so.6
(gdb)
(gdb) bt
#0 0x00002b68e9ef147a in ?? () from /lib64/libc.so.6
#1 0x00002b68e9ef7b6a in ?? () from /lib64/libc.so.6
#2 0x00002b68e9ef8a6a in __printf_fp () from /lib64/libc.so.6
#3 0x00002b68e9ef3588 in vfprintf () from /lib64/libc.so.6
#4 0x00002b68e9f12a39 in vsprintf () from /lib64/libc.so.6
#5 0x00002b68e9efc638 in sprintf () from /lib64/libc.so.6
#6 0x000000000040df19 in ACTIVE_TASK_SET::send_heartbeats (this=0x680da0) at app_control.cpp:420
#7 0x000000000040b2bd in ACTIVE_TASK_SET::poll (this=0x680da0) at app.cpp:383
#8 0x0000000000416578 in CLIENT_STATE::poll_slow_events (this=0x680ce0) at client_state.cpp:630
#9 0x000000000043e0d6 in boinc_main_loop () at main.cpp:557
#10 0x00002b68e9eceb54 in __libc_start_main () from /lib64/libc.so.6
#11 0x00000000004058b9 in _start ()
(gdb)
comment:9 Changed 16 years ago by
Hmm. When it crashes, can you please type
frame 6 p atp->procinfo.working_set_size p ar
and post the results? thanks.
comment:10 Changed 16 years ago by
That line got reformatted... David wrote:
please type
frame 6
p atp->procinfo.working_set_size
p ar
comment:11 Changed 16 years ago by
Hmm, the crash appears to be later:
17-Apr-2009 09:57:56 [SETI@home] Restarting task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 using astropulse_v5 version 503
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2ba55a00b9c0 (LWP 30567)]
0x00002ba559d0547a in ?? () from /lib64/libc.so.6
(gdb) bt
#0 0x00002ba559d0547a in ?? () from /lib64/libc.so.6
#1 0x00002ba559d05112 in ?? () from /lib64/libc.so.6
#2 0x00002ba559cfd63d in ?? () from /lib64/libc.so.6
#3 0x000000000040e288 in ACTIVE_TASK::get_app_status_msg (this=0x6ecfd0) at /usr/include/stdlib.h:330
#4 0x000000000040e5e8 in ACTIVE_TASK_SET::get_msgs (this=0x680da0) at app_control.cpp:1021
#5 0x000000000040b2f3 in ACTIVE_TASK_SET::poll (this=0x680da0) at app.cpp:389
#6 0x00000000004165a8 in CLIENT_STATE::poll_slow_events (this=0x680ce0) at client_state.cpp:630
#7 0x000000000043e106 in boinc_main_loop () at main.cpp:557
#8 0x00002ba559ce2b54 in __libc_start_main () from /lib64/libc.so.6
#9 0x00000000004058b9 in _start ()
(gdb) frame 4
#4 0x000000000040e5e8 in ACTIVE_TASK_SET::get_msgs (this=0x680da0) at app_control.cpp:1021
1021 if (atp->get_app_status_msg()) {
(gdb) p atp->get_app_status_msg()
$1 = true
(gdb) print-object atp->get_app_status_msg()
Cannot access memory at address 0x0
(gdb)
(gdb)
comment:12 Changed 16 years ago by
OK, I got a dump from the original crash point:
17-Apr-2009 11:04:59 [SETI@home] Restarting task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 using astropulse_v5 version 503
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b643b7639c0 (LWP 32687)]
0x00002b643b45d47a in ?? () from /lib64/libc.so.6
(gdb)
(gdb) bt
#0 0x00002b643b45d47a in ?? () from /lib64/libc.so.6
#1 0x00002b643b463b6a in ?? () from /lib64/libc.so.6
#2 0x00002b643b464a6a in __printf_fp () from /lib64/libc.so.6
#3 0x00002b643b45f588 in vfprintf () from /lib64/libc.so.6
#4 0x00002b643b484e7a in vsnprintf () from /lib64/libc.so.6
#5 0x00002b643b4685a3 in snprintf () from /lib64/libc.so.6
#6 0x000000000040df1e in ACTIVE_TASK_SET::send_heartbeats (this=0x680da0) at app_control.cpp:420
#7 0x000000000040b2bd in ACTIVE_TASK_SET::poll (this=0x680da0) at app.cpp:383
#8 0x00000000004165a8 in CLIENT_STATE::poll_slow_events (this=0x680ce0) at client_state.cpp:630
#9 0x000000000043e106 in boinc_main_loop () at main.cpp:557
#10 0x00002b643b43ab54 in __libc_start_main () from /lib64/libc.so.6
#11 0x00000000004058b9 in _start ()
(gdb) frame 6
#6 0x000000000040df1e in ACTIVE_TASK_SET::send_heartbeats (this=0x680da0) at app_control.cpp:420
420 );
(gdb) p atp->procinfo.working_set_size
$1 = 48103424
(gdb) p ar
$2 = 1055490048
(gdb)
comment:13 Changed 15 years ago by
I can reduce the incidence of the bug by setting BOINC not to keep applications in memory when suspended.
Fixed formatting.
But I am not sure if this needs to be said here. Have you looked in the Seti Linux Q&A forum for help? It looks to me like you may have a memory or page file problem.