Opened 15 years ago

Last modified 15 years ago

#868 reopened Defect

Consistent crash using astropulse_v5 version 503 [boinc v6.6.20 x86_64-pc-linux-gnu]

Reported by: tomchiverton
Owned by: davea
Priority: Minor
Milestone: Undetermined
Component: Client - Daemon
Version: 6.4.7
Keywords:
Cc:

Description (last modified by Ageless)

The same issue occurs with the current stable release too. The timestamp at the end is from my console, so you can see that some work is done before the crash.

08-Apr-2009 07:45:05 [---] Preferences limit memory usage when active to 1006.59MB
08-Apr-2009 07:45:05 [---] Preferences limit memory usage when idle to 1811.87MB
08-Apr-2009 07:45:05 [---] Preferences limit disk usage to 47.96GB
08-Apr-2009 07:45:05 [---] Preferences limit # CPUs to 1
08-Apr-2009 07:45:05 [SETI@home] Restarting task ap_09fe09ac_B0_P0_00197_20090318_16242.wu_2 using astropulse_v5 version 503
SIGSEGV: segmentation violation
Stack trace (14 frames):
./boinc(boinc_catch_signal+0x43)[0x452f63]
/lib64/libpthread.so.0[0x2b8f5fc63fb0]
/lib64/libc.so.6[0x2b8f6010547a]
/lib64/libc.so.6[0x2b8f6010bb6a]
/lib64/libc.so.6(__printf_fp+0xb51)[0x2b8f6010c9c1]
/lib64/libc.so.6(_IO_vfprintf+0x4f8)[0x2b8f60107588]
/lib64/libc.so.6(vsprintf+0x79)[0x2b8f60126a39]
/lib64/libc.so.6(sprintf+0x88)[0x2b8f60110638]
./boinc[0x40e3ac]
./boinc[0x40b69d]
./boinc[0x416b48]
./boinc[0x43e8f6]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b8f600e2b54]
./boinc(__gxx_personality_v0+0x1c9)[0x405bb9]

Exiting...
Wed Apr  8 08:07:39 BST 2009

Change History (13)

comment:1 Changed 15 years ago by Ageless

Description: modified (diff)

Fixed formatting.

But I am not sure this belongs here. Have you looked in the Seti Linux Q&A forum for help? It looks to me like you may have a memory or page file problem.

comment:2 Changed 15 years ago by tomchiverton

Nothing else on the box is upset. I've now built the client from source, including the required libcurl update, and copied that binary over the top of the current released one. *So far* it's stayed up and running.

comment:3 Changed 15 years ago by tomchiverton

No such luck, even with the latest build:

Not running, starting:
boinc: no process killed
08-Apr-2009 15:34:07 [---] Starting BOINC client version 6.7.4 for x86_64-pc-linux-gnu
08-Apr-2009 15:34:07 [---] This a development version of BOINC and may not function properly
08-Apr-2009 15:34:07 [---] log flags: task, file_xfer, sched_ops
08-Apr-2009 15:34:07 [---] Libraries: libcurl/7.19.4 OpenSSL/0.9.8e zlib/1.2.3 libidn/1.0
08-Apr-2009 15:34:07 [---] Data directory: /home/chivertont/bin/boinc
08-Apr-2009 15:34:07 [---] Processor: 2 GenuineIntel Intel(R) Pentium(R) D CPU 3.00GHz [Family 15 Model 4 Stepping 7]
08-Apr-2009 15:34:07 [---] Processor: 1.00 MB cache
08-Apr-2009 15:34:07 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
08-Apr-2009 15:34:07 [---] OS: Linux: 2.6.22.19-0.2-default
08-Apr-2009 15:34:07 [---] Memory: 1.97 GB physical, 1.00 GB virtual
08-Apr-2009 15:34:07 [---] Disk: 148.00 GB total, 48.71 GB free
08-Apr-2009 15:34:07 [---] Local time is UTC +1 hours
08-Apr-2009 15:34:07 [---] No CUDA devices found
08-Apr-2009 15:34:07 [---] No coprocessors
08-Apr-2009 15:34:07 [---] Not using a proxy
08-Apr-2009 15:34:08 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4454414; location: work; project prefs: default
08-Apr-2009 15:34:08 [SETI@home] General prefs: from SETI@home (last modified 02-Oct-2008 09:23:33)
08-Apr-2009 15:34:08 [SETI@home] Computer location: work
08-Apr-2009 15:34:08 [SETI@home] General prefs: no separate prefs for work; using your defaults
08-Apr-2009 15:34:08 [---] Preferences limit memory usage when active to 1006.59MB
08-Apr-2009 15:34:08 [---] Preferences limit memory usage when idle to 1811.87MB
08-Apr-2009 15:34:08 [---] Preferences limit disk usage to 47.97GB
08-Apr-2009 15:34:08 [---] Preferences limit # CPUs to 1
08-Apr-2009 15:34:08 [SETI@home] Restarting task ap_09fe09ac_B0_P0_00197_20090318_16242.wu_2 using astropulse_v5 version 503
SIGSEGV: segmentation violation
Stack trace (14 frames):
./boinc(boinc_catch_signal+0x43)[0x451b23]
/lib64/libpthread.so.0[0x2b6432eb8fb0]
/lib64/libc.so.6[0x2b643387047a]
/lib64/libc.so.6[0x2b6433876b6a]
/lib64/libc.so.6(__printf_fp+0xb51)[0x2b64338779c1]
/lib64/libc.so.6(_IO_vfprintf+0x4f8)[0x2b6433872588]
/lib64/libc.so.6(vsprintf+0x79)[0x2b6433891a39]
/lib64/libc.so.6(sprintf+0x88)[0x2b643387b638]
./boinc[0x40dea9]
./boinc[0x40b2ad]
./boinc[0x416358]
./boinc[0x43db86]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b643384db54]
./boinc(__gxx_personality_v0+0x1e9)[0x4058b9]

Exiting...
Wed Apr  8 16:15:49 BST 2009

comment:4 Changed 15 years ago by Nicolas

Component: changed from Undetermined to Client - Daemon
Owner: set to davea
Priority: changed from Undetermined to Minor

Your stacktrace lacks debugging symbols, making it of little use.

Run gdb boinc, then in the (gdb) prompt type run. Wait for it to crash, then type bt and post the resulting backtrace here.

comment:5 Changed 15 years ago by tomchiverton

No problem, running now :-)

comment:6 Changed 15 years ago by tomchiverton

Here we go:

15-Apr-2009 20:06:37 [---] Processor: 1.00 MB cache
15-Apr-2009 20:06:37 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
15-Apr-2009 20:06:37 [---] OS: Linux: 2.6.22.19-0.2-default
15-Apr-2009 20:06:37 [---] Memory: 1.97 GB physical, 1.00 GB virtual
15-Apr-2009 20:06:37 [---] Disk: 148.00 GB total, 50.86 GB free
15-Apr-2009 20:06:37 [---] Local time is UTC +1 hours
15-Apr-2009 20:06:37 [---] No CUDA devices found
15-Apr-2009 20:06:37 [---] No coprocessors
15-Apr-2009 20:06:37 [---] Not using a proxy
15-Apr-2009 20:06:37 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4454414; location: work; project prefs: default
15-Apr-2009 20:06:37 [SETI@home] General prefs: from SETI@home (last modified 02-Oct-2008 09:23:33)
15-Apr-2009 20:06:37 [SETI@home] Computer location: work
15-Apr-2009 20:06:37 [SETI@home] General prefs: no separate prefs for work; using your defaults
15-Apr-2009 20:06:37 [---] Preferences limit memory usage when active to 1006.59MB
15-Apr-2009 20:06:37 [---] Preferences limit memory usage when idle to 1811.87MB
15-Apr-2009 20:06:37 [---] Preferences limit disk usage to 50.89GB
15-Apr-2009 20:06:37 [---] Preferences limit # CPUs to 1
15-Apr-2009 20:06:37 [SETI@home] Restarting task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 using astropulse_v5 version 503


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b911e2bc1b0 (LWP 13557)]
0x00002b911dfb647a in ?? () from /lib64/libc.so.6
(gdb)
(gdb) bt
#0  0x00002b911dfb647a in ?? () from /lib64/libc.so.6
#1  0x00002b911dfbcb6a in ?? () from /lib64/libc.so.6
#2  0x00002b911dfbd9c1 in __printf_fp () from /lib64/libc.so.6
#3  0x00002b911dfb8588 in vfprintf () from /lib64/libc.so.6
#4  0x00002b911dfd7a39 in vsprintf () from /lib64/libc.so.6
#5  0x00002b911dfc1638 in sprintf () from /lib64/libc.so.6
#6  0x000000000040dea9 in ACTIVE_TASK_SET::send_heartbeats (this=0x680da0)
    at app_control.cpp:420
#7  0x000000000040b2ad in ACTIVE_TASK_SET::poll (this=0x680da0) at app.cpp:383
#8  0x0000000000416358 in CLIENT_STATE::poll_slow_events (this=0x680ce0)
    at client_state.cpp:630
#9  0x000000000043db86 in boinc_main_loop () at main.cpp:557
#10 0x00002b911df93b54 in __libc_start_main () from /lib64/libc.so.6
#11 0x00000000004058b9 in _start ()

comment:7 Changed 15 years ago by davea

Resolution: fixed
Status: changed from new to closed

I fixed this in [17831]. The problem was a sprintf() using %f into a char[256]; the number (a working-set size) must have been something huge like 1e304.

I didn't figure out why the WSS was huge. This must be something specific to Linux64. I'll look into this, but if anyone wants to run with <mem_usage_debug> with Linux64, let me know if you see anything suspicious.
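
To illustrate the kind of overflow described in this comment, here is a minimal C++ sketch. It is not the BOINC source, and the helper names chars_needed and format_bounded are hypothetical. It measures how many characters %f produces for a given double and performs a bounded write with snprintf, which is the usual way to keep such a call from running past a fixed-size buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: number of characters "%f" needs for `value`,
// excluding the terminating NUL. With a null buffer and size 0,
// snprintf only measures and writes nothing.
int chars_needed(double value) {
    return std::snprintf(nullptr, 0, "%f", value);
}

// Hypothetical helper: bounded formatting. Writes at most cap-1
// characters plus a NUL into buf. Returns true if the whole number
// fit, false if the output was truncated.
bool format_bounded(char* buf, std::size_t cap, double value) {
    assert(buf != nullptr && cap > 0);
    int n = std::snprintf(buf, cap, "%f", value);
    return n >= 0 && static_cast<std::size_t>(n) < cap;
}
```

Under these assumptions, chars_needed(1e304) is larger than 256, so an unbounded sprintf into a char[256] would write past the end of the buffer, while format_bounded truncates and reports false. An ordinary working-set size such as the 48103424 reported later in this ticket needs only 15 characters.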

comment:8 Changed 15 years ago by tomchiverton

Resolution: fixed
Status: changed from closed to reopened

Doesn't seem to have helped:


[New Thread 0x2b68ea1f71b0 (LWP 10054)]
16-Apr-2009 19:56:26 [---] Starting BOINC client version 6.7.4 for x86_64-pc-linux-gnu
16-Apr-2009 19:56:26 [---] This a development version of BOINC and may not function properly
16-Apr-2009 19:56:26 [---] log flags: task, file_xfer, sched_ops
16-Apr-2009 19:56:26 [---] Libraries: libcurl/7.19.4 OpenSSL/0.9.8e zlib/1.2.3 libidn/1.0
16-Apr-2009 19:56:26 [---] Data directory: /home/chivertont/bin/boinc
16-Apr-2009 19:56:26 [---] Processor: 2 GenuineIntel               Intel(R) Pentium(R) D CPU 3.00GHz [Family 15 Model 4 Stepping 7]
16-Apr-2009 19:56:26 [---] Processor: 1.00 MB cache
16-Apr-2009 19:56:26 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
16-Apr-2009 19:56:26 [---] OS: Linux: 2.6.22.19-0.2-default
16-Apr-2009 19:56:26 [---] Memory: 1.97 GB physical, 1.00 GB virtual
16-Apr-2009 19:56:26 [---] Disk: 148.00 GB total, 50.86 GB free
16-Apr-2009 19:56:26 [---] Local time is UTC +1 hours
16-Apr-2009 19:56:26 [---] No CUDA devices found
16-Apr-2009 19:56:26 [---] No coprocessors
16-Apr-2009 19:56:26 [---] Not using a proxy
16-Apr-2009 19:56:26 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4454414; location: work; project prefs: default
16-Apr-2009 19:56:26 [SETI@home] General prefs: from SETI@home (last modified 02-Oct-2008 09:23:33)
16-Apr-2009 19:56:26 [SETI@home] Computer location: work
16-Apr-2009 19:56:26 [SETI@home] General prefs: no separate prefs for work; using your defaults
16-Apr-2009 19:56:26 [---] Preferences limit memory usage when active to 1006.59MB
16-Apr-2009 19:56:26 [---] Preferences limit memory usage when idle to 1811.87MB
16-Apr-2009 19:56:26 [---] Preferences limit disk usage to 50.89GB
16-Apr-2009 19:56:26 [---] Preferences limit # CPUs to 1
16-Apr-2009 19:56:26 [SETI@home] Restarting task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 using astropulse_v5 version 503
16-Apr-2009 19:57:01 [SETI@home] Task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 exited with zero status but no 'finished' file
16-Apr-2009 19:57:01 [SETI@home] If this happens repeatedly you may need to reset the project.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b68ea1f71b0 (LWP 10054)]
0x00002b68e9ef147a in ?? () from /lib64/libc.so.6
(gdb)
(gdb) bt
#0  0x00002b68e9ef147a in ?? () from /lib64/libc.so.6
#1  0x00002b68e9ef7b6a in ?? () from /lib64/libc.so.6
#2  0x00002b68e9ef8a6a in __printf_fp () from /lib64/libc.so.6
#3  0x00002b68e9ef3588 in vfprintf () from /lib64/libc.so.6
#4  0x00002b68e9f12a39 in vsprintf () from /lib64/libc.so.6
#5  0x00002b68e9efc638 in sprintf () from /lib64/libc.so.6
#6  0x000000000040df19 in ACTIVE_TASK_SET::send_heartbeats (this=0x680da0)
    at app_control.cpp:420
#7  0x000000000040b2bd in ACTIVE_TASK_SET::poll (this=0x680da0) at app.cpp:383
#8  0x0000000000416578 in CLIENT_STATE::poll_slow_events (this=0x680ce0)
    at client_state.cpp:630
#9  0x000000000043e0d6 in boinc_main_loop () at main.cpp:557
#10 0x00002b68e9eceb54 in __libc_start_main () from /lib64/libc.so.6
#11 0x00000000004058b9 in _start ()
(gdb)


comment:9 Changed 15 years ago by davea

Hmm. When it crashes, can you please type

frame 6 p atp->procinfo.working_set_size p ar

and post the results? thanks.

comment:10 Changed 15 years ago by Nicolas

That line got reformatted... David wrote:

please type

frame 6
p atp->procinfo.working_set_size
p ar

comment:11 Changed 15 years ago by tomchiverton

Hmm, the crash appears to be later:

17-Apr-2009 09:57:56 [SETI@home] Restarting task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 using astropulse_v5 version 503

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2ba55a00b9c0 (LWP 30567)]
0x00002ba559d0547a in ?? () from /lib64/libc.so.6
(gdb) bt
#0  0x00002ba559d0547a in ?? () from /lib64/libc.so.6
#1  0x00002ba559d05112 in ?? () from /lib64/libc.so.6
#2  0x00002ba559cfd63d in ?? () from /lib64/libc.so.6
#3  0x000000000040e288 in ACTIVE_TASK::get_app_status_msg (this=0x6ecfd0)
    at /usr/include/stdlib.h:330
#4  0x000000000040e5e8 in ACTIVE_TASK_SET::get_msgs (this=0x680da0)
    at app_control.cpp:1021
#5  0x000000000040b2f3 in ACTIVE_TASK_SET::poll (this=0x680da0) at app.cpp:389
#6  0x00000000004165a8 in CLIENT_STATE::poll_slow_events (this=0x680ce0)
    at client_state.cpp:630
#7  0x000000000043e106 in boinc_main_loop () at main.cpp:557
#8  0x00002ba559ce2b54 in __libc_start_main () from /lib64/libc.so.6
#9  0x00000000004058b9 in _start ()
(gdb) frame 4
#4  0x000000000040e5e8 in ACTIVE_TASK_SET::get_msgs (this=0x680da0)
    at app_control.cpp:1021
1021            if (atp->get_app_status_msg()) {
(gdb) p atp->get_app_status_msg()
$1 = true
(gdb) print-object atp->get_app_status_msg()
Cannot access memory at address 0x0
(gdb)    
(gdb)       

comment:12 Changed 15 years ago by tomchiverton

OK, I got a dump from the original crash point:

17-Apr-2009 11:04:59 [SETI@home] Restarting task ap_01fe09ac_B5_P1_00186_20090404_06169.wu_2 using astropulse_v5 version 503

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b643b7639c0 (LWP 32687)]
0x00002b643b45d47a in ?? () from /lib64/libc.so.6
(gdb)
(gdb) bt
#0  0x00002b643b45d47a in ?? () from /lib64/libc.so.6
#1  0x00002b643b463b6a in ?? () from /lib64/libc.so.6
#2  0x00002b643b464a6a in __printf_fp () from /lib64/libc.so.6
#3  0x00002b643b45f588 in vfprintf () from /lib64/libc.so.6
#4  0x00002b643b484e7a in vsnprintf () from /lib64/libc.so.6
#5  0x00002b643b4685a3 in snprintf () from /lib64/libc.so.6
#6  0x000000000040df1e in ACTIVE_TASK_SET::send_heartbeats (this=0x680da0)
    at app_control.cpp:420
#7  0x000000000040b2bd in ACTIVE_TASK_SET::poll (this=0x680da0) at app.cpp:383
#8  0x00000000004165a8 in CLIENT_STATE::poll_slow_events (this=0x680ce0)
    at client_state.cpp:630
#9  0x000000000043e106 in boinc_main_loop () at main.cpp:557
#10 0x00002b643b43ab54 in __libc_start_main () from /lib64/libc.so.6
#11 0x00000000004058b9 in _start ()
(gdb) frame 6
#6  0x000000000040df1e in ACTIVE_TASK_SET::send_heartbeats (this=0x680da0)
    at app_control.cpp:420
420                     );
(gdb) p atp->procinfo.working_set_size
$1 = 48103424
(gdb) p ar
$2 = 1055490048
(gdb)           

comment:13 Changed 15 years ago by tomchiverton

I can reduce the incidence of the bug by setting BOINC not to keep applications in memory when suspended.
