Opened 12 years ago

Last modified 12 years ago

#1203 new Defect

md5_file: Too many open files

Reported by: smoe
Owned by: davea
Priority: Undetermined
Milestone: Undetermined
Component: Client - Daemon
Version: 7.0.27
Keywords:
Cc:

Description

I found that boinc-client had stopped for no apparent reason. At the time it was running only a locally self-built SETI client. I had seen this once before, a long time ago, back then with WCG.

From stderrdae.txt:

No protocol specified
No protocol specified
No protocol specified
No protocol specified
...

dir_open: Could not open directory 'slots/0'.
dir_open: Could not open directory 'slots/18'.
dir_open: Could not open directory 'slots/17'.
dir_open: Could not open directory 'slots/12'.
dir_open: Could not open directory 'slots/7'.
dir_open: Could not open directory 'slots/4'.
dir_open: Could not open directory 'slots/22'.
dir_open: Could not open directory 'slots/19'.
dir_open: Could not open directory 'slots/9'.
dir_open: Could not open directory 'slots/16'.
dir_open: Could not open directory 'slots/14'.
dir_open: Could not open directory 'slots/20'.
dir_open: Could not open directory 'slots/8'.
dir_open: Could not open directory 'slots/3'.
dir_open: Could not open directory 'slots/23'.
dir_open: Could not open directory 'slots/11'.
...

dir_open: Could not open directory 'slots/7'.
dir_open: Could not open directory 'slots/7'.
dir_open: Could not open directory 'slots/7'.
md5_file: can't open projects/einstein.phys.uwm.edu/einstein_S6LV1_1.10_i686-pc-linux-gnu__SSE2
md5_file: Too many open files
dir_open: Could not open directory 'projects/setiathome.berkeley.edu'.
dir_open: Could not open directory 'slots/24'.
md5_file: can't open projects/setiathome.berkeley.edu/14ja12ac.18155.67.4.10.61_1_0
md5_file: Too many open files
dir_open: Could not open directory 'slots/14'.
dir_open: Could not open directory 'slots/14'.
md5_file: can't open projects/einstein.phys.uwm.edu/hsgamma_FGRP1_0.23_i686-pc-linux-gnu
md5_file: Too many open files
md5_file: can't open projects/boinc.bakerlab.org_rosetta/minirosetta_3.26_x86_64-pc-linux-gnu
md5_file: Too many open files
dir_open: Could not open directory 'projects/docking.cis.udel.edu'.
dir_open: Could not open directory 'projects/spin.fh-bielefeld.de'.
dir_open: Could not open directory 'projects/boinc.fzk.de_poem'.
dir_open: Could not open directory 'projects/qah.uni-muenster.de'.
dir_open: Could not open directory 'projects/www.rechenkraft.net_yoyo'.
dir_open: Could not open directory 'projects/www.worldcommunitygrid.org'.
dir_open: Could not open directory 'slots/21'.
dir_open: Could not open directory 'slots/21'.
dir_open: Could not open directory 'slots/21'.
md5_file: can't open projects/www.worldcommunitygrid.org/wcg_faah_autodock_6.40_i686-pc-linux-gnu
md5_file: Too many open files
dir_open: Could not open directory 'projects/lhcathomeclassic.cern.ch_sixtrack'.
dir_open: Could not open directory 'slots/0'.
dir_open: Could not open directory 'slots/1'.
dir_open: Could not open directory 'slots/2'.
....

dir_open: Could not open directory 'slots/21'.
dir_open: Could not open directory 'slots/22'.
dir_open: Could not open directory 'slots/23'.
dir_open: Could not open directory 'slots/4'.
md5_file: can't open projects/setiathome.berkeley.edu/30dc09aj.1678.25025.13.10.226_2_0
md5_file: Too many open files

From stdoutdae.txt:

21-Aug-2012 10:08:37 [SETI@home] Temporarily failed download of 23jn11ad.13583.17249.14.10.205: transient HTTP error
21-Aug-2012 10:08:37 [SETI@home] Backing off 4 min 36 sec on download of 23jn11ad.13583.17249.14.10.205
21-Aug-2012 10:08:37 [SETI@home] Temporarily failed download of 31oc10ac.1632.15183.4.10.58: transient HTTP error
21-Aug-2012 10:08:37 [SETI@home] Backing off 5 min 27 sec on download of 31oc10ac.1632.15183.4.10.58
21-Aug-2012 10:09:01 [---] Project communication failed: attempting access to reference site
21-Aug-2012 10:09:02 [---] Internet access OK - project servers may be temporarily down.
21-Aug-2012 10:13:40 [SETI@home] Started download of 05my12ad.31349.14382.3.10.249
21-Aug-2012 10:13:40 [SETI@home] Started download of 05my12ad.31349.14382.3.10.255
21-Aug-2012 10:13:53 [SETI@home] Finished download of 05my12ad.31349.14382.3.10.249
21-Aug-2012 10:13:53 [SETI@home] Started download of 23jn11ad.13583.17249.14.10.241
21-Aug-2012 10:13:54 [SETI@home] Finished download of 05my12ad.31349.14382.3.10.255
21-Aug-2012 10:13:54 [SETI@home] Started download of 23jn11ad.13583.17249.14.10.205
21-Aug-2012 10:14:02 [SETI@home] Finished download of 23jn11ad.13583.17249.14.10.241
21-Aug-2012 10:14:02 [SETI@home] Started download of 05my12ad.31349.14382.3.10.224
21-Aug-2012 10:14:12 [SETI@home] Finished download of 23jn11ad.13583.17249.14.10.205
21-Aug-2012 10:14:12 [SETI@home] Finished download of 05my12ad.31349.14382.3.10.224
21-Aug-2012 10:14:12 [SETI@home] Started download of 31oc10ac.1632.15183.4.10.58
21-Aug-2012 10:14:12 [SETI@home] Started download of 30dc09aj.1678.25025.13.10.226
21-Aug-2012 10:14:29 [SETI@home] Finished download of 30dc09aj.1678.25025.13.10.226
21-Aug-2012 10:14:29 [SETI@home] Started download of 31oc10ac.1632.15183.4.10.64
21-Aug-2012 10:14:30 [SETI@home] Finished download of 31oc10ac.1632.15183.4.10.58
21-Aug-2012 10:14:34 [SETI@home] Finished download of 31oc10ac.1632.15183.4.10.64
21-Aug-2012 10:17:46 [SETI@home] Started download of 30jn10ab.1159.23777.7.10.6.vlar
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.241_1 using setiathome_enhanced version 612 in slot 0
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.247_0 using setiathome_enhanced version 612 in slot 1
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.229_1 using setiathome_enhanced version 612 in slot 2
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.224_1 using setiathome_enhanced version 612 in slot 3
21-Aug-2012 10:17:55 [SETI@home] Starting task 30dc09aj.1678.25025.13.10.226_2 using setiathome_enhanced version 612 in slot 4
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.228_0 using setiathome_enhanced version 612 in slot 5
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.248_0 using setiathome_enhanced version 612 in slot 6
21-Aug-2012 10:17:55 [SETI@home] Starting task 30dc09aj.1678.25025.13.10.220_2 using setiathome_enhanced version 612 in slot 7
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.255_0 using setiathome_enhanced version 612 in slot 8
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.249_0 using setiathome_enhanced version 612 in slot 9
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.205_1 using setiathome_enhanced version 612 in slot 10
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.246_0 using setiathome_enhanced version 612 in slot 11
21-Aug-2012 10:17:55 [SETI@home] Starting task 27my10ac.18052.55637.5.10.1_2 using setiathome_enhanced version 612 in slot 12
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.53_0 using setiathome_enhanced version 612 in slot 13
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.41_1 using setiathome_enhanced version 612 in slot 14
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.58_0 using setiathome_enhanced version 612 in slot 15
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.49_0 using setiathome_enhanced version 612 in slot 16
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.52_0 using setiathome_enhanced version 612 in slot 17
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.64_0 using setiathome_enhanced version 612 in slot 18
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.36_1 using setiathome_enhanced version 612 in slot 19
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.28_1 using setiathome_enhanced version 612 in slot 20
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.47_0 using setiathome_enhanced version 612 in slot 21
21-Aug-2012 10:17:59 [SETI@home] Finished download of 30jn10ab.1159.23777.7.10.6.vlar
21-Aug-2012 10:17:59 [SETI@home] Starting task 30jn10ab.1159.23777.7.10.6.vlar_3 using setiathome_enhanced version 612 in slot 22
21-Aug-2012 10:18:26 [SETI@home] Started download of 19se10ac.457.271346.15.10.37.vlar
21-Aug-2012 10:18:35 [SETI@home] Finished download of 19se10ac.457.271346.15.10.37.vlar
21-Aug-2012 10:18:35 [SETI@home] Starting task 19se10ac.457.271346.15.10.37.vlar_3 using setiathome_enhanced version 612 in slot 23
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
....

21-Aug-2012 11:38:37 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 11:38:37 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 11:38:37 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 11:38:37 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 11:45:55 [SETI@home] read_stderr_file(): malloc() failed
21-Aug-2012 11:45:55 [SETI@home] Computation for task 30dc09aj.1678.25025.13.10.226_2 finished
21-Aug-2012 11:45:55 [---] Can't open client_state_next.xml: fopen() failed
21-Aug-2012 11:45:55 [---] Couldn't write state file: fopen() failed; giving up

Attachments (1)

fix_retval_of_read_stderr_file.diff (900 bytes) - added by Nicolas 12 years ago.
Make read_stderr_file pass the error code of read_file_malloc instead of always returning ERR_MALLOC.


Change History (8)

comment:1 Changed 12 years ago by Nicolas

It looks like something is leaking file descriptors. Without more information it's hard to tell what the real cause of this bug is. Have you seen this happen more than once?

comment:2 Changed 12 years ago by Nicolas

Component: changed from Undetermined to Client - Daemon
Owner: set to davea

comment:3 Changed 12 years ago by davea

From the "read_stderr_file(): malloc() failed" message, it looks like your system is out of swap space or has some other memory-related problem. What is the memory usage of the client?
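On Linux, one way to answer the memory question is to read the resident set size from the VmRSS field of /proc/<pid>/status. A hypothetical helper, not part of BOINC:

```c
#include <stdio.h>
#include <string.h>

/* Return the resident set size (VmRSS, in kB) of process `pid` by parsing
   /proc/<pid>/status; pid 0 means the calling process. Returns -1 if the
   file cannot be opened or the field is missing. */
long vm_rss_kb(int pid)
{
    char path[64];
    char line[256];
    long kb = -1;
    FILE *f;

    if (pid == 0)
        snprintf(path, sizeof(path), "/proc/self/status");
    else
        snprintf(path, sizeof(path), "/proc/%d/status", pid);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &kb); /* "VmRSS:\t  1234 kB" */
            break;
        }
    }
    fclose(f);
    return kb;
}
```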

comment:4 Changed 12 years ago by Nicolas

Actually, read_stderr_file() returns ERR_MALLOC if read_file_malloc() fails for any reason, including if it was unable to open the file. So I don’t think there’s any memory problem in this case.

Changed 12 years ago by Nicolas

Attachment fix_retval_of_read_stderr_file.diff added

Make read_stderr_file pass the error code of read_file_malloc instead of always returning ERR_MALLOC.
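The idea of the patch, passing the callee's error code up instead of collapsing every failure into ERR_MALLOC, can be sketched as follows. The error constants and both function names are hypothetical stand-ins for BOINC's read_file_malloc() and read_stderr_file(); the real signatures and error values are not reproduced here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error codes for this sketch only. */
enum { SK_OK = 0, SK_ERR_FOPEN = 1, SK_ERR_MALLOC = 2, SK_ERR_READ = 3 };

/* Stand-in for read_file_malloc(): read a whole file into a new
   NUL-terminated buffer and report which step failed. */
static int read_whole_file(const char *path, char **out)
{
    FILE *f = fopen(path, "rb");
    char *buf;
    long len;

    *out = NULL;
    if (f == NULL)
        return SK_ERR_FOPEN; /* e.g. EMFILE: too many open files */
    if (fseek(f, 0, SEEK_END) != 0 || (len = ftell(f)) < 0) {
        fclose(f);
        return SK_ERR_READ;
    }
    rewind(f);
    buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        fclose(f);
        return SK_ERR_MALLOC;
    }
    if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return SK_ERR_READ;
    }
    buf[len] = '\0';
    fclose(f);
    *out = buf;
    return SK_OK;
}

/* Caller in the spirit of the patch: propagate the callee's code. */
int read_stderr_sketch(const char *path, char **out)
{
    int retval = read_whole_file(path, out);
    if (retval != SK_OK) {
        /* Per the attachment description, the unpatched code reported
           ERR_MALLOC here regardless of retval. */
        return retval;
    }
    /* ... further processing of *out would go here ... */
    return SK_OK;
}
```

With this shape, a file that cannot be opened is reported as an open failure rather than as "malloc() failed", which is what pointed comment 3 toward a memory problem.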

comment:5 Changed 12 years ago by smoe

The mystery is that the leak does not show up in lsof:

sudo lsof | cut -f1 -d' ' | sort | uniq -c | sort -n

where the only heavy user is indeed iceweasel, with all its images etc. While googling about it, I came across

http://stackoverflow.com/questions/10218266/debugging-file-descriptor-leak-in-kernel

which pointed to shared memory that was never released. Could this be what is happening here? Some polling/pushing on shared memory in a threaded environment that has gone wild?

Please review the relevant communication code for any such evidence.


Steffen

comment:6 in reply to:  4 Changed 12 years ago by smoe


Replying to Nicolas:

Actually, read_stderr_file() returns ERR_MALLOC if read_file_malloc() fails for any reason, including if it was unable to open the file. So I don’t think there’s any memory problem in this case.

The patch seems fine. One thing: where it sprintf()s into path, please consider using snprintf(path, sizeof(path), ...) instead.

Cheers,

Steffen
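Steffen's snprintf suggestion, applied to building a path, might look like the sketch below. The helper name and the "slots/<n>/<file>" layout are illustrative assumptions based on the log above. Note that sizeof(path) gives the buffer size only while path is an array in scope; once the buffer is passed on as a char *, the size has to travel with it, which is why the helper takes an explicit size.

```c
#include <stdio.h>

/* Hypothetical helper: format "slots/<slot>/<file>" into buf.
   snprintf() never writes more than size bytes and returns the length
   the full string would have had, so ret >= size means truncation.
   Returns 0 on success, -1 on encoding error or truncation. */
int format_slot_file(char *buf, size_t size, int slot, const char *file)
{
    int ret = snprintf(buf, size, "slots/%d/%s", slot, file);

    if (ret < 0 || (size_t)ret >= size)
        return -1;
    return 0;
}
```

Treating truncation as an error keeps a cut-off path from silently naming the wrong file.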

comment:7 Changed 12 years ago by davea

The problem with snprintf (and strncpy) is that if the buffer is exceeded, it's not null-terminated.
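For reference: strncpy() does leave the buffer unterminated when the source is at least as long as the count, but C99 snprintf() with a nonzero size always writes a terminating '\0', even when it truncates. The non-terminating behavior applies to older platform variants such as Microsoft's _snprintf. A minimal sketch of copies that are always terminated; both helper names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* strncpy() adds no '\0' when strlen(src) >= n, so terminate by hand:
   copy at most size-1 bytes and always set the last byte. */
void copy_terminated(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* C99 snprintf() with size > 0 terminates even when it truncates,
   so no manual fix-up is needed. */
void copy_via_snprintf(char *dst, size_t size, const char *src)
{
    snprintf(dst, size, "%s", src);
}
```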
