Opened 13 years ago
Last modified 12 years ago
#1203 new Defect
md5_file: Too many open files
Reported by: smoe | Owned by: davea
Priority: Undetermined | Milestone: Undetermined
Component: Client - Daemon | Version: 7.0.27
Keywords: (none) | Cc: (none)
Description
I found that boinc-client had stopped for no apparent reason. At the time it was running only a locally built SETI client. I had seen this once long before, back then with WCG.
From stderrdae.txt:
No protocol specified
No protocol specified
No protocol specified
No protocol specified
...
dir_open: Could not open directory 'slots/0'.
dir_open: Could not open directory 'slots/18'.
dir_open: Could not open directory 'slots/17'.
dir_open: Could not open directory 'slots/12'.
dir_open: Could not open directory 'slots/7'.
dir_open: Could not open directory 'slots/4'.
dir_open: Could not open directory 'slots/22'.
dir_open: Could not open directory 'slots/19'.
dir_open: Could not open directory 'slots/9'.
dir_open: Could not open directory 'slots/16'.
dir_open: Could not open directory 'slots/14'.
dir_open: Could not open directory 'slots/20'.
dir_open: Could not open directory 'slots/8'.
dir_open: Could not open directory 'slots/3'.
dir_open: Could not open directory 'slots/23'.
dir_open: Could not open directory 'slots/11'.
...
dir_open: Could not open directory 'slots/7'.
dir_open: Could not open directory 'slots/7'.
dir_open: Could not open directory 'slots/7'.
md5_file: can't open projects/einstein.phys.uwm.edu/einstein_S6LV1_1.10_i686-pc-linux-gnu__SSE2
md5_file: Too many open files
dir_open: Could not open directory 'projects/setiathome.berkeley.edu'.
dir_open: Could not open directory 'slots/24'.
md5_file: can't open projects/setiathome.berkeley.edu/14ja12ac.18155.67.4.10.61_1_0
md5_file: Too many open files
dir_open: Could not open directory 'slots/14'.
dir_open: Could not open directory 'slots/14'.
md5_file: can't open projects/einstein.phys.uwm.edu/hsgamma_FGRP1_0.23_i686-pc-linux-gnu
md5_file: Too many open files
md5_file: can't open projects/boinc.bakerlab.org_rosetta/minirosetta_3.26_x86_64-pc-linux-gnu
md5_file: Too many open files
dir_open: Could not open directory 'projects/docking.cis.udel.edu'.
dir_open: Could not open directory 'projects/spin.fh-bielefeld.de'.
dir_open: Could not open directory 'projects/boinc.fzk.de_poem'.
dir_open: Could not open directory 'projects/qah.uni-muenster.de'.
dir_open: Could not open directory 'projects/www.rechenkraft.net_yoyo'.
dir_open: Could not open directory 'projects/www.worldcommunitygrid.org'.
dir_open: Could not open directory 'slots/21'.
dir_open: Could not open directory 'slots/21'.
dir_open: Could not open directory 'slots/21'.
md5_file: can't open projects/www.worldcommunitygrid.org/wcg_faah_autodock_6.40_i686-pc-linux-gnu
md5_file: Too many open files
dir_open: Could not open directory 'projects/lhcathomeclassic.cern.ch_sixtrack'.
dir_open: Could not open directory 'slots/0'.
dir_open: Could not open directory 'slots/1'.
dir_open: Could not open directory 'slots/2'.
....
dir_open: Could not open directory 'slots/21'.
dir_open: Could not open directory 'slots/22'.
dir_open: Could not open directory 'slots/23'.
dir_open: Could not open directory 'slots/4'.
md5_file: can't open projects/setiathome.berkeley.edu/30dc09aj.1678.25025.13.10.226_2_0
md5_file: Too many open files
From stdoutdae.txt:
21-Aug-2012 10:08:37 [SETI@home] Temporarily failed download of 23jn11ad.13583.17249.14.10.205: transient HTTP error
21-Aug-2012 10:08:37 [SETI@home] Backing off 4 min 36 sec on download of 23jn11ad.13583.17249.14.10.205
21-Aug-2012 10:08:37 [SETI@home] Temporarily failed download of 31oc10ac.1632.15183.4.10.58: transient HTTP error
21-Aug-2012 10:08:37 [SETI@home] Backing off 5 min 27 sec on download of 31oc10ac.1632.15183.4.10.58
21-Aug-2012 10:09:01 [---] Project communication failed: attempting access to reference site
21-Aug-2012 10:09:02 [---] Internet access OK - project servers may be temporarily down.
21-Aug-2012 10:13:40 [SETI@home] Started download of 05my12ad.31349.14382.3.10.249
21-Aug-2012 10:13:40 [SETI@home] Started download of 05my12ad.31349.14382.3.10.255
21-Aug-2012 10:13:53 [SETI@home] Finished download of 05my12ad.31349.14382.3.10.249
21-Aug-2012 10:13:53 [SETI@home] Started download of 23jn11ad.13583.17249.14.10.241
21-Aug-2012 10:13:54 [SETI@home] Finished download of 05my12ad.31349.14382.3.10.255
21-Aug-2012 10:13:54 [SETI@home] Started download of 23jn11ad.13583.17249.14.10.205
21-Aug-2012 10:14:02 [SETI@home] Finished download of 23jn11ad.13583.17249.14.10.241
21-Aug-2012 10:14:02 [SETI@home] Started download of 05my12ad.31349.14382.3.10.224
21-Aug-2012 10:14:12 [SETI@home] Finished download of 23jn11ad.13583.17249.14.10.205
21-Aug-2012 10:14:12 [SETI@home] Finished download of 05my12ad.31349.14382.3.10.224
21-Aug-2012 10:14:12 [SETI@home] Started download of 31oc10ac.1632.15183.4.10.58
21-Aug-2012 10:14:12 [SETI@home] Started download of 30dc09aj.1678.25025.13.10.226
21-Aug-2012 10:14:29 [SETI@home] Finished download of 30dc09aj.1678.25025.13.10.226
21-Aug-2012 10:14:29 [SETI@home] Started download of 31oc10ac.1632.15183.4.10.64
21-Aug-2012 10:14:30 [SETI@home] Finished download of 31oc10ac.1632.15183.4.10.58
21-Aug-2012 10:14:34 [SETI@home] Finished download of 31oc10ac.1632.15183.4.10.64
21-Aug-2012 10:17:46 [SETI@home] Started download of 30jn10ab.1159.23777.7.10.6.vlar
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.241_1 using setiathome_enhanced version 612 in slot 0
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.247_0 using setiathome_enhanced version 612 in slot 1
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.229_1 using setiathome_enhanced version 612 in slot 2
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.224_1 using setiathome_enhanced version 612 in slot 3
21-Aug-2012 10:17:55 [SETI@home] Starting task 30dc09aj.1678.25025.13.10.226_2 using setiathome_enhanced version 612 in slot 4
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.228_0 using setiathome_enhanced version 612 in slot 5
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.248_0 using setiathome_enhanced version 612 in slot 6
21-Aug-2012 10:17:55 [SETI@home] Starting task 30dc09aj.1678.25025.13.10.220_2 using setiathome_enhanced version 612 in slot 7
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.255_0 using setiathome_enhanced version 612 in slot 8
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.249_0 using setiathome_enhanced version 612 in slot 9
21-Aug-2012 10:17:55 [SETI@home] Starting task 23jn11ad.13583.17249.14.10.205_1 using setiathome_enhanced version 612 in slot 10
21-Aug-2012 10:17:55 [SETI@home] Starting task 05my12ad.31349.14382.3.10.246_0 using setiathome_enhanced version 612 in slot 11
21-Aug-2012 10:17:55 [SETI@home] Starting task 27my10ac.18052.55637.5.10.1_2 using setiathome_enhanced version 612 in slot 12
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.53_0 using setiathome_enhanced version 612 in slot 13
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.41_1 using setiathome_enhanced version 612 in slot 14
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.58_0 using setiathome_enhanced version 612 in slot 15
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.49_0 using setiathome_enhanced version 612 in slot 16
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.52_0 using setiathome_enhanced version 612 in slot 17
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.64_0 using setiathome_enhanced version 612 in slot 18
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.36_1 using setiathome_enhanced version 612 in slot 19
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.28_1 using setiathome_enhanced version 612 in slot 20
21-Aug-2012 10:17:55 [SETI@home] Starting task 31oc10ac.1632.15183.4.10.47_0 using setiathome_enhanced version 612 in slot 21
21-Aug-2012 10:17:59 [SETI@home] Finished download of 30jn10ab.1159.23777.7.10.6.vlar
21-Aug-2012 10:17:59 [SETI@home] Starting task 30jn10ab.1159.23777.7.10.6.vlar_3 using setiathome_enhanced version 612 in slot 22
21-Aug-2012 10:18:26 [SETI@home] Started download of 19se10ac.457.271346.15.10.37.vlar
21-Aug-2012 10:18:35 [SETI@home] Finished download of 19se10ac.457.271346.15.10.37.vlar
21-Aug-2012 10:18:35 [SETI@home] Starting task 19se10ac.457.271346.15.10.37.vlar_3 using setiathome_enhanced version 612 in slot 23
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 10:48:33 [SETI@home] Can't get task disk usage: opendir() failed
....
21-Aug-2012 11:38:37 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 11:38:37 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 11:38:37 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 11:38:37 [SETI@home] Can't get task disk usage: opendir() failed
21-Aug-2012 11:45:55 [SETI@home] read_stderr_file(): malloc() failed
21-Aug-2012 11:45:55 [SETI@home] Computation for task 30dc09aj.1678.25025.13.10.226_2 finished
21-Aug-2012 11:45:55 [---] Can't open client_state_next.xml: fopen() failed
21-Aug-2012 11:45:55 [---] Couldn't write state file: fopen() failed; giving up
Attachments (1)
Change History (8)
comment:1 Changed 13 years ago by
comment:2 Changed 13 years ago by
Component: Undetermined → Client - Daemon
Owner: set to davea
comment:3 Changed 13 years ago by
From the "read_stderr_file(): malloc() failed" message, it looks like your system is out of swap space or has some other memory-related problem. What is the memory usage of the client?
comment:4 follow-up: 6 Changed 13 years ago by
Actually, read_stderr_file() returns ERR_MALLOC if read_file_malloc() fails for any reason, including if it was unable to open the file. So I don’t think there’s any memory problem in this case.
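To make this point concrete: when a process runs out of file descriptors, open() and fopen() fail with errno set to EMFILE, not ENOMEM, and with glibc strerror(EMFILE) reads "Too many open files". That matches the md5_file lines in stderrdae.txt, which look like perror() output (an assumption; the BOINC source is not shown here). The minimal POSIX sketch below reproduces the condition by lowering the calling process's soft RLIMIT_NOFILE; `exhaust_fds` is a hypothetical helper, not BOINC code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: lower this process's soft RLIMIT_NOFILE to `limit`,
 * then open /dev/null until open() fails. Returns the errno of the failing
 * open() (expected: EMFILE), or 0 if no failure occurred. Closes every
 * descriptor it opened and restores the original limit before returning.
 */
int exhaust_fds(rlim_t limit)
{
    assert(limit > 0 && limit <= 32);   /* keep the fds[] array big enough */

    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return errno;
    struct rlimit saved = rl;
    rl.rlim_cur = limit;                /* soft limit only; hard limit unchanged */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return errno;

    int fds[64];
    int n = 0, err = 0;
    while (n < 64) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            err = errno;                /* EMFILE once the limit is reached */
            break;
        }
        fds[n++] = fd;
    }
    while (n > 0)
        close(fds[--n]);
    setrlimit(RLIMIT_NOFILE, &saved);   /* raising soft back up to hard is allowed */
    return err;
}
```

On Linux, exhaust_fds(16) should return EMFILE. A caller that turned such a failure into a generic "malloc() failed" report would mislead in exactly the way described above.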
Changed 13 years ago by
Attachment: fix_retval_of_read_stderr_file.diff added
Make read_stderr_file pass the error code of read_file_malloc instead of always returning ERR_MALLOC.
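The attached diff is not reproduced here. The sketch below only illustrates the idea it describes, under stated assumptions: `read_file_malloc_sketch`, `read_stderr_old_sketch`, `read_stderr_new_sketch` and the `SK_*` codes are hypothetical stand-ins, not BOINC's functions or its real error numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error codes; BOINC's real values live in its own headers. */
enum { SK_OK = 0, SK_ERR_FOPEN = -1, SK_ERR_MALLOC = -2, SK_ERR_FREAD = -3 };

/* Hypothetical stand-in for read_file_malloc(): read a whole file into a
 * newly allocated, NUL-terminated buffer and report *why* it failed. */
int read_file_malloc_sketch(const char *path, char **out)
{
    assert(path != NULL && out != NULL);
    *out = NULL;
    FILE *f = fopen(path, "rb");
    if (!f)
        return SK_ERR_FOPEN;            /* e.g. errno == EMFILE */
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (!buf) { fclose(f); return SK_ERR_MALLOC; }
    /* Always leave one byte free for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len == cap - 1) {           /* buffer full: grow it */
            char *bigger = realloc(buf, cap * 2);
            if (!bigger) { free(buf); fclose(f); return SK_ERR_MALLOC; }
            buf = bigger;
            cap *= 2;
        }
    }
    int failed = ferror(f);
    fclose(f);
    if (failed) { free(buf); return SK_ERR_FREAD; }
    buf[len] = '\0';
    *out = buf;
    return SK_OK;
}

/* Before: any failure, including fopen(), is reported as a malloc error. */
int read_stderr_old_sketch(const char *path, char **out)
{
    return read_file_malloc_sketch(path, out) ? SK_ERR_MALLOC : SK_OK;
}

/* After: the callee's error code is passed through unchanged. */
int read_stderr_new_sketch(const char *path, char **out)
{
    int retval = read_file_malloc_sketch(path, out);
    if (retval)
        return retval;
    return SK_OK;
}
```

With the pass-through version, a missing or unopenable stderr file shows up as an open failure in the log instead of as "malloc() failed".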
comment:5 Changed 12 years ago by
The mystery is that lsof does not show the leak:
sudo lsof | cut -f1 -d' ' | sort | uniq -c | sort -n
In that output the only heavy user is indeed iceweasel, with all its images etc. While searching for the problem, I came across
http://stackoverflow.com/questions/10218266/debugging-file-descriptor-leak-in-kernel
which points to shared memory that is never released. Could that be what is happening here, e.g. polling or writing to shared memory in a threaded environment that has somehow gone wild?
Please review the relevant communication code for any evidence of this.
Steffen
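lsof also lists entries that are not descriptors (memory-mapped files, the executable, the current directory), so per-command line counts are only a rough proxy. A complementary check on Linux is to count the entries under /proc/<pid>/fd for the client process itself, much like `ls /proc/<pid>/fd | wc -l`, and compare that with the soft limit reported by `ulimit -n`. The Linux-only sketch below does this; `count_open_fds` and `count_own_fds` are hypothetical names.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Count the open file descriptors of process `pid` by listing
 * /proc/<pid>/fd (Linux-specific). Returns -1 if the directory cannot be
 * opened, e.g. the process does not exist or belongs to another user.
 * When pid is the caller's own PID, the descriptor used by opendir()
 * itself is included in the count.
 */
int count_open_fds(pid_t pid)
{
    char path[64];
    int n = snprintf(path, sizeof(path), "/proc/%ld/fd", (long)pid);
    assert(n > 0 && (size_t)n < sizeof(path));

    DIR *dir = opendir(path);
    if (!dir)
        return -1;
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')    /* skip "." and ".." */
            count++;
    }
    closedir(dir);
    return count;
}

/* Convenience wrapper for the calling process. */
int count_own_fds(void)
{
    return count_open_fds(getpid());
}
```

Sampling count_open_fds() on the client's PID every few minutes would show whether the count climbs steadily toward the limit, which is what a leak would look like.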
comment:6 Changed 12 years ago by
Seems fine. One request: when sprintf-ing into path, please consider using snprintf(path, sizeof(path), ...) instead.
Cheers,
Steffen
Replying to Nicolas:
> Actually, read_stderr_file() returns ERR_MALLOC if read_file_malloc() fails for any reason, including if it was unable to open the file. So I don’t think there’s any memory problem in this case.
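The snprintf(path, sizeof(path), ...) suggestion can be sketched as follows. snprintf returns the length the complete result would have had, so a return value greater than or equal to the buffer size signals truncation. `make_slot_path` and the "slots/<n>/<file>" layout are illustrative assumptions, not BOINC code.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "slots/<slot>/<file>" into path[size].
 * Returns 0 on success, -1 on an encoding error or if the result was
 * truncated (with C99 snprintf the buffer is still NUL-terminated).
 */
int make_slot_path(char *path, size_t size, int slot, const char *file)
{
    assert(path != NULL && file != NULL && size > 0);
    int n = snprintf(path, size, "slots/%d/%s", slot, file);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Checking the return value matters: silently truncated paths would make a later open fail with a misleading error.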
comment:7 Changed 12 years ago by
The problem with snprintf (and strncpy) is that if the buffer is exceeded, it's not null-terminated.
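For strncpy this is accurate: if the source is at least as long as the buffer, no terminating null is written. For snprintf it depends on the implementation: C99 snprintf always writes a terminating null when the size argument is nonzero, whereas Microsoft's legacy _snprintf does not on overflow. A minimal sketch of a copy with guaranteed termination, using a hypothetical helper named `copy_terminated` (similar in spirit to BSD strlcpy):

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst[size], always NUL-terminating.
 * Returns 1 if the copy was truncated, 0 otherwise.
 */
int copy_terminated(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    strncpy(dst, src, size);        /* leaves dst unterminated if src is too long */
    if (dst[size - 1] != '\0') {
        dst[size - 1] = '\0';       /* force termination and report truncation */
        return 1;
    }
    return 0;
}
```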
It looks like something is leaking file descriptors. It's hard to know what the real cause of this bug is without more information. Have you seen this happen more than once?
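One common way to leak descriptors, offered only as an illustration and not as a diagnosis of this ticket, is returning from inside a directory scan without calling closedir(). Given the many dir_open failures in the log, directory scans are one place worth auditing. The sketch below routes every exit after a successful opendir() through closedir(); `dir_contains` is a hypothetical helper.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if directory `dirpath` has an entry named
 * `name`, 0 if not, -1 if the directory cannot be opened. An early
 * `return 1` inside the loop, without closedir(), would leak one
 * descriptor per call.
 */
int dir_contains(const char *dirpath, const char *name)
{
    assert(dirpath != NULL && name != NULL);
    DIR *dir = opendir(dirpath);
    if (!dir)
        return -1;
    int found = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, name) == 0) {
            found = 1;
            break;              /* leave the loop but still reach closedir() */
        }
    }
    closedir(dir);
    return found;
}
```

Calling such a function a few thousand times in a loop is a cheap leak test: with a missing closedir() it would start returning -1 once the per-process descriptor limit (commonly 1024 by default on Linux) is reached.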