
Version 6 (modified by wicked, 12 years ago)

Updated scheduler debug instructions to match code as it's today.

Debugging server components

A grab-bag of techniques for debugging BOINC server software:

Log files

Each server component (scheduler, feeder, transitioner, etc.) has its own log file. Most error conditions are reported in the log files; make sure you know where they are. If you're interested in the history of a particular WU or result, grep for WU#12345 or RESULT#12345 (where 12345 represents the ID) in the log files. The html/ops pages also provide an interface for this.
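
As a rough illustration of the grep approach, the following Python sketch collects every log line that mentions a given WU or result ID. The function name find_history, the log directory argument, and the "*.log" file pattern are assumptions made for this example, not part of BOINC; point it at wherever your components actually write their logs.

```python
import re
from pathlib import Path


def find_history(log_dir, kind, ident, pattern="*.log"):
    """Return (file name, line number, line) for each log line mentioning kind#ident.

    kind is "WU" or "RESULT"; ident is the numeric database ID.
    log_dir and pattern are assumptions about where the logs live.
    """
    # Match e.g. "WU#12345" but not the longer ID "WU#123456".
    needle = re.compile(r"%s#%d(?!\d)" % (re.escape(kind), int(ident)))
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if needle.search(line):
                    hits.append((path.name, lineno, line.rstrip("\n")))
    return hits
```

For example, find_history("log_myhost", "WU", 12345) would scan a hypothetical log_myhost/ directory. A plain grep does the same job interactively; the function form is only convenient when you want to post-process the matches.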

To control the verbosity of the log files:

Debugging the scheduler

Start by setting the appropriate logging options and examining the log file.

If this doesn't reveal the problem, or if the scheduler is crashing, you can run it under a debugger like gdb. The scheduler is a CGI program; it reads a request from stdin and writes a reply to stdout. So you can debug it as follows:

  • Copy the "scheduler_request_X.xml" file from a client to the machine running the scheduler. (X = your project URL)
  • Run the scheduler under the debugger, giving it this file as stdin, i.e.:
    gdb cgi
    (set a breakpoint)
    r < scheduler_request_X.xml
    
  • You may have to doctor the database as follows (N is the ID of the host that sent the request) to keep the scheduler from rejecting the request:
    update host set rpc_seqno=0, rpc_time=0 where id=N;
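
Before starting gdb, it can be worth checking that the saved request reproduces the problem at all. The sketch below performs the same stdin redirection as the steps above, but from Python and without a debugger. The function name replay_request is made up for this example, and the paths to the scheduler binary and the request file are placeholders you supply. The working directory is also left to you, on the assumption that the scheduler must be started from wherever it normally runs.

```python
import subprocess


def replay_request(scheduler_path, request_path, cwd=None, timeout=60):
    """Run a CGI-style program with request_path as its stdin.

    Returns (exit code, stdout bytes, stderr bytes).
    """
    with open(request_path, "rb") as request:
        proc = subprocess.run(
            [scheduler_path],
            stdin=request,            # same effect as "< request_path" in a shell
            stdout=subprocess.PIPE,   # the scheduler's reply
            stderr=subprocess.PIPE,   # any diagnostics it prints
            cwd=cwd,
            timeout=timeout,
        )
    return proc.returncode, proc.stdout, proc.stderr
```

On POSIX systems a negative exit code means the process was killed by a signal (for example, -11 for SIGSEGV), which is a quick way to tell a crash from an ordinary error reply.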
    

Running the scheduler under a debugger is useful for figuring out why your project is generating 'no work available' messages.

As an alternative, edit sched/handle_request.cpp and add the call debug_sched("debug_sched"); just before sreply.write(fout, sreq);. After recompiling, create a file called 'debug_sched' in the project root directory (for example with touch). From then on, transcripts of all scheduler requests and replies are written to the cgi-bin/ directory, one small file per request and one per reply. The files are named sched_request_H_R and sched_reply_H_R, where H is the host ID and R is the RPC sequence number. To turn this off, delete the 'debug_sched' file.
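
Once many transcripts have accumulated, it helps to match each request with its reply. The following Python sketch groups the files by host ID and RPC sequence number, using the naming scheme described above. It assumes that H and R appear as plain decimal numbers and that the files have no extension; the function name pair_transcripts is invented for this example.

```python
import re
from pathlib import Path

# File names as described above: sched_request_H_R and sched_reply_H_R,
# where H is the host ID and R is the RPC sequence number.
# Assumption: H and R are decimal digits and there is no file extension.
TRANSCRIPT_RE = re.compile(r"^sched_(request|reply)_(\d+)_(\d+)$")


def pair_transcripts(cgi_bin_dir):
    """Map (hostid, rpc_seqno) -> {"request": Path, "reply": Path} for the files found."""
    pairs = {}
    for path in Path(cgi_bin_dir).iterdir():
        m = TRANSCRIPT_RE.match(path.name)
        if m is None:
            continue  # not a transcript file
        kind, hostid, seqno = m.group(1), int(m.group(2)), int(m.group(3))
        pairs.setdefault((hostid, seqno), {})[kind] = path
    # Sort by host ID, then by sequence number.
    return dict(sorted(pairs.items()))
```

An entry with a "request" key but no "reply" key may point to a request during which the scheduler exited before writing its reply.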

To get core files for scheduler crashes, uncomment the following line in sched/sched_main.cpp, and recompile:

#define DUMP_CORE_ON_SEGV 1

MySQL interfaces

You should become familiar with MySQL tools such as

Database query tracing

If you uncomment the definition of SHOW_QUERIES in db/db_base.cpp and recompile everything, all database queries are written to stderr (for daemons, this goes to the log files; for command-line programs, to your terminal). The output is verbose but extremely useful for tracking down database-level problems.