Kernel debugger, other debugging tools

From: Raphael Manfredi (Raphael_Manfredi@pobox.com)
Date: Thu Mar 09 2000 - 13:39:24 EST


I'm currently facing several problems on my SMP box at home, and I feel
I don't have the needed tools (or knowledge) to debug those problems,
although I have the skills -- I think.

The first problem is sudden freezes (I have to hard-reboot: I have
no SysRq access under X, and probably no keyboard interrupt is taken),
which naturally occur randomly: the machine can be up for several
weeks, and then I got two freezes in less than 24 hours.

The other problem is a nasty interaction between my DC390 card
(driven by tmscsim) and my Agfa SnapScan 1236S. Whenever I run
find-scanner, either I can no longer use the CD-ROM on my DC390, or
the driver has an assertion failure and the machine panics.

[I was running 2.2.13 when I got freezes, and I upgraded to 2.2.14. No
freeze since then, but that does not prove anything!]

What I feel is missing is a way to have SysRq keys request a kernel
"core dump" to swap space, if large enough. Then, at boot time, have
a "savecore" program (user-space) detect whether a core lies in the
swap area(s), and move it out to /var/core/ (say) before the swap
is initialized.

That would allow post-mortem debugging given the (uncompressed) linux
image and the core, just "like" any other program that dumps core.
I sweep problems like modules, symbols, VM mappings, I/O mappings, etc.
under the carpet for now, but I understand those are real challenges.
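
To make the "savecore" idea concrete, here is a minimal sketch of the
check such a program could run on the first bytes of a swap area before
the swap is initialized. Everything in it is an assumption made for
illustration: the "KCOREDMP" magic, the 20-byte little-endian header
layout, and the kdump_parse_header() name are hypothetical and do not
describe any existing kernel dump format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dump header, assumed to sit at the start of the swap area:
 *   bytes 0..7   magic "KCOREDMP"
 *   bytes 8..11  format version (little-endian)
 *   bytes 12..19 length of the dump data that follows (little-endian)
 */
#define KDUMP_MAGIC       "KCOREDMP"
#define KDUMP_MAGIC_LEN   8
#define KDUMP_HEADER_SIZE 20

struct kdump_header {
    uint32_t version;
    uint64_t length;
};

/* Decode an nbytes-wide little-endian integer from p. */
static uint64_t read_le(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Return 1 and fill *out if buf starts with a plausible dump header,
 * 0 otherwise. area_size is the size of the swap area in bytes. */
int kdump_parse_header(const unsigned char *buf, size_t buflen,
                       uint64_t area_size, struct kdump_header *out)
{
    if (buflen < KDUMP_HEADER_SIZE || area_size < KDUMP_HEADER_SIZE)
        return 0;
    if (memcmp(buf, KDUMP_MAGIC, KDUMP_MAGIC_LEN) != 0)
        return 0;               /* no dump here, ordinary swap */

    out->version = (uint32_t)read_le(buf + 8, 4);
    out->length  = read_le(buf + 12, 8);

    /* Reject a dump that claims to extend past the swap area. */
    if (out->length > area_size - KDUMP_HEADER_SIZE)
        return 0;
    return 1;
}
```

A real savecore would read the header from the swap device, copy
"length" bytes out to /var/core/ when the check succeeds, and then
clear the magic so that the same dump is not saved again at the next
boot.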

The second thing that's missing is an NMI that would force
a linux "core dump" even when the keyboard does not respond. I guess
that's not possible on PCs, because the "reset" button effectively
resets the whole machine without any possibility to trap it? (Is this
correct? My knowledge of PC hardware is limited to being able to
build one from parts.)

When I worked on DEC OSF/1 in 1993, we incorporated a built-in
kernel debugger from CMU (I think) and were able to single step
through the kernel (but not through PAL code, or any routine on
the interrupt path), having utility routines duplicated for the
debugger and stripped down to simple cases, to avoid stepping
through routines used by the debugger itself.

One could enter the debugger by hitting "^D" on the console, or it
was entered automatically on panic conditions. This was really
helpful to debug "live" problems on the machine. It was more limited
and painful than running DEC OSF/1 on a software functional simulator
that emulated the Alpha perfectly and let us do everything on the
virtual machine, even single-step through PAL code, since we were
outside the system.

One last thing we had at HP was a special Ethernet device that would
let us remotely debug the kernel and pilot the machine. HP-UX 10.x
also had the amazing "q4" program that would let people monitor
a running kernel and inspect data structures (TCP/IP, STREAMS),
the whole thing having an embedded Perl, used to interpret the
dumping routines for those data structures. We used that to monitor
the ATM stacks in response to signalling events on LAN emulation
client code.

Which tools can I have on Linux to debug my kernel?

I've been using Linux since 1996, and never had the need to look
under the kernel hood, until now... And I find myself powerless because
I don't know which tools I can use to debug.

Raphael
