Re: ksyms for kgdb

Mark H. Wood (mwood@mhw.OIT.IUPUI.EDU)
Fri, 13 Oct 1995 13:38:54 +0000


On Tue, 10 Oct 1995, root wrote:

> > The feature that was definitely the most helpful was crash dumps.
> > Oftentimes, simply knowing the stack trace isn't enough. Being able to run a
> > debugger and take a look at several structures, pointers, etc. is just
> > very, very, very convenient.
>
> I've been screaming for this for a while; but every time I've brought it up,
> it's been shot down... reasons given:
>
> 1) It's not really useful, you can't ever track down crashes with
> crash dumps (totally ludicrous)

Lots of people can do this. We pay several vendors big bucks to have
their people do just that, and they get results.

> 2) By the time the kernel crashes, there's been too much damage done
> to trace anything (*maybe*.. but it's helpful to look for "footprints"
> to see what stomped on what.)

Sometimes yes, sometimes no. Usually no in my experience.

> 3) It's never likely to work. If the kernel crashes and you try to dump
> to disk, you will most certainly format your hard drive etc. etc.
> (ok fine. if that's the way you want to play, we'll dump to
> FLOPPY DISK. Take that!)

I've never seen this happen in 17 years of watching machines dump.

> > I'm thinking that it might be useful to define an interface to certain
> > block devices which allow you to do I/O that bypasses the buffer cache,
> > for the express purposes of allowing crash dumps to be written to some
> > contiguous area of disk (probably a swap partition). The bounds of the
> > crash dump area could be calculated in advance, and stored somewhere
> > safe; this would reduce the amount of code that the crash dump routines
> > would have to depend upon. (And, you really don't want to touch the
> > buffer cache at all while you're doing the crash dump, lest that disturb
> > valuable evidence about What Went Wrong.)

This is approximately how it's done in OpenVMS: the kernel opens the
dump file at boot time and keeps it open for exclusive write (so you
can't muck with it in any way), and reads in all the retrieval pointers
so it knows where all of the pieces are. When a dump is wanted, the same
really simple, really bulletproof code that read the kernel in (the
secondary bootstrap) is used to write the dump image out. Everything
that the dump code cares about can be marked kernel read-only once the
pointers are known.
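
To make the idea concrete, here is a rough user-space sketch in C of a
dump writer that walks a precomputed extent map (the "retrieval pointers")
and pushes memory out one block at a time through a raw write hook. It is
not OpenVMS or Linux code: the names dump_extent, dump_map, raw_write_fn,
dump_write and the RAM-backed "disk" are all invented for illustration,
and the 512-byte block size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  512u  /* assumed sector size */
#define MAX_EXTENTS 8     /* hypothetical limit on map size */
#define RAM_BLOCKS  32    /* size of the stand-in disk below */

/* One contiguous run of disk blocks reserved for the dump. */
struct dump_extent {
    uint32_t start_block;
    uint32_t block_count;
};

/* Built once at boot; could be made read-only afterwards. */
struct dump_map {
    size_t count;
    struct dump_extent extents[MAX_EXTENTS];
};

/* Hypothetical raw writer: one block, no buffer cache. 0 on success. */
typedef int (*raw_write_fn)(void *dev, uint32_t block, const uint8_t *data);

/* Total number of blocks the map can hold. */
static uint32_t dump_capacity(const struct dump_map *map)
{
    uint32_t total = 0;
    for (size_t i = 0; i < map->count; i++)
        total += map->extents[i].block_count;
    return total;
}

/* Write len bytes of image across the extents, zero-padding the last
 * block. Returns the number of blocks written, or -1 on failure. */
static long dump_write(const struct dump_map *map, raw_write_fn write_block,
                       void *dev, const uint8_t *image, size_t len)
{
    uint8_t sector[BLOCK_SIZE];
    size_t off = 0;
    long written = 0;

    assert(map->count <= MAX_EXTENTS);
    if ((len + BLOCK_SIZE - 1) / BLOCK_SIZE > dump_capacity(map))
        return -1;  /* dump area too small; refuse before touching disk */

    for (size_t i = 0; i < map->count && off < len; i++) {
        const struct dump_extent *e = &map->extents[i];
        for (uint32_t b = 0; b < e->block_count && off < len; b++) {
            size_t n = len - off < BLOCK_SIZE ? len - off : BLOCK_SIZE;
            memset(sector, 0, sizeof sector);
            memcpy(sector, image + off, n);
            if (write_block(dev, e->start_block + b, sector) != 0)
                return -1;
            off += n;
            written++;
        }
    }
    return written;
}

/* In-memory stand-in for a disk, for trying the sketch out. */
struct ram_disk {
    uint8_t blocks[RAM_BLOCKS][BLOCK_SIZE];
};

static int ram_write(void *dev, uint32_t block, const uint8_t *data)
{
    struct ram_disk *d = dev;
    if (block >= RAM_BLOCKS)
        return -1;
    memcpy(d->blocks[block], data, BLOCK_SIZE);
    return 0;
}
```

The point of the shape is that dump_write needs nothing but the map and
one block-write routine: no file system, no allocator, no buffer cache.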

> Now try to convince the other developers that crash dumps are actually a
> *GOOD THING*. I've tried, and others have too.
>
> IMHO, there are three good candidates for crash dumps:
>
> 1) Floppy disk. Should be *ABSOLUTELY NO EXCUSES*. There is no danger of
> blowing up your hard disk with this method. The only possible argument
> against it is the size of the device. So... use multiple floppies :)

Ugh! NetWare does it this way. I've never had the patience to copy out
a dump (and Novell's never shown any interest in seeing one).

> 2) Swap partition. Do a crash dump with a special signature at the
> header. When Linux boots next time, 'swapon' or something could look for
> the signature and if present, save the crash dump to a real disk file
> before enabling swap.

OpenVMS will do that too if there is no dump file. This practice is
frowned upon.
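
For what it's worth, the boot-time check being proposed could look roughly
like the C sketch below: look for a signature at the start of the swap
area, copy the dump out if it is there, then wipe the signature so the
same dump isn't saved twice. The header layout, the "CRASHDUMP1" magic
string, and the function names are made up for illustration; this is not
the real swap format or anything 'swapon' does today, and the swap area is
just a byte buffer here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_MAGIC     "CRASHDUMP1"  /* hypothetical signature */
#define DUMP_MAGIC_LEN 10

/* Hypothetical header written at the start of the swap area. */
struct dump_header {
    char     magic[DUMP_MAGIC_LEN];
    uint32_t length;  /* bytes of dump data following the header */
};

/* Return the dump length if a valid signature is present, else 0. */
static uint32_t dump_present(const uint8_t *area, size_t area_len)
{
    struct dump_header h;

    if (area_len < sizeof h)
        return 0;
    memcpy(&h, area, sizeof h);
    if (memcmp(h.magic, DUMP_MAGIC, DUMP_MAGIC_LEN) != 0)
        return 0;
    if (h.length > area_len - sizeof h)
        return 0;  /* length runs past the area: treat as garbage */
    return h.length;
}

/* Copy the dump into out and clear the signature so the area can be
 * handed over to swap. Returns bytes saved, or 0 if nothing was saved. */
static uint32_t dump_salvage(uint8_t *area, size_t area_len,
                             uint8_t *out, size_t out_len)
{
    uint32_t n;

    assert(out != NULL);
    n = dump_present(area, area_len);
    if (n == 0 || n > out_len)
        return 0;
    memcpy(out, area + sizeof(struct dump_header), n);
    memset(area, 0, DUMP_MAGIC_LEN);  /* invalidate before swap is enabled */
    return n;
}
```

A real version would write to a file rather than a buffer, and would have
to finish before swap is enabled on that partition.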

> 3) Crash partition. Do a crash dump to a special partition. For the
> ultra paranoid, this can be on a separate drive.

That *would* make the code nice and simple....

Mark H. Wood, Lead System Programmer MWOOD@INDYVAX.IUPUI.EDU
Those who will not learn from history are doomed to reimplement it.