Re: The stability crisis

Keith Owens (kaos@ocs.com.au)
Sun, 04 Jul 1999 13:13:27 +1000


On Sat, 3 Jul 1999 07:20:15 -0500 (EST),
"Mark H. Wood" <mwood@IUPUI.Edu> wrote:
>On 2 Jul 1999, Henning P. Schmiedehausen wrote:
>> Actually I think, that sending the oops out over the network (as a
>> compile option, of course) is a nice idea. Maybe I will toy with this
>> sometime this weekend (don't hold your breath, though :-)
>
>There's actually quite a lot of experience with this sort of thing
>already. DECnet nodes have been upline-dumping over DNA Maintenance
>Operation Protocol for decades.

Nobody disagrees with the theory of dumping Oops logs over the net, to
parallel printers, floppy disks or even RFC 1149/2549, but you are
focusing on the wrong area. The problem with the Linux kernel,
especially on single user hardware like ix86[*], is this:

When the kernel is hung, almost all I/O hangs with it.

You can discuss where to dump the Oops to your heart's content. Unless
you can rely on the dump mechanism to work when the kernel is dead, you
are wasting your time.

The only reason the serial console works is that it runs in polling
mode and busy-waits in the kernel, with no interrupts. The dump to
floppy patch works because it switches into BIOS mode and totally
bypasses the kernel. First work out how to dump to anything without
kernel support. Then design the protocol.
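
To make the polling point concrete, here is a minimal sketch of
interrupt-free output to a 16550-style UART. It is not the kernel's
serial console code. The names port_ops, polled_putc, polled_write and
fake_uart are made up for illustration, and the port accessors are
passed in as function pointers so the sketch can run in user space
against a simulated UART. On real ix86 hardware they would be inb/outb
against a base such as 0x3F8 (COM1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_THR      0     /* transmit holding register offset */
#define UART_LSR      5     /* line status register offset */
#define UART_LSR_THRE 0x20  /* LSR bit 5: transmit holding register empty */

/* Hypothetical port accessors; on real hardware these wrap inb/outb. */
struct port_ops {
    uint8_t (*in)(void *ctx, uint16_t port);
    void (*out)(void *ctx, uint16_t port, uint8_t val);
    void *ctx;
};

/* Busy-wait until the UART can take a byte, then write it. No interrupts. */
static void polled_putc(const struct port_ops *ops, uint16_t base, char c)
{
    while (!(ops->in(ops->ctx, (uint16_t)(base + UART_LSR)) & UART_LSR_THRE))
        ; /* spin: nothing else in the system has to be alive */
    ops->out(ops->ctx, (uint16_t)(base + UART_THR), (uint8_t)c);
}

/* Write n bytes, expanding "\n" to "\r\n" for a serial terminal. */
static void polled_write(const struct port_ops *ops, uint16_t base,
                         const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '\n')
            polled_putc(ops, base, '\r');
        polled_putc(ops, base, s[i]);
    }
}

/* Simulated UART (an assumption, for user-space testing only). */
struct fake_uart {
    int busy_polls;  /* LSR reads left before THRE is reported */
    char buf[64];    /* bytes "transmitted" so far */
    size_t len;
};

static uint8_t fake_in(void *ctx, uint16_t port)
{
    struct fake_uart *u = ctx;
    if ((port & 7) != UART_LSR)
        return 0;
    if (u->busy_polls > 0) {
        u->busy_polls--;
        return 0;            /* transmitter still busy */
    }
    return UART_LSR_THRE;    /* ready for the next byte */
}

static void fake_out(void *ctx, uint16_t port, uint8_t val)
{
    struct fake_uart *u = ctx;
    if ((port & 7) != UART_THR)
        return;
    assert(u->len < sizeof u->buf);
    u->buf[u->len++] = (char)val;
    u->busy_polls = 3;       /* pretend the byte takes a while to shift out */
}
```

The loop in polled_putc depends only on the CPU still executing
instructions and the UART still answering port reads. That is the
property any other dump target would have to match.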

[*] I know we can run multi-user on ix86 but you have to admit it was
never designed with that in mind. Everything in ring 0 - single
point of failure :(. Little or no BIOS support for 32 bit I/O.
