Re: The stability crisis

Matti Aarnio (matti.aarnio@sonera.fi)
Wed, 30 Jun 1999 03:01:26 +0300


On Tue, Jun 29, 1999 at 11:04:10PM +0000, Aaron Lehmann wrote:
> I really wish I could report my oopses, but this is a production box and I
> can't just let it sit there while I write down an oops. The syslog
> doesn't catch the OOPS except for sometimes the first few lines. Uptime is
> very important to me, as you have probably noticed from my rant. I don't
> want to use a serial console becuase I don't have another machine in the
> vicinity of 20 feet that would be capable of easilly logging kernel
> messages.

All the more important: USE SERIAL CONSOLE LOGGING, AND LOG
THAT DATA TO SEPARATE MACHINE!

I have two machines at my office desk, and I do that both ways.
It is *so* much simpler to go to the logger machine to see what
was the last kernel message before the other box croaked...
(My logger program ? Manually started Kermit.)

If you are so poor that you can't afford a small, otherwise
obsolete 386 box as your logger, well, life is hard...
( Have then a 8088 box running MSDOS and Kermit to do the logging,
if you can find such.. or serial interfaced printer! )

> As I said in a previous message to linux-kernel, I'd be happy to maintain
> a bug database if that would be within my realm of comprehension (I don't
> know very much about the kernel internals...).

Collecting the basic reports into a repository along with ensuring
certain quality degree (technical details, like properly decoded
Oopses, what compilers are used, what installation, what kind of
kernel configuration, which source was used, what patches,
hardware setup, motherboard model, CPU model and stepping, cards,
issues like MTRR settings, disk information, ...)

What *other* messages beside Oopses appear in the system ?
(and again same details)

Some problems can be truly hard to find, and only shifting thru
largeish amount of reports could bring up some magic correlation
giving our "wizards" a way to suspect some odd detail, and not
only to brush such off as "bad hardware".

Because there are kwazilion bug-tracking systems around, writing
one from scratch isn't likely necessary, but even choosing, and
customizing one into this use is quite difficult task in itself..

> The machine is a Cyrix 6x86MX (no SMP) running RedHat 5.1 with most of the
> packages at either 5.2 or 6.0 versions. MTRR is enabled in the kernel but
> I haven't used it for anything yet so I would assume that it is not
> causing problems. I don't run X. No quotas.
>
> [aaronl@vitelus aaronl]$ gcc --version
> egcs-2.91.66

?? Ah:

# gcc -v
Reading specs from /usr/lib/gcc-lib/alpha-redhat-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

that second line is more usefull than plain '--version'.

/Matti Aarnio <matti.aarnio@sonera.fi>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/