Re: 2.4. continues after Aieee...

From: Russell King (rmk@arm.linux.org.uk)
Date: Thu Nov 16 2000 - 06:20:30 EST


Rogier Wolff wrote:
> Dennis wrote:
> > network card driver) and leave the system running make linux unusable in
> > unattended environments as the machine is functionally dead.
>
> Which doesn't help in this case, as your network card COULD be dead,
> while the system simply hasn't crashed....

Not every case causes a panic either. This week, I had an instance of
an i686 box lock solid with a DFE-530TX net card. Rebooting/power
cycling it didn't recover it (despite it working for the past month
without any problems). It only started working again after I moved
it into a different PCI slot.

I've seen a couple of instances now on totally different hardware where
it is possible to lock a PCI bus solid by improper connections on some
of the PCI bus lines, so a faulty PCI socket seems to be the most likely
cause.

In this case, a "panic" doesn't help you; the machine experiences a
hardware lockup. To catch these, you'd need a hardware watchdog.

What I'm basically saying is that there is only a limited amount that
Linux (or any OS) can do against these types of hardware failure. If
you need better protection, try a hardware watchdog with user-space
policy implementations.
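
As a rough illustration of what "user-space policy" could look like, here is a minimal sketch in Python. It assumes a Linux watchdog driver exposing /dev/watchdog, where opening the device arms the timer and any write restarts the countdown. The link_is_up() check, the eth0 interface name, and the 5 second interval are hypothetical choices made for the example, not anything from the setup described above.

```python
import time

WATCHDOG_DEVICE = "/dev/watchdog"  # conventional Linux watchdog device node


def link_is_up(iface="eth0"):
    """Hypothetical policy: treat the box as healthy while the NIC link is up."""
    try:
        with open(f"/sys/class/net/{iface}/operstate") as f:
            return f.read().strip() == "up"
    except OSError:
        # Interface missing or unreadable: report unhealthy.
        return False


def run_policy_loop(dev, is_healthy, interval=5.0, sleep=time.sleep):
    """Pet the watchdog while is_healthy() is true; return the pet count."""
    pets = 0
    while is_healthy():
        dev.write(b"\0")  # any write restarts the hardware countdown
        dev.flush()
        pets += 1
        sleep(interval)
    return pets


def main():
    # Opening the device arms the timer. Needs root and a watchdog driver.
    dev = open(WATCHDOG_DEVICE, "wb", buffering=0)
    run_policy_loop(dev, link_is_up)
    # Policy check failed: keep the device open and stop writing, so the
    # timer runs out and the hardware resets the machine.
    while True:
        time.sleep(60)
```

On a real machine main() would be started as root from an init script. The point of the split is that the kernel driver only counts down, while the decision about what "healthy" means lives in ordinary user-space code that can be changed without touching the kernel. A bus-level lockup stops the loop from running at all, so the timer still expires.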

Russell King (rmk@arm.linux.org.uk)
http://www.arm.linux.org.uk/personal/aboutme.html
THE developer of ARM Linux


