Re: High-availability question

Steve Underwood (steveu@netpage.com.hk)
Tue, 27 Jul 1999 01:47:36 +0000


"Stephen C. Tweedie" wrote:

> > The data corruption caused by a not correctly plugged in SCSI cable
> > resulted in both mirror sets being corrupted and the corrupted data
> > also being mirrored over to the other system.
>
> Sure --- garbage in, garbage out. You still aren't proof against an
> application failure or against writing the wrong data. That doesn't
> mean that having a redundant storage subsystem is useless: it just means
> that such redundancy is only part of the problem.
>
> btw, Tandem would have caught that scsi error: the existence of such
> fault tolerant systems shows that it _is_ possible to guard against
> random machine failures. You are still in trouble if the application is
> buggy, of course.

You don't need a Tandem to catch this. You just need to turn on the SCSI
parity. For some reason the majority of systems seem to have it turned off.
On some older systems parity didn't work properly, but I don't think that is
an issue today. Still, most systems I look at have parity turned off.

Stratuses die. Tandems die. They die less often, but the consequences are
worse. In my experience a Stratus will have more than one reboot a year, due
to various troubles. A just reboot takes over an hour, so one reboot per year
brings availability down to 99.99%. Tandems have a pretty slow restart too.
The place were Tandems seem to score well is data integrity - banks don't
loose many transactions through them, even when they do down.

Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/