Robert Redelmeier writes:
> Daniel Phillips wrote in part:
>> One thing to keep in mind in all of this is: nobody is testing the
>> reliability of their journalling or any other kind of filesystem just by
>> running it. To test these things you have to crash/interrupt the system
>> *a lot*. Otherwise, you are just fooling yourself and everybody else.
>> How many crashes does it take to find that one little window of
>> vulnerability that comes up every 10,000 crashes normally but suddenly
>> starts coming up every time just because your customer uses their system
>> a different way? You're doing the right thing by crash-testing it, now
>> instead of doing it 5 times do it 1,000 times. Here's one of my
>> favorite tests: unzip a kernel source tree and wait until the disk light
>> goes out. A second or so after it comes on again (kflushd) hit the
>> reset button.
>
> Good idea. I certainly believe in stressing hardware (see .sig),
> but I'm not sure this test is rigorous enough. The problem is
> the reset button is only connected to the CPU and the hard disk
> will probably continue to write out sectors from it's hw buffer.
> OTOH, I don't like the idea of pulling the plug too often. It's
> very hard on the hardware. I'd expect a mechanical disk failure
> before 10,000 cycles.
The nice way to develop this code is with a block device that
discards all writes after a timer goes off.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sat Oct 07 2000 - 21:00:09 EST