Re: kfaultd report

Andrew E. Mileski (aem@nic.ott.hookup.net)
Tue, 4 Feb 1997 17:20:15 -0500 (EST)


> I never had a problem with it. Unfortunately, it didn't do me any good,
> simply because I can't seem to trigger stack corruption except in 2.0.2x
> with the pentium memcpy patch installed.

The aic7xxx_isr code in v2.1.20 tries to allocate about 1184 bytes on
the stack, then push a doubleword, which results in a fault quite often.
At least this is what it does on _my_ system (looked at the assembly dump).

Dan Eischen (sp?) has suggested I try the experimental driver, which
from his observation only allocates about 32 bytes on the stack.
As soon as I get the chance, I'll try it and report back.

> Even then, only when I fiddle
> with vfat filesystems. (known race) There is no way that I can see that
> kfaultd can possibly trash a system... all it does is trigger an oops.
> Maybe you were saying that the stack corruption ate your system.. it
> may of course if you try to continue running after corruption exists.

Without the kfaultd patch, the system continues on without problems.
Only the faulting process exits. fsck'ing after such an event turns
up nothing.

The kfaultd panic locks my system 98% of the time, and leaves my drives
in a particularly messed up state. Okay, kfaultd isn't the one causing
the problem, but that still doesn't help my drives any.

> What were you doing at the time of stack overflow/corruption?

Moving files from an ext2 (SCSI) fs to a VFAT (EIDE) fs last evening greatly
increased the odds of faulting to something like "almost a sure thing".

It got so bad, I had to resort to:
for t in * ; do mv "$t" dest/ ; sync ; sync ; sync ; sync ; sync ; done
This helped considerably (the sync delay must be the key), but it still
wasn't bullet-proof.

After a dozen cycles of panic, reboot, fsck, reboot, I lost about 200MB
of data, and I suspect some of the remaining data (about 4GB) has been
randomly corrupted too (this is from past experience). I'm starting to
record the RIPEMD-160 hashes of all files to help detect this.
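
For anyone wanting to do the same sort of bookkeeping, here is a minimal
sketch, not the exact procedure used above. It assumes GNU coreutils, and
it swaps in sha256sum for RIPEMD-160 only because that tool is more widely
installed. The function names record_hashes and verify_hashes are made up
for illustration.

```shell
#!/bin/sh
# Sketch: record a checksum manifest for a directory tree, then verify it later.
# ASSUMPTION: sha256sum (GNU coreutils) stands in for a RIPEMD-160 tool.
# record_hashes and verify_hashes are hypothetical helper names.

# record_hashes DIR MANIFEST
# Writes one "hash  ./relative/path" line per regular file under DIR.
# Keep MANIFEST outside DIR so it does not end up hashing itself.
record_hashes() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) | sort -k 2 > "$2"
}

# verify_hashes DIR MANIFEST
# Re-hashes the files listed in MANIFEST; exits non-zero if any file
# is missing or its contents no longer match the recorded hash.
verify_hashes() {
    ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

Run record_hashes once while the data is known to be good, store the
manifest on a different drive, and run verify_hashes after each
panic/fsck cycle. Only files that fail the check are printed.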

Lastly, I'll add that Windows 95 will work with a corrupted VFAT fs,
and even "scandisk" will report all is okay. Running "defrag" on a
non-swap drive turns up errors, which "scandisk" can then find and
correct. Makes no sense to me, but hey, it is my own fault for using
Windows 95 (I have no choice, as a proprietary program I use for work
requires it).

--
Andrew E. Mileski   mailto:aem@ott.hookup.net
Linux Plug-and-Play Kernel Project http://www.redhat.com/linux-info/pnp/
XFree86 Matrox Team http://www.bf.rmit.edu.au/~ajv/xf86-matrox.html