Re: RAID5 bug in 2.1.103

Bill Hawes (whawes@star.net)
Tue, 02 Jun 1998 15:26:34 -0400


Richard Jones wrote:

> Funny, I was trying to reproduce the bug only an hour ago.
>
> What I did was have 4 processes doing lots of random `ftruncate(2)'
> calls on 4 files on the RAID-5 disk. I left it running for about
> 2 hours, and ... no bug report.
>
> In fact, I've never been able to reliably reproduce the bug,
> although it seems to occur under the following general conditions:
>
> . high CPU load, and
> . high NFS server load (we are running unfsd 2.2beta33
> and serving to a population of around 20 clients), and
> . perhaps also running `htmerge' (part of the htdig
> web indexing program)
>
> I don't think any of these factors is necessary, but often (not
> always) all three are sufficient.

Hi Richard,

The particular races I suspect are leading to problems could be
exacerbated by having other operations locking the superblock for the fs
in question. They're probably more exposed for files big enough to have
double or triple indirect blocks as well. So maybe some simultaneous
truncating, reading, and fsyncing would help trigger the problem.
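
As a rough illustration of that kind of load, here is a minimal sketch
of a stress test. Everything in it is an assumption made for the
example, not something taken from the kernel source or from Richard's
setup: the helper names (double_indirect_start, triple_indirect_start,
run_stress), the use of threads rather than separate processes, and the
ext2-style layout of 12 direct block pointers, 4-byte pointers, and
1 KiB blocks used to estimate where indirect blocks begin. It may well
not trigger the bug at all.

```python
import os
import random
import threading


def double_indirect_start(block_size=1024, direct=12, ptr_size=4):
    """Byte offset past which a file needs a double indirect block.

    Assumes an ext2-style inode: `direct` direct pointers, then one
    single indirect block holding block_size // ptr_size pointers.
    """
    ptrs = block_size // ptr_size
    return (direct + ptrs) * block_size


def triple_indirect_start(block_size=1024, direct=12, ptr_size=4):
    """Byte offset past which a file needs a triple indirect block."""
    ptrs = block_size // ptr_size
    return (direct + ptrs + ptrs * ptrs) * block_size


def _worker(kind, path, max_size, iters, seed, errors):
    # Each worker opens its own descriptor so file offsets are private.
    rng = random.Random(seed)
    try:
        fd = os.open(path, os.O_RDWR)
        try:
            for _ in range(iters):
                off = rng.randrange(max_size)
                if kind == "truncate":
                    os.ftruncate(fd, off)      # grow or shrink at random
                elif kind == "read":
                    os.lseek(fd, off, os.SEEK_SET)
                    os.read(fd, 4096)          # may return b"" past EOF
                else:                          # "fsync"
                    os.lseek(fd, off, os.SEEK_SET)
                    os.write(fd, b"x")         # dirty a block, then flush
                    os.fsync(fd)
        finally:
            os.close(fd)
    except OSError as exc:
        errors.append((kind, exc))


def run_stress(path, max_size, iters=1000, seed=0):
    """Truncate, read, and fsync one file concurrently; return errors."""
    with open(path, "ab"):
        pass                                   # make sure the file exists
    errors = []
    kinds = ("truncate", "read", "fsync")
    threads = [
        threading.Thread(target=_worker,
                         args=(kind, path, max_size, iters, seed + i, errors))
        for i, kind in enumerate(kinds)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

To reach the double indirect range under those assumptions, max_size
would need to be above double_indirect_start(), and above
triple_indirect_start() for triple indirect blocks. Pointing path at a
file on the RAID-5 filesystem and running several copies against
different files would be closer to the conditions described above.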

> I can run a diagnostic patch if you like, so long as the patch
> doesn't interfere with stability too much. The current kernel
> version is 2.1.96, but I'm thinking of upgrading to 2.1.10[34]
> perhaps at the weekend.

I'll do a patch first with just printks and no changes. I've actually
already rewritten the truncate code, but I'm afraid to try it out :-) (I
don't believe in backups, but do know when to be careful.)

Regards,
Bill
