Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector

From: Neil Brown (neilb@cse.unsw.edu.au)
Date: Mon Mar 17 2003 - 20:01:40 EST


On Sunday March 16, gilbertd@treblig.org wrote:
> Hi,
> I've just built an 800GB RAID5 array and built an ext3 file system
> on it; on trying to copy data off the 200GB RAID it is replacing I'm
> starting to see errors of the form:
>
> kernel: EXT3-fs error (device md(9,2)): ext3_new_block: Allocating block in
> system zone - block = 140509185
>
> and
>
> kernel: EXT3-fs error (device md(9,2)): ext3_add_entry: bad entry in
> directory #70254593: rec_len %% 4 != 0 - offset=28, inode=23880564,
> rec_len=21587, name_len=76
>
> and
>
> kernel: raid5: multiple 1 requests for sector 281018464

I had exactly these symptoms about a year ago in 2.4.18. I found and
fixed the problem, and I have just checked that the fix is definitely
in 2.4.20.
So if you really are running 2.4.20, then it looks like a similar bug
has appeared.

These two symptoms strongly suggest a buffer aliasing problem.
That is, you have two buffers (one for data and one for metadata)
that refer to the same location on disc.
One is part of a file that was recently deleted, but the buffer hasn't
been flushed yet. The other is part of a new directory.
The old buffer and the new buffer both get written to disc at much the
same time (hence the "multiple 1 requests"), but the old buffer hits
the disc second and so corrupts the filesystem.
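
To make the ordering problem concrete, here is a rough toy model in
Python. It is only a sketch and is not kernel code: the Disk and Buffer
classes, the flush() helper and the block number are all made up for
illustration. It shows two dirty buffers that alias one block, a
duplicate write being detected, and the stale data winning because it
is written last.

```python
# Toy model of buffer aliasing. NOT kernel code: every name here
# (Disk, Buffer, flush, BLOCK) is hypothetical and for illustration only.

class Disk:
    def __init__(self):
        self.blocks = {}      # block number -> bytes last written there
        self.write_log = []   # order in which writes reached the disk

    def write(self, block, data):
        self.blocks[block] = data
        self.write_log.append(block)


class Buffer:
    def __init__(self, block, data, kind):
        self.block = block    # on-disk location this buffer maps to
        self.data = data
        self.kind = kind      # "data" or "metadata"
        self.dirty = True


def flush(disk, buffers):
    """Write dirty buffers in the given order.

    Returns the block numbers that were written more than once in this
    flush, loosely analogous to a "multiple requests" warning.
    """
    seen = set()
    duplicates = []
    for buf in buffers:
        if not buf.dirty:
            continue
        if buf.block in seen:
            duplicates.append(buf.block)
        seen.add(buf.block)
        disk.write(buf.block, buf.data)
        buf.dirty = False
    return duplicates


BLOCK = 1000  # arbitrary block number, not taken from the report

# Stale buffer from a recently deleted file, never flushed.
stale = Buffer(BLOCK, b"old file data", "data")
# New directory block allocated at the same location.
fresh = Buffer(BLOCK, b"new directory entry", "metadata")

disk = Disk()
# The new metadata is written first, the stale data second.
dups = flush(disk, [fresh, stale])
final = disk.blocks[BLOCK]

print("blocks written more than once:", dups)
print("final on-disk contents:", final)
```

If the stale buffer had been discarded when the file was deleted, only
one write would reach the block and the directory would survive.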

The bug I found was specific to data=journal mode, and that mode
certainly offers more opportunities for buffer aliasing. Were you
using data=journal?

NeilBrown


