Re: mkfs.ext2 triggerd RAM corruption

From: Theodore Tso
Date: Sat May 05 2007 - 14:58:09 EST


On Sat, May 05, 2007 at 03:36:37AM +0200, Bernd Schubert wrote:
> distribution: modified debian sarge, in which aspect is the distribution
> important for this problem? mkfs2.ext2 is supposed to write to /dev/sdaX
> and not /dev/rd/0. Stracing it and grepping for open calls shows that
> only /dev/sdaX is opened in read-write mode.

/dev/rd/0? What's this? Is this the partition where your root
partition is found? What is it? Is it a ramdisk? Or is it some kind
of persistent storage device?

If it is a persistant storage device, do the corrupted files stay
corrupted when you reboot? (If it's a ramdisk which you load, then
obviously it's getting reloaded on reboot.) You didn't give enough
information to be sure exactly what's going on.

The next thing to ask is how the files are corrupted. Can you see
save a copy of the corrupted files to stable storage, so you can see
*how* they were corrupted. Were large swaths of zeros getting written
into it?

Next question; if you don't use these mke2fs parameters, can you
reproduce the corruption?

mkfs.ext2 -j -b 4096 -F -i 4096 -J size=400 -I 512 /dev/sda4

What if you change the it to:

mkfs.ext2 -j -b 4096 /dev/sda4

Do you still see corruption problems?

> I already tested several partition types, e.g. something like this for a
> test on sda3
>
> beo-05:~# sfdisk -d /dev/sda
> # partition table of /dev/sda
> unit: sectors
>
> /dev/sda1 : start= 63, size= 4208967, Id=83
> /dev/sda2 : start= 4209030, size= 4209030, Id=83
> /dev/sda3 : start= 8418060, size=313251435, Id=83
> /dev/sda4 : start= 0, size= 0, Id= 0

What if the partition size is smaller; does that make the problem go
away? If so, can you do a binary search on the partition size where
the problem appears?

And what can you say about the SATA driver you were using; were all of
the machines that you tested this on using the same SATA controller
and same driver?

Obviously if this were a generic kernel problem, we'd been hearing
about this from a lot more people. So there has to be something
unique to your setup, and we need to figure out what that might happen
to be.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/