mkfs.ext2 triggerd RAM corruption

From: Bernd Schubert
Date: Fri May 04 2007 - 11:30:47 EST


Hi,

I'm presently rather puzzled, if this is really a kernel bug, its a big bug.

Summary: The system ramdisk (initrd) gets corrupted while running mkfs.ext2 on
a local sata disk partition.

Reproduced on kernel versions: vanilla 2.6.16 - 2.6.20 (<2.6.16 doesn't run on
any of the systems I can do tests with).
Please note: I could reproduce this on serveral systems, all of them use ECC
memory and the memory of most of them the memory is monitored using EDAC.

Details:

1.) Our systems boot from an initrd, all system services are running from the
initrd/ramdisk.

2.) While setting up a lustre meta data storage server, lustre runs
mkfs.ext2 -j -b 4096 -F -i 4096 -J size=400 -I 512 /dev/sda4
(Please note, I first observed this while using a lustre patched kernel, but I
could reproduce this with vanilla kernels).


While this mkfs.ext2 command was running, suddenly running commands such as
ps, top, ls, etc. resulted in segmentation faults.

To see whats going on, I copied the entire / (so the initrd) into a tmpfs
root, chrooted into it, also bind mounted the main / into this chroot and
compared several times /bin of chroot/bin and the bind-mounted /bin while the
mkfs.ext2 command was running.

beo-05:/# diff -r /bin /oldroot/bin/
beo-05:/# diff -r /bin /oldroot/bin/
beo-05:/# diff -r /bin /oldroot/bin/
Binary files /bin/sleep and /oldroot/bin/sleep differ
beo-05:/# diff -r /bin /oldroot/bin/
Binary files /bin/bsd-csh and /oldroot/bin/bsd-csh differ
Binary files /bin/cat and /oldroot/bin/cat differ
...

Also tested different schedulers, at least happens with deadline and
anticipatory.

The corruption does NOT happen on running the mkfs command on /dev/sda1, but
happens with sda2, sda3 and sda3. Also doesn't happen with extended
partitions of sda1.

Any idea whats going on?


Thanks,
Bernd


--
Bernd Schubert
Q-Leap Networks GmbH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/