Re: Error testing ext3 on brd ramdisk

From: Nick Piggin
Date: Tue Mar 10 2009 - 12:13:04 EST


On Thu, Mar 05, 2009 at 01:12:25PM +0100, Jorge Boncompte [DTI2] wrote:
> Jorge Boncompte [DTI2] wrote:
> >Ok. I have modified the script to do...
> >------------
> >mount -no remount,ro /dev/ram0
> >dd if=/dev/ram0 of=/tmp/config.bin bs=1k count=1000
> >fsck.minix -fv /tmp/config.bin
> >if [ $? != 0 ]; then
> > echo "FATAL: Filesystem is corrupted"
> > exit 2
> >fi
> >mount -no remount,rw /dev/ram0
> >md5sum config.bin
> >dd if=config.bin of=/dev/hda1
> >echo $md5sum | dd of=/dev/hda1 bs=1k seek=1100 count=32
> >------------
> >... after some cycles of modifying files on the filesystem and trying to
> >save it to disk...
> >------------------
> >fsck.minix: BusyBox v1.8.2 (2008-12-03 14:24:56 CET)
> >Forcing filesystem check on /tmp/config.bin
> >Unused inode 198 is marked as 'used' in the bitmap.
> >Zone 393 is marked 'in use', but no file uses it.
> >Zone 394 is marked 'in use', but no file uses it.
> >
> > 198 inodes used (58%)
> > 395 zones used (39%)
> >
> > 163 regular files
> > 24 directories
> > 0 character device files
> > 0 block device files
> > 0 links
> > 10 symbolic links
> >------
> > 197 files
> >FATAL: Filesystem is corrupted
> >-------------------
> >
>
> If after getting the "FATAL: Filesystem is corrupted" message I do
> "echo 3 > /proc/sys/vm/drop_caches" and rerun the script the filesystem
> somehow got magically fixed (I mean fsck.minix does not report errors
> and the image gets written to disk well ;-)

OK, I can reproduce this. It really does seem to be caused by the
buffercache going out of coherency for some reason. The trick is to fsck
the filesystem while it is mounted ro, then remount it rw, modify it,
remount it ro, and fsck it again. The first fsck must bring in uptodate
buffercache, and something is then not being invalidated correctly.
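
A minimal sketch of that sequence, reusing the mount, dd and fsck.minix
commands from the quoted script. The helper names (run, check_ro,
repro_cycle), the DRY_RUN switch, the scratch file name and the default
mount point are assumptions made for illustration, not part of anyone's
actual test setup. With DRY_RUN=1 (the default) it only prints the
commands it would run.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers wrapped around the commands from the
# quoted script. DRY_RUN=1 (default) prints the commands, DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"
FSCK="${FSCK:-fsck.minix -fv}"   # checker taken from the quoted script
IMG="${IMG:-/tmp/config.bin}"    # image path taken from the quoted script

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remount read-only, copy the device through the buffercache, check the copy.
check_ro() {
    run mount -no remount,ro "$1" || return 1
    run dd if="$1" of="$IMG" bs=1k count=1000 || return 1
    run $FSCK "$IMG"             # unquoted on purpose: FSCK carries its flags
}

repro_cycle() {
    dev=$1
    mnt=$2
    check_ro "$dev" || return 2               # first fsck: reads the device via buffercache
    run mount -no remount,rw "$dev" || return 1
    run touch "$mnt/scratch.$$" || return 1   # any modification will do
    check_ro "$dev"                           # second fsck: may see stale blocks
}

repro_cycle "${DEV:-/dev/ram0}" "${MNT:-/mnt/test}"
```

Running it with DRY_RUN=0 needs root and a filesystem already mounted at
the given mount point. For a different filesystem the checker can be
swapped through the environment, for example FSCK='e2fsck -fn'.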

It is also not brd or minix specific -- I reproduced it with the loop
driver and ext2 too, and a regular disk driver will probably have the same
problem (ie. it is something in the buffercache).

I don't know if this is the same problem as the ext3 issue -- the recipe
for reproducing the ext3 problem includes umount, which will invalidate all
the buffercache unless something is holding the bdev open. But anyway I
am making some progress with this problem so I will try to solve it first.

I can't think of any good reason why buffercache should be going out of
sync here...
