Re: ext3 corruption

From: Helge Hafting
Date: Mon Sep 25 2006 - 08:36:14 EST


Molle Bestefich wrote:
I wrote:
I have a ~1TB filesystem that failed to mount today, the message is:

EXT3-fs error (device loop0): ext3_check_descriptors: Block bitmap for
group 2338 not in group (block 1607003381)!
EXT3-fs: group descriptors corrupted !

Yesterday it worked flawlessly.

Helge Hafting wrote:
> And voila, that difficult task of assessing in which order to do
> things is out of the hands of distros like Red Hat, and into the
> hands of those people who actually make the binaries.

Not so easy. You do not want to shut down md devices because
samba is using them. Someone else may run samba on a single
harddisk and also have some md-devices that they take down
and bring up a lot. So having samba generally depend on md doesn't
work. Your setup need it, others may have different needs.

I've looked hard at things and just found that maybe it's not the init
order that's to blame..

It seems that unmounting the filesystem fails with a "device busy" error.
I'm not sure why there's still open files on the device, but perhaps a
remote user is copying a file or some such (likely).
That is solvable by shutting down remote operations first.
So stop samba (or nfs or whatever) before attempting to umount.
Anyway, the system is shutting down, so it should just forcefully
unmount the device, but it doesn't.
The halt script tries "umount" three times, which all fail with:
"device is busy".
It then actually tries "umount -f" three times, which all fail with
"Device or resource busy"
At which point the halt script turns off the machine and the
filesystem is ruined.

How to fix forceful unmount so it works?
I don't know, other than researching what filesystems support
forced umount and use one of those. Complain to the vendor or maintainer
of your particular filesystem.

However, you can usually find out why some file is open. Try
umount yourself, when it doesn't work, use "lsof" to see
what file is open. Then figure out who or what is keeping it open.
To debug a shutdown problem, consider putting "lsof >> logfile"
in your shutdown script.

Not a solution but a workaround: Run "sync" before shutdown.
(Stick it in some script.)
Now, all data in filesystem caches will be written to disk before power is lost.
This isn't perfect, but filesystem damage is greatly minimized and
often avoided completely. Useful while waiting for a better solution.

The real solution is to set things up so unforced umount works.
This is normally possible to do.


Helge Hafting
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/