Re: Mild filesystem corruption on ext4 (no journal)

From: Alan Jenkins
Date: Fri Jun 05 2009 - 10:49:56 EST


Aioanei Rares wrote:
Alan Jenkins wrote:
Hi,

I run ext4 without a journal on my cheap netbook with a 4 gig SSD. I suspect "without a journal" is significant; I don't think I'm doing anything else strange.

When I upgrade libc from 2.7 (debian stable) to 2.9 (debian unstable), the locale breaks every reboot, and I have to repair it by running locale-gen. This happened now when I only upgraded libc, in order to play with signalfd(). It also happened before, when I upgraded the entire machine to debian unstable (which I later reverted).

The problem is that /usr/lib/locale/locale-archive gets corrupted when I reboot. The exact corruption differs with each reboot (i.e. the md5sum differs). Last time, the first ~70K was overwritten with data from xorg.log and my web browsing history. I have copies of the original and corrupted files, which I can send. The full file is 1.3 megs, but I can limit it to the first 70K, since that's all that was corrupted.
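
For anyone wanting to measure the damaged region the same way, here is a minimal sketch that compares a known-good copy against a corrupted one and reports the byte range that differs. It is only an illustration: the function names are hypothetical, and the 4096-byte block size is an assumption, not something confirmed for this filesystem.

```python
def differing_extent(good: bytes, bad: bytes):
    """Return (start, end) offsets bounding every differing byte, or None if identical."""
    n = min(len(good), len(bad))
    diffs = [i for i in range(n) if good[i] != bad[i]]
    if len(good) != len(bad):
        # A length mismatch counts as a difference from offset n onward.
        start = diffs[0] if diffs else n
        return (start, max(len(good), len(bad)))
    if not diffs:
        return None
    return (diffs[0], diffs[-1] + 1)


def blocks_touched(start: int, end: int, block_size: int = 4096):
    """Map a byte extent onto block indices (block_size is an assumed value)."""
    return range(start // block_size, (end - 1) // block_size + 1)


def compare_files(good_path: str, bad_path: str):
    """Read both files fully (fine for a ~1.3 MB file) and compare them."""
    with open(good_path, "rb") as g, open(bad_path, "rb") as b:
        return differing_extent(g.read(), b.read())
```

Calling compare_files() on the saved original and the corrupted copy would give the extent, and blocks_touched() turns that into block indices if the corruption turns out to be block-aligned.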

To try and rule out a faulty userspace program, I marked the file as read-only (chmod a-w) and immutable (chattr +i). After a reboot, the file was still read-only and immutable, yet it still became corrupted.

Also, I ran md5sum in the shutdown scripts, after mounting the root filesystem read-only (which is also preceded by a sync in a different script). This showed that the file did not appear corrupted at this point (though maybe it was OK in the page cache but corrupted on disk).
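
The check described above can be sketched roughly as follows. This is an illustrative equivalent of running md5sum and comparing against a recorded value, not the actual shutdown script; the function names are hypothetical. Note that it reads through the page cache, so like md5sum it cannot by itself tell a good cached copy from bad on-disk data.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: str, expected_hex: str) -> bool:
    """Compare the current digest with one recorded while the file was known good."""
    return md5_of_file(path) == expected_hex
```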

The locale-archive file is read by the libc locale routines using mmap(). The mapping is read only and is not modified. It seems likely that some process has it mapped when the kernel shuts down.

I tried reproducing this by writing a minimal daemon which maps a copy of the locale-archive file, and starting it just before the filesystem is remounted read-only. It didn't work though; this copy of the locale-archive file remained uncorrupted.
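
A test daemon of this kind might look roughly like the sketch below. It is not the program actually used. The function names are hypothetical, and the choice of a private, read-only mapping (MAP_PRIVATE with PROT_READ) is an assumption about how libc maps the archive, not something stated above.

```python
import mmap
import os
import time


def map_readonly(path: str) -> mmap.mmap:
    """Map the whole file read-only; the mapping outlives the descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Assumed flags: private, read-only mapping of the full file.
        return mmap.mmap(fd, size, flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)
    finally:
        os.close(fd)  # mmap keeps its own duplicate of the descriptor


def touch_pages(mapping: mmap.mmap) -> int:
    """Read one byte per page so every page is actually faulted in."""
    touched = 0
    for offset in range(0, len(mapping), mmap.PAGESIZE):
        mapping[offset]
        touched += 1
    return touched


def hold_mapping(path: str) -> None:
    """Keep the file mapped until the process is killed at shutdown."""
    mapping = map_readonly(path)
    touch_pages(mapping)
    while True:
        time.sleep(60)
```

Running hold_mapping() on a copy of locale-archive just before the remount would keep the file mapped through shutdown, which is the condition the test was trying to reproduce.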

I forced a fsck on boot, and the filesystem was reported to be clean. I am currently running with e2fsprogs v1.41.6 (from debian unstable), and a custom-built kernel, 2.6.30-rc7.

Thanks in advance!
Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

I suspect, although I might be wrong, that this is not a kernel-related
problem.

"To try and rule out a faulty userspace program, I marked the file as read-only (chmod a-w) and immutable (chattr +i). After a reboot, the file was still read-only and immutable, yet it still became corrupted."

Since the immutable bit is not respected, I tend to think it is a kernel problem. Unless the filesystem isn't getting unmounted/flushed properly for some reason... but I thought the modern kernel had that covered.

I agree it is very suspicious this happens only after upgrading libc. I'll see if I can find an individual change in libc locale-handling that might trigger this.

Thanks
Alan