Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

From: Nix
Date: Wed Oct 24 2012 - 16:34:50 EST


On 24 Oct 2012, Eric Sandeen uttered the following:

> On 10/24/2012 02:49 PM, Nix wrote:
>> On 24 Oct 2012, Theodore Ts'o spake thusly:
>>> Toralf, Nix, if you could try applying this patch (at the end of this
>>> message), and let me know how and when the WARN_ON triggers, and if it
>>> does, please send the empty_bug_workaround plus the WARN_ON(1) report.
>>> I know about the case where a file system is mounted and then
>>> immediately unmounted, but we don't think that's the problematic case.
>>> If you see any other cases where WARN_ON is triggering, it would be
>>> really good to know....
>>
>> Confirmed, it triggers. Traceback below.
>
> <giant snip>
>
> The warn on triggers, but I can't tell - did the corruption still occur
> with Ted's patch?

Yes. I fscked the filesystems under 3.6.1 after rebooting: /var had a
journal replay and the usual varieties of corruption (free space bitmap
problems and multiply-claimed blocks). (The other filesystems for which
the warning triggered had neither a journal replay nor corruption.
At least one of them, /home, likely had a few writes but not enough to
cause a journal wrap.)

I note that the warning may well *not* have triggered for /var: if the
reason it had a journal replay was simply that it was still in use by
something that hadn't died, the umount -l will have avoided doing a full
umount for that filesystem alone.

Also, the corrupted filesystem was mounted in 3.6.3 exactly once.
Multiple umounts are not necessary, but an unclean umount apparently is.

--
NULL && (void)