Re: Nature of ext4 corruption fixed by recent patch?

From: Theodore Ts'o
Date: Tue May 19 2015 - 13:50:39 EST


On Tue, May 19, 2015 at 09:37:40AM -0700, Josh Triplett wrote:
> In particular, I didn't realize this was *only* the data of the
> delayed-extent-based files. The bug here seems to have struck various
> recently-written files and directories. (Recent in days, not seconds,
> as far as I can tell; and it isn't universal based on age.) The initial
> symptom was ext4 noticing that a directory was corrupt (truncated, IIRC)
> and immediately marking the whole filesystem read-only.

Do you have a transcript of the fsck run on the file system, either
with -n or from when you were trying to fix it? I'd need to know a lot
more about the pattern of corruption to hazard a guess.

The sorts of corruption that turn into a large number of file system
errors are (a) corruption in the block allocation bitmap, so blocks
get used for more than one purpose, or (b) garbage (or the wrong
portion of an inode table) getting written into the inode table. But
each of these has its own distinctive signature in terms of the file
system problems reported by e2fsck.
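To make case (a) concrete: once a damaged bitmap shows in-use blocks
as free, the allocator can hand them to a second file, and e2fsck
then finds blocks claimed by more than one inode. The toy model below
is a sketch under stated assumptions: the inode-to-extent layout and
the function name multiply_claimed are hypothetical and have nothing
to do with e2fsck's real data structures.

```python
# Hypothetical toy model: each inode owns a list of extents given as
# (start_block, length). This is NOT how e2fsck represents anything.
def multiply_claimed(extents_by_inode):
    """Return {block: [inodes]} for blocks claimed by more than one inode."""
    owners = {}
    for inode, extents in extents_by_inode.items():
        for start, length in extents:
            for block in range(start, start + length):
                owners.setdefault(block, []).append(inode)
    # Keep only blocks with two or more owners.
    return {b: inos for b, inos in owners.items() if len(inos) > 1}

# Assume a corrupted bitmap marked blocks 6-7 free while inode 12 still
# used them, so the allocator gave them to inode 13 as well.
layout = {12: [(4, 4)], 13: [(6, 3)], 14: [(10, 2)]}
print(multiply_claimed(layout))  # → {6: [12, 13], 7: [12, 13]}
```

Every shared block damages two files at once, which is why this kind
of corruption tends to fan out into many reported errors.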

In general, though, this kind of corruption doesn't cause a large
number of files to contain NUL bytes. So it doesn't smell like a file
system problem, but I'd want to see a detailed listing of the problems
reported by e2fsck before making a definitive statement.
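To gauge how widespread the NUL-filled files are alongside the e2fsck
output, something like the sketch below could list files whose
contents are entirely NUL bytes. It is a hypothetical helper
(all_nul_files and nul_summary are made-up names, not part of
e2fsprogs); it looks only at file contents as read through the normal
read path, not at on-disk metadata, and a legitimately sparse or
preallocated file will also show up as all zeros.

```python
import os

def nul_summary(path, chunk_size=1 << 20):
    """Return (total_bytes, nul_bytes) for the file at path."""
    total = nul = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
            nul += chunk.count(0)  # count zero bytes in this chunk
    return total, nul

def all_nul_files(root):
    """Yield non-empty regular files under root that contain only NUL bytes."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            total, nul = nul_summary(path)
            if total and total == nul:
                yield path
```

For example, `for p in all_nul_files("/home"): print(p)` would list
candidates; comparing their mtimes against when the problem started
may help show whether the damage follows a pattern.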

Were you using LVM, raid, or anything else between the file system and
the storage device(s)?

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/