Re: [PATCH] ext4: xattr: fix out-of-bounds access in ext4_xattr_set_entry

From: Theodore Ts'o

Date: Fri Mar 20 2026 - 08:33:41 EST


On Fri, Mar 20, 2026 at 03:43:21PM +0800, ZhengYuan Huang wrote:
>
> There seem to be three layers of defense: fsck, mount-time checks, and
> runtime checks.

Within runtime checks, there are those checks that are done the first
time metadata is loaded from disk --- for example, see the checks in
__ext4_iget() and the functions it calls, such as check_igot_inode().

And then there are checks that are done in hotpaths, since at least in
theory, a careless system administrator could make a block device
world-writeable, so a malicious or accidental actor could modify the
copy of the metadata in the buffer cache. Those are the sorts of
runtime checks we should try to avoid.

Mount-time checks tend to be those that validate superblock and block
group descriptor contents. They can't validate all of the inodes
because that would take far too long.

> Would it be more accurate to understand the boundary
> this way: once the filesystem metadata has passed mount-time
> validation (even if it would not necessarily pass fsck), the
> filesystem is still expected to handle later errors gracefully rather
> than crash?

It is nice to have a file system handle errors gracefully rather than
crash. However, if the inconsistency would have been caught and
corrected by fsck, I don't consider it a CVE-worthy security bug, but
rather a quality-of-implementation bug.

This is important, because there are risks associated with rolling out
a new kernel to hundreds of thousands of machines, or using live
patching to fix high severity security bugs. If the issue could have
been caught by fsck, and a competently administered system *does* run
fsck at boot time (such as at $WORK), the cost benefit ratio of
treating such bugs as security bugs doesn't make sense.

> More specifically, for inconsistencies that arise at runtime, is the
> general expectation that they are outside the filesystem's
> responsibility and should instead be handled by other layers (for
> example, lower-level storage redundancy / RAID)? Or is there still
> room for defensive checks in the filesystem, as long as they are done
> outside hot paths?

This would be on a case by case basis. If the check is *super* cheap,
it's done outside of a hotpath --- say, when a file is first opened
--- and it doesn't cause long-term maintenance issues, it is something
that could be considered. But in terms of the priority of dealing
with such patches, it is not something that would be considered high
priority. Perhaps just a step above spelling or grammar fixes in
comments. :-)

Consider that for enterprise hard drives, the bit error rate is 1 in
10**15. And the chance that such a bit error would cause a metadata
inconsistency that would lead to a crash has to be factored in. If we
had infinite resources, it might be something that would be considered
higher priority, but in the real world, given the opportunity cost of
having software engineers work on other improvements, it's not
necessarily going to be a compelling business case when I go to my
management asking for more headcount. And if you are an academic, the
impact of such work might also be called into question.

Cheers,

- Ted