Re: [PATCH] ext4: validate extent entries before caching in ext4_find_extent()

From: Deepanshu Kartikey

Date: Mon Sep 29 2025 - 10:00:45 EST


Zhang Yi,

Thank you for pointing out the validation issue and your concerns about redundant checks. Let me provide a complete explanation of what's happening.

## Initial Problem
The reproducer triggers a BUG_ON in ext4_es_cache_extent() when opening a verity file on a corrupted ext4 filesystem. The issue occurs because the extent tree contains out-of-order extents that cause integer underflow when calculating hole sizes.

## Why ext4_ext_check_inode() Doesn't Catch This
You correctly asked why the existing validation in __ext4_iget() doesn't catch this corruption. After investigation with debug code, I found:

DEBUG: verity inode 15, inline=0, extents=1, test_inode_flag_inline_data=1
DEBUG: First time check inode 15 - flag=1, i_inline_off=0, has_inline=0
DEBUG: Second time check inode 15 - flag=1, i_inline_off=164, has_inline=1
DEBUG: Skipping validation for inode 15 (has inline data)

The corrupted filesystem has inode 15 with:
1. EXT4_INODE_INLINE_DATA flag set (test_inode_flag returns 1)
2. EXT4_INODE_VERITY flag set (it's a verity file)
3. i_inline_off containing value 164 (from corrupted on-disk data)
4. Out-of-order extents in the extent tree

## The Validation Bypass Mechanism
The validation code in __ext4_iget():
} else if (!ext4_has_inline_data(inode)) {
/* validate the block references in the inode */

The ext4_has_inline_data() function returns:
return ext4_test_inode_flag(inode, EXT4_INODE_INLINE_DATA) &&
EXT4_I(inode)->i_inline_off;

Initially i_inline_off=0, so ext4_has_inline_data() returns false (1 && 0 = 0). But by the time validation check happens, i_inline_off=164 (loaded from corrupted on-disk data), making ext4_has_inline_data() return true (1 && 164 = 1), which skips validation.

## Proper Solution
You're correct that we should avoid redundant checks. The proper fix is to detect this invalid combination early in ext4_iget():

if (ext4_test_inode_flag(inode, EXT4_INODE_VERITY) &&
ext4_test_inode_flag(inode, EXT4_INODE_INLINE_DATA)) {
ext4_error_inode(inode, __func__, __LINE__, 0,
"inode has both verity and inline data flags");
ret = -EFSCORRUPTED;
goto bad_inode;
}

This addresses the root cause without adding overhead to the extent lookup path. I'll prepare a v2 patch implementing this approach instead of adding validation in ext4_find_extent().

Thank you for the thorough review that led to finding the actual root cause.

Best regards,
Deepanshu