[RFC] ext4: possible inconsistency in ext4_append() error path
From: Vineet Agarwal
Date: Fri May 01 2026 - 13:25:45 EST
Hi,
While looking into ext4 directory operations, I noticed a possible
inconsistency in the error handling of ext4_append().
In ext4_append(), the inode size is updated before all failure points
have been ruled out:
	bh = ext4_bread(handle, inode, *block, EXT4_GET_BLOCKS_CREATE);
	if (IS_ERR(bh))
		return bh;
	inode->i_size += inode->i_sb->s_blocksize;
	EXT4_I(inode)->i_disksize = inode->i_size;
	err = ext4_mark_inode_dirty(handle, inode);
	if (err)
		goto out;
	err = ext4_journal_get_write_access(handle, inode->i_sb, bh,
					    EXT4_JTR_NONE);
	if (err)
		goto out;
If either ext4_mark_inode_dirty() or
ext4_journal_get_write_access() fails, the function returns an
error but restores neither inode->i_size nor i_disksize to its
original value.
Callers of ext4_append() appear to treat it as an all-or-nothing
operation:
	bh = ext4_append(handle, dir, &block);
	if (IS_ERR(bh))
		goto out;
However, in the failure case, inode->i_size (and i_disksize) may
already have been increased by one block.
One possible consequence is that subsequent checks relying on i_size,
such as:
	if (block >= inode->i_size >> inode->i_blkbits)
may treat a block index as in range even though the append that would
have made it valid did not complete successfully.
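To make the arithmetic concrete, here is a minimal user-space model of that check. It is not kernel code: struct dir_model and block_passes_check() are names invented for this illustration, and the 4096-byte block size (i_blkbits = 12) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two inode fields the check reads. */
struct dir_model {
	int64_t  i_size;	/* directory size in bytes */
	unsigned i_blkbits;	/* log2 of the block size */
};

/*
 * Mirrors the quoted test: a block index is rejected when
 * block >= i_size >> i_blkbits, and accepted otherwise.
 */
static bool block_passes_check(const struct dir_model *dir, uint32_t block)
{
	assert(dir->i_size >= 0 && dir->i_blkbits < 63);
	return !(block >= (uint64_t)(dir->i_size >> dir->i_blkbits));
}
```

With i_size = 4096 the directory has one block, so index 1 is rejected. If a failed append leaves i_size at 8192, the same index 1 is accepted by this check, although in this model nothing was successfully appended there.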
I understand that journaling may ensure on-disk consistency, but the
in-memory inode state may still temporarily reflect a change that did
not logically succeed.
Is this behavior intentional, or should ext4_append() avoid updating
i_size until after all failure points, or roll it back on error?
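As a rough illustration of the roll-back variant, here is a user-space sketch under stated assumptions. It is not a patch: struct model_inode, enum fail_point, model_step() and model_append_rollback() are hypothetical names, the two failure points are simulated by injection, and the sketch ignores what should happen to the block already allocated by ext4_bread().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the size fields touched by the append path. */
struct model_inode {
	int64_t i_size;
	int64_t i_disksize;
	int64_t blocksize;
};

/* Which of the two later steps should fail in this simulation. */
enum fail_point { FAIL_NONE, FAIL_MARK_DIRTY, FAIL_WRITE_ACCESS };

/* Returns 0, or a negative value when the injected failure hits this step. */
static int model_step(enum fail_point inject, enum fail_point here)
{
	return inject == here ? -5 : 0;	/* -5 stands in for -EIO */
}

/* Grow the sizes by one block, restoring both if a later step fails. */
static int model_append_rollback(struct model_inode *inode,
				 enum fail_point inject)
{
	const int64_t old_size = inode->i_size;
	const int64_t old_disksize = inode->i_disksize;
	int err;

	assert(inode->blocksize > 0);

	inode->i_size += inode->blocksize;
	inode->i_disksize = inode->i_size;

	err = model_step(inject, FAIL_MARK_DIRTY);
	if (err)
		goto rollback;
	err = model_step(inject, FAIL_WRITE_ACCESS);
	if (err)
		goto rollback;
	return 0;

rollback:
	/* Undo the size update so callers see all-or-nothing behaviour. */
	inode->i_size = old_size;
	inode->i_disksize = old_disksize;
	return err;
}
```

The alternative, deferring the size update until both steps have succeeded, would avoid the saved copies, but I have not checked whether ext4_mark_inode_dirty() needs to see the new size at the point where it is called.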
Thanks,
Vineet Agarwal