Re: [BUG] ext4: delayed-free buddy load error reaches BUG_ON in ext4_process_freed_data
From: Jan Kara
Date: Mon Jun 01 2026 - 06:16:13 EST
Hello!
On Sun 24-05-26 11:14:43, Yifei Chu wrote:
> Short version: I am reporting an ext4 delayed-free error-path bug found
> with targeted fault injection. The injected -EIO is in
> ext4_mb_load_buddy()’s normal error-return domain, and the injection is
> placed at the helper return boundary for the delayed-free caller. With that
> rare lower-layer failure made deterministic, ext4_free_data_in_buddy()
> reaches BUG_ON(err != 0) and crashes the kernel.
So how did you inject the EIO error? ext4_mb_load_buddy() returns EIO only
if the folio is not uptodate. Did you manage to hit non-uptodate folio when
ext4_mb_load_buddy() is called from ext4_free_data_in_buddy()? Checking the
code it actually might be possible as I don't see where the buddy folio
would be actually pinned in memory these days but I'd like to verify...
Honza
>
> Tested kernel:
>
> v7.1-rc4-640-g79bd2dded182
> 79bd2dded182b1d458b18e62684b7f82ffc682e5
> x86_64 QEMU, KASAN config
>
> The relevant code shape in fs/ext4/mballoc.c is:
>
>     err = ext4_mb_load_buddy(sb, entry->efd_group, &e4b);
>     /* we expect to find existing buddy because it's pinned */
>     BUG_ON(err != 0);
>
> The point of the injection is not to corrupt ext4 state. It only makes a
> plausible buddy/bitmap load failure deterministic at this caller, so the
> caller’s error handling can be tested. ext4_mb_load_buddy() already has
> normal negative error returns from metadata loading paths.
>
> Reproducer shape:
>
> 1. Mount a fresh ext4 filesystem.
> 2. Create and fsync a 256 KiB file.
> 3. Unlink the file.
> 4. Call sync() to force delayed-free processing.
> 5. The instrumentation forces ext4_mb_load_buddy() to return -EIO at the
> delayed-free callsite.
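
For readers without the tarball, steps 2-4 above can be sketched in userspace roughly as follows. This is only an illustration and not the attached repro_init.c: the function name run_delayed_free_steps() and the 4 KiB write chunk are assumptions, the path is expected to live on the freshly mounted ext4 filesystem (step 1), and step 5 still needs the instrumentation patch, which is not shown here.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_FILE_SIZE (256 * 1024)	/* 256 KiB, as in step 2 */

/*
 * Hypothetical helper: create and fsync a 256 KiB file, unlink it, then
 * call sync(). Returns 0 on success, -1 on the first failing syscall.
 */
static int run_delayed_free_steps(const char *path)
{
	static char buf[4096];
	size_t written = 0;
	int fd;

	memset(buf, 0xab, sizeof(buf));

	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* Step 2: write exactly 256 KiB, tolerating short writes. */
	while (written < SKETCH_FILE_SIZE) {
		size_t chunk = SKETCH_FILE_SIZE - written;
		ssize_t n;

		if (chunk > sizeof(buf))
			chunk = sizeof(buf);
		n = write(fd, buf, chunk);
		if (n < 0) {
			perror("write");
			close(fd);
			return -1;
		}
		written += (size_t)n;
	}

	/* Step 2 (cont.): make sure the blocks are really allocated on disk. */
	if (fsync(fd) < 0) {
		perror("fsync");
		close(fd);
		return -1;
	}
	if (close(fd) < 0) {
		perror("close");
		return -1;
	}

	/* Step 3: unlink the file so its blocks get queued for freeing. */
	if (unlink(path) < 0) {
		perror("unlink");
		return -1;
	}

	/* Step 4: sync() to push the commit that processes the freed data. */
	sync();
	return 0;
}
```

On an unmodified kernel this should simply return 0; the crash in the report depends on the forced -EIO from the instrumentation.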
>
> Two fresh image runs reproduced the same crash:
>
> AGENT_INIT: unlink ret=0 errno=0 (Success)
> AGENT_INIT: calling sync to force delayed free processing
> EXT4-fs: AGENT_EXT4_FREE_DATA_BUDDY_BUGON: forcing ext4_mb_load_buddy EIO before BUG_ON
> kernel BUG at fs/ext4/mballoc.c:3990!
> Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
> RIP: 0010:ext4_process_freed_data+0x1fe/0x510
>
> I did a local duplicate sweep and found related older
> ext4_mb_load_buddy()/mballoc fixes, but I did not find a direct
> current-upstream fix for this delayed-free BUG_ON(err != 0) path.
>
> Expected behavior:
>
> A metadata load failure during delayed-free processing should go through
> ext4 error handling / transaction abort / filesystem error propagation,
> rather than treating the error as an impossible invariant and BUGing the
> kernel.
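
To make the requested behavior concrete, here is a small userspace model of the control flow, not a kernel patch. Every name in it (struct model_fs, model_free_data_in_buddy(), the load_* stubs) is hypothetical and only stands in for the real mballoc code; what the actual fix should do on error (abort the journal, mark the fs with an error, etc.) is an open question this sketch does not answer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-filesystem error state. */
struct model_fs {
	bool error_flagged;	/* set once a metadata error was reported */
	int last_err;		/* negative errno of the last reported error */
};

/* Hypothetical stand-in for the buddy loader: 0 or negative errno. */
typedef int (*load_buddy_fn)(unsigned int group);

static int load_ok(unsigned int group)
{
	(void)group;
	return 0;
}

static int load_eio(unsigned int group)
{
	(void)group;
	return -EIO;
}

/*
 * Model of the delayed-free step: instead of asserting that the load
 * cannot fail, record the error on the filesystem and hand it back to
 * the caller.
 */
static int model_free_data_in_buddy(struct model_fs *fs, unsigned int group,
				    load_buddy_fn load)
{
	int err = load(group);

	if (err) {
		fs->error_flagged = true;
		fs->last_err = err;
		fprintf(stderr, "model: buddy load for group %u failed: %d\n",
			group, err);
		return err;
	}

	/* ... the extent would be released in the buddy bitmap here ... */
	return 0;
}
```

With load_ok the model returns 0 and leaves the state untouched; with load_eio it returns -EIO and flags the filesystem rather than stopping the whole machine.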
>
> The attached tarball includes README.md, repro_init.c,
> instrumentation.patch, and both full serial logs.
>
> Thanks,
> Chuyifei
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR