[BUG] btrfs: seed add-device sys-chunk relocation error reaches BUG_ON
From: Yifei Chu
Date: Sun May 24 2026 - 11:18:02 EST
Hello,
Short version: I am reporting a Btrfs seed add-device error-path bug found with targeted fault injection. The injected -EIO falls within the normal error-return domain of btrfs_relocate_chunk(), and the injection point is the btrfs_relocate_chunk() return value as seen by btrfs_relocate_sys_chunks(). Once this non-ENOSPC relocation failure is made deterministic, the seed add-device path reaches BUG_ON(ret).
Tested kernel:
v7.1-rc4-640-g79bd2dded182
commit base 79bd2dded182b1d458b18e62684b7f82ffc682e5
x86_64 QEMU, KASAN config
Relevant code shape:
	ret = btrfs_relocate_chunk(fs_info, found_key.offset, true);
	if (ret == -ENOSPC)
		failed++;
	else
		BUG_ON(ret);
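For illustration only, here is a minimal userspace model of that caller loop with BUG_ON(ret) replaced by error propagation. struct reloc_model, relocate_one() and relocate_sys_chunks_model() are hypothetical names; the model tracks only return values, omits any -ENOSPC retry policy the real btrfs_relocate_sys_chunks() may have, and is not a proposed patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: results[i] is the return value for system chunk i. */
struct reloc_model {
	const int *results;	/* 0 on success, negative errno on failure */
	size_t nr_chunks;
};

/* Hypothetical stand-in for btrfs_relocate_chunk(). */
static int relocate_one(const struct reloc_model *m, size_t idx)
{
	return m->results[idx];
}

/*
 * Same shape as the excerpt above: -ENOSPC is counted, but any other
 * error is returned to the caller instead of reaching BUG_ON(ret).
 * Returns 0 on success, -ENOSPC if any chunk hit ENOSPC, or the first
 * non-ENOSPC error (for example the injected -EIO).
 */
static int relocate_sys_chunks_model(const struct reloc_model *m)
{
	int failed = 0;

	for (size_t i = 0; i < m->nr_chunks; i++) {
		int ret = relocate_one(m, i);

		if (ret == -ENOSPC)
			failed++;
		else if (ret)
			return ret;	/* propagate instead of BUG_ON(ret) */
	}
	return failed ? -ENOSPC : 0;
}
```

The point of the model is only that the non-ENOSPC branch becomes an ordinary return; what the add-device caller must then undo is a separate question.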
Reproducer shape:
The host creates a Btrfs seed image, marks it seeding with btrfstune -S 1, and boots QEMU with that seed image plus a blank second block device. Guest PID1 mounts /dev/vda as Btrfs and calls BTRFS_IOC_ADD_DEV with /dev/vdb, entering btrfs_init_new_device() and btrfs_relocate_sys_chunks().
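The ioctl step can be sketched in userspace C roughly as follows. This is not the attached repro_init.c: repro_add_device() is a hypothetical helper, it assumes the seed filesystem is already mounted at mnt_path, and it leaves out the PID1 setup and the mount itself. struct btrfs_ioctl_vol_args, BTRFS_PATH_NAME_MAX and BTRFS_IOC_ADD_DEV come from the <linux/btrfs.h> UAPI header.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

/*
 * Hypothetical helper: add dev_path (e.g. /dev/vdb) to the Btrfs
 * filesystem mounted at mnt_path via BTRFS_IOC_ADD_DEV.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int repro_add_device(const char *mnt_path, const char *dev_path)
{
	struct btrfs_ioctl_vol_args args;
	size_t len = strlen(dev_path);
	int fd, ret, saved_errno;

	if (len > BTRFS_PATH_NAME_MAX) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* The ioctl is issued on an fd for the mounted filesystem. */
	fd = open(mnt_path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(&args, 0, sizeof(args));	/* keeps args.name NUL-terminated */
	memcpy(args.name, dev_path, len);

	ret = ioctl(fd, BTRFS_IOC_ADD_DEV, &args);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}
```

In a guest init this would follow a mount of /dev/vda, for example mount("/dev/vda", "/mnt", "btrfs", 0, NULL) from <sys/mount.h>, and then repro_add_device("/mnt", "/dev/vdb"); the /mnt mount point is an assumption.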
The preferred validation patch forces an early -EIO return from btrfs_relocate_chunk() on this seed add-device system-chunk relocation path. The point of the injection is not to corrupt relocation state after the fact; it makes a valid non-ENOSPC relocation error deterministic at this caller.
Observed result:
BTRFS error (device vda): AGENT_BTRFS_SYS_CHUNK: forcing early btrfs_relocate_chunk EIO chunk=201523200
kernel BUG at fs/btrfs/volumes.c:3698!
RIP: 0010:btrfs_init_new_device+0x2915/0x39d0
Kernel panic - not syncing: Fatal exception
I did an initial local duplicate sweep and found older related Btrfs relocation/error-handling fixes, but I did not find a direct current-upstream fix for this seed add-device btrfs_relocate_sys_chunks() non-ENOSPC BUG_ON(ret) case.
Expected behavior:
A non-ENOSPC relocation error should be propagated through the add-device path instead of being treated as an impossible invariant. One caution: a naive control that simply returns non-ENOSPC errors instead of calling BUG_ON() avoids the invalid-opcode crash, but then triggers KASAN use-after-free reports in the existing add-device error cleanup. That suggests the real fix likely needs both structured error propagation and an audit of the cleanup performed after btrfs_relocate_sys_chunks() fails, because by then the new device has already been partially linked into filesystem state.
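As a rough illustration of the unwind discipline such a cleanup audit would check for, here is a userspace model using the kernel's usual goto-ladder style. Everything in it (struct fs_model, list_new_device(), publish_new_device(), add_device_model()) is hypothetical. It does not reproduce or explain the KASAN reports, and the real add-device path touches far more state than these two steps.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of state the add-device path sets up before relocation. */
struct fs_model {
	int nr_devices;		/* devices on the in-memory list */
	bool new_dev_listed;	/* new device linked into the list */
	bool new_dev_published;	/* new device visible to other users */
};

static int list_new_device(struct fs_model *fs)
{
	fs->nr_devices++;
	fs->new_dev_listed = true;
	return 0;
}

static void unlist_new_device(struct fs_model *fs)
{
	fs->new_dev_listed = false;
	fs->nr_devices--;
}

static int publish_new_device(struct fs_model *fs)
{
	fs->new_dev_published = true;
	return 0;
}

static void unpublish_new_device(struct fs_model *fs)
{
	fs->new_dev_published = false;
}

/*
 * relocate_ret stands in for the btrfs_relocate_sys_chunks() result.
 * Each setup step has exactly one undo label, run in reverse order, so a
 * failure at a given step undoes every earlier step exactly once.
 */
static int add_device_model(struct fs_model *fs, int relocate_ret)
{
	int ret;

	ret = list_new_device(fs);
	if (ret)
		return ret;

	ret = publish_new_device(fs);
	if (ret)
		goto undo_list;

	ret = relocate_ret;	/* e.g. -EIO injected on the sys-chunk path */
	if (ret)
		goto undo_publish;

	return 0;

undo_publish:
	unpublish_new_device(fs);
undo_list:
	unlist_new_device(fs);
	return ret;
}
```

With relocate_ret = -EIO the model returns -EIO and leaves nr_devices and both flags as they were before the call; a use-after-free in the real path would correspond to an undo step that runs twice or runs after the object it touches has been freed.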
The attached tarball includes README.md, repro_init.c, preferred early-positive instrumentation, control diffs, and serial logs.
Thanks,
Chuyifei
Attachment:
btrfs_seed_add_sys_chunk_eio_bugon_20260523.tar.gz
Description: Unix tar archive