[BUG] btrfs: dev-replace start commit error reaches WARN_ON and panic_on_warn
From: Yifei Chu
Date: Sun May 24 2026 - 11:15:46 EST
Hello,
Short version: I am reporting a Btrfs dev-replace start error-path bug found with targeted fault injection. The injected -EIO is an error that btrfs_commit_transaction() can legitimately return, and the injection is placed at the point where the commit returns to its caller. With that rare commit failure made deterministic, BTRFS_IOC_DEV_REPLACE reaches WARN_ON(ret); with panic_on_warn=1 this panics the kernel.
Tested kernel:
v7.1-rc4-640-g79bd2dded182-dirty
commit base 79bd2dded182b1d458b18e62684b7f82ffc682e5
x86_64 QEMU, KASAN config
Reproducer shape:
The initramfs source mounts a single-device Btrfs filesystem from /dev/vda and calls BTRFS_IOC_DEV_REPLACE to replace source devid 1 with an empty target block device /dev/vdb. The validation patch forces the btrfs_commit_transaction() call in the dev-replace start path to return -EIO immediately after the commit has otherwise succeeded.
The point of the injection is not to corrupt Btrfs state arbitrarily. It makes a valid transaction-commit error deterministic at this caller, so the caller’s error handling can be tested.
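For readers without the tarball, the userspace side of the reproducer can be sketched roughly as follows. This is not the attached repro_init.c, only a minimal illustration built on the UAPI definitions in <linux/btrfs.h>; the helper names fill_replace_start() and start_replace(), the mount point "/mnt", and the choice of the "always read from source" mode are assumptions.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/*
 * Hypothetical helper: build a START request that replaces the device
 * with id `srcdevid` by the block device at path `tgtdev`.
 * Returns 0 on success, -1 if the target path does not fit.
 */
int fill_replace_start(struct btrfs_ioctl_dev_replace_args *args,
                       __u64 srcdevid, const char *tgtdev)
{
    size_t len = strlen(tgtdev);

    if (len > BTRFS_DEVICE_PATH_NAME_MAX)
        return -1;

    memset(args, 0, sizeof(*args));
    args->cmd = BTRFS_IOCTL_DEV_REPLACE_CMD_START;
    /* A non-zero srcdevid selects the source by id, not by name. */
    args->start.srcdevid = srcdevid;
    /* Assumed mode; the report does not say which one was used. */
    args->start.cont_reading_from_srcdev_mode =
        BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_ALWAYS;
    memcpy(args->start.tgtdev_name, tgtdev, len + 1);
    return 0;
}

/*
 * Hypothetical helper: issue BTRFS_IOC_DEV_REPLACE on the filesystem
 * mounted at `mnt`. Returns the ioctl() return value, or -1 on setup
 * failure.
 */
int start_replace(const char *mnt, __u64 srcdevid, const char *tgtdev)
{
    struct btrfs_ioctl_dev_replace_args args;
    int fd, ret;

    if (fill_replace_start(&args, srcdevid, tgtdev))
        return -1;

    fd = open(mnt, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    ret = ioctl(fd, BTRFS_IOC_DEV_REPLACE, &args);
    if (ret < 0)
        perror("BTRFS_IOC_DEV_REPLACE");
    else
        printf("dev-replace result: %llu\n",
               (unsigned long long)args.result);

    close(fd);
    return ret;
}
```

With the filesystem mounted at an assumed /mnt, the call matching the setup above would be start_replace("/mnt", 1, "/dev/vdb"). On an uninstrumented kernel this simply starts the replace; the WARN_ON is reached only with the injected commit failure.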
I reproduced this twice under targeted fault injection. The signature is:
BTRFS info (device vda): dev_replace from /dev/vda (devid 1) to /dev/vdb started
BTRFS error (device vda): AGENT_BTRFS_DEV_REPLACE: forcing start commit EIO
WARNING: fs/btrfs/dev-replace.c:700 at btrfs_dev_replace_by_ioctl+0xe5b/0x1760
RIP: 0010:btrfs_dev_replace_by_ioctl+0xe5b/0x1760
Kernel panic - not syncing: kernel: panic_on_warn set …
I found an older related fix, 5c06147128fb (“btrfs: fix error handling in btrfs_dev_replace_start”), but that addressed a different start-failure path with invalid srcdev/tgtdev state. I did not find a direct fix for this post-commit WARN_ON(ret) site in current upstream.
Expected behavior:
A transaction commit error should be propagated through the ioctl path, with dev-replace state and target/source device lifetime audited, rather than treated as a warning-only invariant. The failure happens after in-memory dev-replace state has been set to STARTED, so a real fix probably needs more than simply deleting the WARN_ON().
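To make the expected behavior concrete, here is a small userspace model of the control flow I have in mind. It is not kernel code and not a proposed patch: every name in it (replace_model, model_commit(), model_replace_start(), force_commit_error) is hypothetical, and which in-memory state actually has to be unwound in fs/btrfs/dev-replace.c is exactly the part that needs auditing.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the in-memory dev-replace state. */
enum replace_state { REPLACE_NEVER_STARTED, REPLACE_STARTED };

struct replace_model {
    enum replace_state state;
    int tgtdev_attached;    /* 1 while the target device is referenced */
};

/* Stand-in for the injected fault: non-zero makes the commit fail. */
int force_commit_error;

/* Models a commit that can fail with -EIO. */
int model_commit(void)
{
    return force_commit_error ? -EIO : 0;
}

/*
 * Models the start path: state is set to STARTED before the commit,
 * so a commit failure must undo it and return the error to the
 * caller instead of only warning and continuing.
 */
int model_replace_start(struct replace_model *r)
{
    int ret;

    r->tgtdev_attached = 1;
    r->state = REPLACE_STARTED;

    ret = model_commit();
    if (ret) {
        /* Unwind what was set up above, then propagate the error. */
        r->state = REPLACE_NEVER_STARTED;
        r->tgtdev_attached = 0;
        fprintf(stderr, "commit failed: %d\n", ret);
        return ret;
    }

    return 0;
}
```

The point of the model is only the shape: the error is returned to the ioctl caller, and state set before the commit is not left at STARTED when the commit fails.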
The attached tarball includes README.md, repro_init.c, positive_instrumentation.diff, QEMU args/results, and both full serial logs.
Thanks,
Chuyifei
Attachment:
btrfs_dev_replace_commit_warn_panic_20260523.tar.gz
Description: Unix tar archive