Re: [BUG] btrfs: dev-replace finishing commit error reaches WARN_ON and panic_on_warn

From: Qu Wenruo

Date: Sun May 24 2026 - 18:18:28 EST




On 2026/5/25 00:45, Yifei Chu wrote:
Hello,

Short version: I am reporting a second Btrfs dev-replace error-path bug found with targeted fault injection, this time in the finishing path. The injected -EIO is in btrfs_commit_transaction()’s normal error-return domain, and the injection is placed at the transaction-commit return boundary. With that rare commit failure made deterministic, btrfs_dev_replace_finishing() reaches WARN_ON(ret); with panic_on_warn=1 this panics the kernel.

Tested kernel:

v7.1-rc4-640-g79bd2dded182-dirty
commit base 79bd2dded182b1d458b18e62684b7f82ffc682e5
x86_64 QEMU, KASAN config

Reproducer shape:

The initramfs source mounts a single-device Btrfs image on /dev/vda and starts device replace from source devid 1 to an empty target device /dev/vdb. The validation patch forces the commit in btrfs_dev_replace_finishing() to return -EIO.

The point of the injection is not arbitrary state corruption. It makes a valid transaction-commit error deterministic at this caller, so the finishing path’s error handling can be tested.

I reproduced this twice under targeted fault injection. The signature is:

BTRFS info (device vda): dev_replace from /dev/vda (devid 1) to /dev/vdb started
BTRFS error (device vda): AGENT_BTRFS_DEV_REPLACE_FINISH: forcing finish commit EIO
WARNING: fs/btrfs/dev-replace.c:912 at btrfs_dev_replace_finishing+0x295/0x13a0
RIP: 0010:btrfs_dev_replace_finishing+0x295/0x13a0
Kernel panic - not syncing: kernel: panic_on_warn set …

This looks related in theme to the dev-replace start-path error handling, but it is a separate callsite and a different state surface: the commit happens after scrub returns and before mapping-tree/device-list updates. I did not find a direct current-upstream fix for this finishing-phase WARN_ON(ret) site in my local duplicate sweep.

Expected behavior:

The commit error should be propagated and the replace state should be left consistent, rather than treating the error as a warning-only invariant. A real fix likely needs to audit target/source device lifetime and replace state around this finishing path.
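As a rough illustration of the expected behavior, the userspace sketch below models a finishing step that returns the commit error to its caller instead of only warning on it. It is not kernel code: every name in it (mock_replace, mock_commit_transaction, mock_replace_finishing, the inject_commit_eio knob) is hypothetical, and leaving the state at "started" on failure is an assumption made only for the example, not a claim about what the correct recovery in dev-replace.c should be.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; none of these are btrfs symbols. */
enum mock_replace_state { MOCK_REPLACE_STARTED, MOCK_REPLACE_FINISHED };

struct mock_replace {
	enum mock_replace_state state;
	bool inject_commit_eio;	/* models the targeted fault-injection knob */
};

/* Models a transaction commit that can legitimately fail with -EIO. */
static int mock_commit_transaction(const struct mock_replace *r)
{
	return r->inject_commit_eio ? -EIO : 0;
}

/*
 * Finishing step that propagates a commit failure to the caller.
 * On error the state is left untouched (assumption for this sketch),
 * so the caller still sees a well-defined "started" replace.
 */
static int mock_replace_finishing(struct mock_replace *r)
{
	int ret = mock_commit_transaction(r);

	if (ret) {
		fprintf(stderr, "mock: finish commit failed: %d\n", ret);
		return ret;
	}

	r->state = MOCK_REPLACE_FINISHED;
	return 0;
}
```

With inject_commit_eio set, mock_replace_finishing() returns -EIO and the state stays at MOCK_REPLACE_STARTED; without it, the call returns 0 and the state advances to MOCK_REPLACE_FINISHED. The real path would additionally have to handle the device lifetime and locking concerns mentioned above, which this sketch does not attempt to model.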

The attached tarball includes README.md, repro_init.c, positive_instrumentation.diff, QEMU args/results, and both full serial logs.

Your attachment looks very malicious.

Firstly the tar.gz is way larger than the only file inside the tar.gz.

Secondly the only file inside that archive is a hidden file, named "._btrfs_dev_replace_finishing_commit_warn_panic_20260523".

That the file is hidden is already suspicious, and "file" tells me it's an "AppleDouble encoded Macintosh file".

Nothing matches your description.

You have a lot of things to explain.


Thanks,
Chuyifei