Re: [PATCH] btrfs: don't add delayed refs to an aborted transaction

From: Filipe Manana

Date: Sun Mar 01 2026 - 06:26:32 EST


On Sun, Mar 1, 2026 at 8:09 AM Adarsh Das <adarshdas950@xxxxxxxxx> wrote:
>
> When a transaction aborts, cleanup_transaction() calls
> btrfs_cleanup_one_transaction() which drains all pending delayed refs
> via btrfs_destroy_delayed_refs().
>
> But, btrfs_cleanup_one_transaction() then wakes up tasks waiting
> on transaction_blocked_wait and sets the transaction state to
> TRANS_STATE_UNBLOCKED. These woken tasks can then call btrfs_add_delayed_tree_ref(),

No, that's not correct.

You need to take into account transaction states.

Once a transaction is in the state TRANS_STATE_COMMIT_START, it means
no one else is holding a transaction handle for that transaction
(except the task calling btrfs_commit_transaction()).
Also no one else is able to start a new transaction until the current
transaction state is >= TRANS_STATE_UNBLOCKED.
Take a look at the array btrfs_blocked_trans_types, join_transaction()
and start_transaction().
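
A minimal sketch of the "blocked window" described above, written as a hypothetical userspace C model rather than kernel code. The TRANS_STATE_* names follow the ones used in this thread. The enum is simplified and omits some kernel states. model_must_wait() and model_check_commit_sequence() are illustrative assumptions. For the real logic, see btrfs_blocked_trans_types, join_transaction() and start_transaction().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the btrfs transaction states, in the
 * order a commit moves through them. Some kernel states are omitted; only
 * the relative ordering matters for this sketch.
 */
enum model_trans_state {
	TRANS_STATE_RUNNING,
	TRANS_STATE_COMMIT_START,
	TRANS_STATE_COMMIT_DOING,
	TRANS_STATE_UNBLOCKED,
	TRANS_STATE_SUPER_COMMITTED,
	TRANS_STATE_COMPLETED,
};

/*
 * The rule described above: while the current transaction is in
 * [COMMIT_START, UNBLOCKED), a task that wants a new transaction must
 * wait for the committer instead of getting a handle.
 */
bool model_must_wait(enum model_trans_state cur)
{
	return cur >= TRANS_STATE_COMMIT_START && cur < TRANS_STATE_UNBLOCKED;
}

/* Walk a whole commit and check exactly where the window opens and closes. */
void model_check_commit_sequence(void)
{
	for (int s = TRANS_STATE_RUNNING; s <= TRANS_STATE_COMPLETED; s++) {
		bool in_window = (s == TRANS_STATE_COMMIT_START ||
				  s == TRANS_STATE_COMMIT_DOING);

		assert(model_must_wait((enum model_trans_state)s) == in_window);
	}
}
```

The model treats the window as a simple range check on ordered states. That range check is the property the argument above relies on.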

We call cleanup_transaction() only in btrfs_commit_transaction(), if
some error happens, after the state is set to
TRANS_STATE_COMMIT_START.
Once we call cleanup_transaction(), it calls btrfs_abort_transaction(),
which marks the filesystem as being in an error state, setting
fs_info->fs_error among other things.
So after btrfs_cleanup_one_transaction() goes through the remaining
transaction states and does the wakeups, no one can start a new
transaction: join_transaction() sees that fs_info->fs_error is not 0
(through BTRFS_FS_ERROR()) and fails.
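
The ordering argument can be sketched the same way. This is a hypothetical, single-threaded C model, not kernel code. struct model_fs, its pending_refs counter (standing in for the head_refs xarray) and all model_* functions are assumptions made for illustration. The model shows only one thing: the error is recorded before the wakeups. A task that runs after the cleanup therefore fails to join, never obtains a handle, and cannot add delayed refs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified subset of the transaction states. */
enum model_trans_state {
	TRANS_STATE_RUNNING,
	TRANS_STATE_COMMIT_START,
	TRANS_STATE_COMMIT_DOING,
	TRANS_STATE_UNBLOCKED,
	TRANS_STATE_SUPER_COMMITTED,
	TRANS_STATE_COMPLETED,
};

struct model_fs {
	int fs_error;                 /* stands in for fs_info->fs_error */
	enum model_trans_state state; /* state of the committing transaction */
	int pending_refs;             /* stands in for the head_refs xarray */
};

struct model_handle {
	struct model_fs *fs;
};

/* A filesystem in error state hands out no new handles. */
static int model_join_transaction(struct model_fs *fs, struct model_handle *h)
{
	if (fs->fs_error)
		return -EROFS;
	h->fs = fs;
	return 0;
}

/* Adding a delayed ref requires a handle obtained from a successful join. */
static void model_add_delayed_ref(struct model_handle *h)
{
	assert(h != NULL && h->fs != NULL);
	h->fs->pending_refs++;
}

/* Abort first (record the error), then drain, then advance the states. */
void model_cleanup_transaction(struct model_fs *fs, int err)
{
	fs->fs_error = err;                /* btrfs_abort_transaction() analogue */
	fs->pending_refs = 0;              /* btrfs_destroy_delayed_refs() analogue */
	fs->state = TRANS_STATE_UNBLOCKED; /* real code wakes waiters around here */
	fs->state = TRANS_STATE_COMPLETED; /* this model has no actual waiters */
}

/* What a task running after the wakeups would attempt; returns the join result. */
int model_task_after_wakeup(struct model_fs *fs)
{
	struct model_handle h = { NULL };
	int ret = model_join_transaction(fs, &h);

	if (ret == 0)
		model_add_delayed_ref(&h);
	return ret;
}
```

Calling model_cleanup_transaction(&fs, -EIO) and then model_task_after_wakeup(&fs) returns -EROFS and leaves pending_refs at 0. On a healthy model filesystem, the same task joins and adds one ref.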

> btrfs_add_delayed_data_ref(), or btrfs_add_delayed_extent_op() on the
> already-aborted transaction, inserting new entries into the head_refs
> xarray after it was just drained.

Nope... if the transaction was aborted by cleanup_transaction(), then
no one except the task calling btrfs_commit_transaction() is holding a
handle for that transaction.

Think about how chaotic it would be if every piece of code holding a
transaction handle had to check if the transaction was aborted before
doing anything...
We would have to avoid not only adding delayed refs but pretty much
everything else.

Thanks.

>
> When btrfs_put_transaction() subsequently drops the refcount to zero, it
> hits:
>
> WARN_ON(!xa_empty(&transaction->delayed_refs.head_refs));
>
> This patch fixes this by checking TRANS_ABORTED() at the start of add_delayed_ref()
> and btrfs_add_delayed_extent_op() before inserting into the xarray.
> btrfs_abort_transaction() is called at the start of cleanup_transaction(),
> before btrfs_destroy_delayed_refs(), so the aborted flag should always be set
> before any wakeups occur.
>
> Reported-by: syzbot+6d30e046bbd449d3f6f1@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Adarsh Das <adarshdas950@xxxxxxxxx>
> ---
> fs/btrfs/delayed-ref.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 3766ff29fbbb..b994f9702c32 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -327,7 +327,7 @@ static int cmp_refs_node(const struct rb_node *new, const struct rb_node *exist)
> return comp_refs(new_node, exist_node, true);
> }
>
> -static struct btrfs_delayed_ref_node* tree_insert(struct rb_root_cached *root,
> +static struct btrfs_delayed_ref_node *tree_insert(struct rb_root_cached *root,
> struct btrfs_delayed_ref_node *ins)
> {
> struct rb_node *node = &ins->ref_node;
> @@ -1025,6 +1025,10 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
> }
>
> delayed_refs = &trans->transaction->delayed_refs;
> + if (TRANS_ABORTED(trans->transaction)) {
> + ret = -EIO;
> + goto free_head_ref;
> + }
>
> if (btrfs_qgroup_full_accounting(fs_info) && !generic_ref->skip_qgroup) {
> record = kzalloc_obj(*record, GFP_NOFS);
> @@ -1153,6 +1157,10 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans,
> head_ref->extent_op = extent_op;
>
> delayed_refs = &trans->transaction->delayed_refs;
> + if (TRANS_ABORTED(trans->transaction)) {
> + kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
> + return -EIO;
> + }
>
> ret = xa_reserve(&delayed_refs->head_refs, index, GFP_NOFS);
> if (ret) {
> --
> 2.53.0
>
>