Re: [PATCH v2 1/3] blk-cgroup: fix race between policy activation and blkg destruction

From: Zheng Qixing

Date: Wed Jan 14 2026 - 22:27:54 EST



On 2026/1/14 18:40, Michal Koutný wrote:
On Tue, Jan 13, 2026 at 02:10:33PM +0800, Zheng Qixing <zhengqixing@xxxxxxxxxxxxxxx> wrote:
From: Zheng Qixing <zhengqixing@xxxxxxxxxx>

When switching an IO scheduler on a block device, blkcg_activate_policy()
allocates blkg_policy_data (pd) for all blkgs attached to the queue.
However, blkcg_activate_policy() may race with concurrent blkcg deletion,
leading to use-after-free and memory leak issues.

The use-after-free occurs in the following race:

T1 (blkcg_activate_policy):
- Successfully allocates pd for blkg1 (loop0->queue, blkcgA)
- Fails to allocate pd for blkg2 (loop0->queue, blkcgB)
- Enters the enomem rollback path to release blkg1 resources

T2 (blkcg deletion):
- blkcgA is deleted concurrently
- blkg1 is freed via blkg_free_workfn()
- blkg1->pd is freed

T1 (continued):
- Rollback path accesses blkg1->pd->online after pd is freed
The rollback path runs under q->queue_lock, the same as the list removal in
blkg_free_workfn().
Why is queue_lock not enough for synchronization in this case?

(BTW have you observed this case "naturally" or have you injected the
memory allocation failure?)

Yes, this issue was discovered by injecting memory allocation failure at
->pd_alloc_fn(..., GFP_KERNEL) in blkcg_activate_policy().

In blkg_free_workfn(), q->queue_lock only protects the
list_del_init(&blkg->q_node). However, ->pd_free_fn() is called before
list_del_init(), meaning the pd is already freed before the blkg is removed
from the queue's list.

    blkcg_activate_policy()                  blkg_free_workfn()
    -----------------------                  ------------------
    spin_lock(&q->queue_lock)
    ...
    if (!pd) {
        spin_unlock(&q->queue_lock)
        ...
        goto enomem
    }
    enomem:
        spin_lock(&q->queue_lock)
        if (pd) {
                                             ->pd_free_fn()  // pd freed
            pd->online  // uaf
            ...
        }
                                             spin_lock(&q->queue_lock)
                                             list_del_init(&blkg->q_node)
                                             spin_unlock(&q->queue_lock)
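
To make the window concrete, here is a minimal single-threaded userspace
model of the interleaving above. It is only a sketch: struct model_pd,
struct model_blkg and all function names are hypothetical stand-ins, not
kernel code, and "freed" is a flag rather than a real free so the replay
stays safe to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for blkg_policy_data and the blkg itself. */
struct model_pd {
    bool online;
    bool freed;           /* set where ->pd_free_fn() would have run */
};

struct model_blkg {
    struct model_pd *pd;
    bool on_queue_list;   /* models membership via blkg->q_node */
};

/* Free path, step 1: in this model it runs without queue_lock held. */
static void free_step_pd_free(struct model_blkg *blkg)
{
    blkg->pd->freed = true;
}

/* Free path, step 2: the only part that queue_lock covers. */
static void free_step_list_del(struct model_blkg *blkg)
{
    blkg->on_queue_list = false;
}

/*
 * Rollback as seen from the activation path: it only considers blkgs
 * that are still on the queue list. Returns true if it would have
 * touched a pd that was already freed.
 */
static bool rollback_touches_freed_pd(const struct model_blkg *blkg)
{
    if (!blkg->on_queue_list || blkg->pd == NULL)
        return false;            /* not visible, rollback skips it */
    return blkg->pd->freed;      /* stands in for reading pd->online */
}

/* Replays the diagram: rollback runs between the two free steps. */
static bool replay_race(void)
{
    struct model_pd pd = { .online = true, .freed = false };
    struct model_blkg blkg = { .pd = &pd, .on_queue_list = true };
    bool hit;

    assert(blkg.on_queue_list);               /* blkg starts visible */
    free_step_pd_free(&blkg);                 /* T2: pd freed */
    hit = rollback_touches_freed_pd(&blkg);   /* T1: enomem path */
    free_step_list_del(&blkg);                /* T2: removed too late */
    return hit;
}
```

In this model replay_race() reports a hit because the blkg is still on
the list when the rollback looks at it; once both free steps have
completed, the rollback no longer sees the blkg at all.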
- Triggers use-after-free

In addition, blkg_free_workfn() frees pd before removing the blkg from
q->blkg_list.
Yeah, this looks weirdly reversed.

Commit f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from
blkg_free_workfn() and blkcg_deactivate_policy()") delays
list_del_init(&blkg->q_node) until after pd_free_fn() in
blkg_free_workfn(). This keeps blkgs visible in the queue list during
policy deactivation, preventing parent policy data from being freed
before child policy data and avoiding use-after-free.
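
The effect of that ordering can be illustrated with a small userspace
model. This is a sketch under assumptions, not kernel code: struct node
and every function name here are hypothetical, the deactivation pass is
reduced to "free the pd of every blkg still on the list", and locking is
left out entirely so that only the ordering is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a blkg with a parent link. */
struct node {
    struct node *parent;
    bool on_queue_list;
    bool pd_freed;
};

/* Deactivation pass: frees the pd of every blkg still on the list. */
static void deactivate_visible(struct node **nodes, int n)
{
    for (int i = 0; i < n; i++)
        if (nodes[i]->on_queue_list && !nodes[i]->pd_freed)
            nodes[i]->pd_freed = true;
}

/* True if the parent's pd is gone while the child's pd is still alive. */
static bool parent_freed_before_child(const struct node *child)
{
    return child->parent != NULL &&
           child->parent->pd_freed && !child->pd_freed;
}

/* list_del_first selects the ordering inside the modelled free path. */
static bool replay_deactivation(bool list_del_first)
{
    struct node parent = { .parent = NULL, .on_queue_list = true,
                           .pd_freed = false };
    struct node child = { .parent = &parent, .on_queue_list = true,
                          .pd_freed = false };
    struct node *nodes[] = { &parent, &child };
    bool violated;

    /* First half of the child's free path. */
    if (list_del_first)
        child.on_queue_list = false;
    else
        child.pd_freed = true;

    /* Policy deactivation runs in the gap. */
    deactivate_visible(nodes, 2);
    violated = parent_freed_before_child(&child);

    /* Second half of the child's free path. */
    if (list_del_first)
        child.pd_freed = true;
    else
        child.on_queue_list = false;

    assert(child.pd_freed && !child.on_queue_list);
    return violated;
}
```

With list_del_first set, the child is invisible to the deactivation pass
while its pd is still alive, so the parent's pd goes first. With the pd
freed before the list removal, the child's pd is already gone by the time
the parent's pd is freed.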

Kind Regards,
Qixing