Re: [PATCH v2 1/3] blk-cgroup: fix race between policy activation and blkg destruction
From: Zheng Qixing
Date: Wed Jan 14 2026 - 22:27:54 EST
On 2026/1/14 18:40, Michal Koutný wrote:
> On Tue, Jan 13, 2026 at 02:10:33PM +0800, Zheng Qixing <zhengqixing@xxxxxxxxxxxxxxx> wrote:
>> From: Zheng Qixing <zhengqixing@xxxxxxxxxx>
>>
>> When switching an IO scheduler on a block device, blkcg_activate_policy()
>> allocates blkg_policy_data (pd) for all blkgs attached to the queue.
>> However, blkcg_activate_policy() may race with concurrent blkcg deletion,
>> leading to use-after-free and memory leak issues.
>>
>> The use-after-free occurs in the following race:
>>
>> T1 (blkcg_activate_policy):
>> - Successfully allocates pd for blkg1 (loop0->queue, blkcgA)
>> - Fails to allocate pd for blkg2 (loop0->queue, blkcgB)
>> - Enters the enomem rollback path to release blkg1 resources
>>
>> T2 (blkcg deletion):
>> - blkcgA is deleted concurrently
>> - blkg1 is freed via blkg_free_workfn()
>> - blkg1->pd is freed
>>
>> T1 (continued):
>> - Rollback path accesses blkg1->pd->online after pd is freed
>
> The rollback path is under q->queue_lock same like the list removal in
> blkg_free_workfn().
> Why is queue_lock not enough for synchronization in this case?
>
> (BTW have you observed this case "naturally" or have you injected the
> memory allocation failure?)

Yes, this issue was discovered by injecting memory allocation failure at
->pd_alloc_fn(..., GFP_KERNEL) in blkcg_activate_policy().

In blkg_free_workfn(), q->queue_lock only protects the
list_del_init(&blkg->q_node). However, ->pd_free_fn() is called before
list_del_init(), meaning the pd is already freed before the blkg is removed
from the queue's list.

blkcg_activate_policy()                 blkg_free_workfn()
-------------------                     ------------------
spin_lock(&q->queue_lock)
...
if (!pd) {
  spin_unlock(&q->queue_lock)
  ...
  goto enomem
}

enomem:
  spin_lock(&q->queue_lock)
  if (pd) {
                                        ->pd_free_fn() // pd freed
    pd->online // uaf
    ...
  }
                                        spin_lock(&q->queue_lock)
                                        list_del_init(&blkg->q_node)
                                        spin_unlock(&q->queue_lock)

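
The interleaving above can also be replayed deterministically in user
space. This is only a minimal sketch: struct model_pd, struct model_blkg
and the model_*() helpers are hypothetical stand-ins, not kernel code, and
a "freed" flag replaces the real free so the model never touches freed
memory. The unlink-first ordering is included purely for contrast; it is
not the fix proposed in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for blkg_policy_data and blkcg_gq. */
struct model_pd {
    bool online;
    bool freed;            /* set instead of calling free() */
};

struct model_blkg {
    struct model_pd *pd;
    bool on_queue_list;    /* models blkg->q_node linked on q->blkg_list */
};

/* Free-worker step done WITHOUT the queue lock: models ->pd_free_fn(). */
static void model_free_pd(struct model_blkg *blkg)
{
    if (blkg->pd)
        blkg->pd->freed = true;   /* the pointer itself is left in place */
}

/* Free-worker step done under the queue lock: models list_del_init(). */
static void model_unlink(struct model_blkg *blkg)
{
    blkg->on_queue_list = false;
}

/*
 * Models the enomem rollback visiting one blkg while holding the queue
 * lock. Returns true if it would dereference a pd that is already freed.
 */
static bool model_rollback_hits_freed_pd(const struct model_blkg *blkg)
{
    if (!blkg->on_queue_list)
        return false;             /* not reachable through the list walk */
    return blkg->pd != NULL && blkg->pd->freed;
}

/*
 * Replays one interleaving: the rollback runs between the two steps of
 * the free worker. Returns true if the rollback saw a freed pd.
 */
static bool model_replay(bool unlink_first)
{
    struct model_pd pd = { .online = true, .freed = false };
    struct model_blkg blkg = { .pd = &pd, .on_queue_list = true };
    bool uaf;

    if (unlink_first)
        model_unlink(&blkg);
    else
        model_free_pd(&blkg);

    uaf = model_rollback_hits_freed_pd(&blkg);

    if (unlink_first)
        model_free_pd(&blkg);
    else
        model_unlink(&blkg);

    /* Once both steps are done the blkg is no longer visible to the walk. */
    assert(!model_rollback_hits_freed_pd(&blkg));
    return uaf;
}
```

model_replay(false) follows the ordering in the diagram (pd freed first,
unlink later) and reports the stale access; model_replay(true) does not,
because the blkg leaves the list before its pd is released.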
>> - Triggers use-after-free
>>
>> In addition, blkg_free_workfn() frees pd before removing the blkg from
>> q->blkg_list.
>
> Yeah, this looks weirdly reversed.

Commit f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from
blkg_free_workfn() and blkcg_deactivate_policy()") delays
list_del_init(&blkg->q_node) until after pd_free_fn() in
blkg_free_workfn(). This keeps blkgs visible in the queue list during
policy deactivation, preventing parent policy data from being freed
before child policy data and avoiding use-after-free.

Kind Regards,
Qixing