Re: [PATCH v2 2/4] blk-cgroup: fix race between policy activation and blkg destruction

From: yu kuai

Date: Sat Jun 27 2026 - 00:15:00 EST

Hi,

在 2026/6/26 14:12, Nilay Shroff 写道:
> On 6/26/26 7:22 AM, yu kuai wrote:
>> Hi,
>>
>> 在 2026/6/26 9:50, yu kuai 写道:
>>> Hi,
>>>
>>> 在 2026/6/25 23:08, Nilay Shroff 写道:
>>>> On 6/24/26 12:16 PM, Yu Kuai wrote:
>>>>> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
>>>>> index 7baccfb690fe..f7e788a7fe95 100644
>>>>> --- a/block/blk-cgroup.c
>>>>> +++ b/block/blk-cgroup.c
>>>>> @@ -1563,10 +1563,12 @@ int blkcg_activate_policy(struct gendisk
>>>>> *disk, const struct blkcg_policy *pol)
>>>>>       if (WARN_ON_ONCE(!pol->pd_alloc_fn || !pol->pd_free_fn))
>>>>>           return -EINVAL;
>>>>>       if (queue_is_mq(q))
>>>>>           memflags = blk_mq_freeze_queue(q);
>>>>> +
>>>>> +    mutex_lock(&q->blkcg_mutex);
>>>>> retry:
>>>>>       spin_lock_irq(&q->queue_lock);
>>>>>       /* blkg_list is pushed at the head, reverse walk to
>>>>> initialize parents first */
>>>>>       list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
>>>>> @@ -1625,10 +1627,11 @@ int blkcg_activate_policy(struct gendisk
>>>>> *disk, const struct blkcg_policy *pol)
>>>>>       __set_bit(pol->plid, q->blkcg_pols);
>>>>>       ret = 0;
>>>>>       spin_unlock_irq(&q->queue_lock);
>>>>> out:
>>>>> +    mutex_unlock(&q->blkcg_mutex);
>>>>>       if (queue_is_mq(q))
>>>>>           blk_mq_unfreeze_queue(q, memflags);
>>>>>       if (pinned_blkg)
>>>>>           blkg_put(pinned_blkg);
>>>>>       if (pd_prealloc)
>>>> If the policy allocation fails, we jump to the lable enomem: and
>>>> teardown pds.
>>>> But I see this path still only acquires ->queue_lock. Don't we also
>>>> need
>>>> to protect it with ->blkcg_mutex?
>>> Yes, I agree we should protect it as well.
>>
>> Just take a closer look at the code, the enomem is already protected by
>> blkcg_mutex :)
>>
>
> Oh yes, but the ->blkcg_mutex is never released if we jump to enomem.
> So that may potentially cause deadlock. We need to release ->blkcg_mutex
> once blkcg_policy_teardown_pds() returns. Or may be refactor code (or add
> comment) so that it's easy to realize or spot the ->blkcg_mutex is
> acquired
> and then released around blkcg_policy_teardown_pds().

the enomem will goto out at last, and blkcg_mutex do released. The code is
a bit hacky.

>
>>>
>>>> Moreover I still see race between blkg insertion in blkg_create()
>>>> which
>>>> still doesn't use ->blkcg_mutex and so list traversal in
>>>> bfq_end_wr_async()
>>>> may still race with blkg_create(), isn't it? I remember you once told
>>>> this will be handled in another series but I couldn't find that yet.
>>> This is the set:
>>>
>>> [PATCH 0/8] blk-cgroup: remove queue_lock nesting from blkcg paths - Yu
>>> Kuai <https://lore.kernel.org/all/cover.1780621988.git.yukuai@xxxxxxx/>
>>>
>>> Noted that set just make sure queue_lock is not nested under other
>>> atomic
>>> context, and that set do not acquire blkcg_mutex for blkg_create()
>>> yet. Howerver,
>>> with that set it'll be easy to convert all queue_lock to blkcg_mtuex
>>> for blkg
>>> protection, and together with lots of blk-cgroup code cleanups.
>>>
>
> Okay, so are you planning to send a follow-up patchset that replaces
> ->queue_lock
> with ->blkcg_mutex for protecting the blkg_list? If so, I'd still
> prefer acquiring
> ->blkcg_mutex around blkg_create() in this patchset. That would
> address the race
> between blkg_create() and the blkg_list traversal in
> bfq_end_wr_async(), while the
> subsequent series can focus on cleaning up and removing the remaining
> ->queue_lock
> usage for blkg protection.

Yes, there is a follow-up patchset. When this set was posted, blkg_create is still
called with queue_lock held, so I can't do that. However, not that the other set
is already applied, I can hold blkcg_mutex for blkg_create() now.

>
> Thanks,
> --Nilay
>
--
Thanks,
Kuai