Re: [PATCH v2 2/4] blk-cgroup: fix race between policy activation and blkg destruction

From: Nilay Shroff

Date: Mon Jun 29 2026 - 01:34:33 EST

On 6/27/26 9:43 AM, yu kuai wrote:

Hi,

在 2026/6/26 14:12, Nilay Shroff 写道:

On 6/26/26 7:22 AM, yu kuai wrote:

Hi,

在 2026/6/26 9:50, yu kuai 写道:

Hi,

在 2026/6/25 23:08, Nilay Shroff 写道:

On 6/24/26 12:16 PM, Yu Kuai wrote:

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 7baccfb690fe..f7e788a7fe95 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1563,10 +1563,12 @@ int blkcg_activate_policy(struct gendisk
*disk, const struct blkcg_policy *pol)
      if (WARN_ON_ONCE(!pol->pd_alloc_fn || !pol->pd_free_fn))
          return -EINVAL;
      if (queue_is_mq(q))
          memflags = blk_mq_freeze_queue(q);
+
+    mutex_lock(&q->blkcg_mutex);
retry:
      spin_lock_irq(&q->queue_lock);
      /* blkg_list is pushed at the head, reverse walk to
initialize parents first */
      list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
@@ -1625,10 +1627,11 @@ int blkcg_activate_policy(struct gendisk
*disk, const struct blkcg_policy *pol)
      __set_bit(pol->plid, q->blkcg_pols);
      ret = 0;
      spin_unlock_irq(&q->queue_lock);
out:
+    mutex_unlock(&q->blkcg_mutex);
      if (queue_is_mq(q))
          blk_mq_unfreeze_queue(q, memflags);
      if (pinned_blkg)
          blkg_put(pinned_blkg);
      if (pd_prealloc)

If the policy allocation fails, we jump to the lable enomem: and
teardown pds.
But I see this path still only acquires ->queue_lock. Don't we also
need
to protect it with ->blkcg_mutex?

Yes, I agree we should protect it as well.

Just take a closer look at the code, the enomem is already protected by
blkcg_mutex :)

Oh yes, but the ->blkcg_mutex is never released if we jump to enomem.
So that may potentially cause deadlock. We need to release ->blkcg_mutex
once blkcg_policy_teardown_pds() returns. Or may be refactor code (or add
comment) so that it's easy to realize or spot the ->blkcg_mutex is
acquired
and then released around blkcg_policy_teardown_pds().

the enomem will goto out at last, and blkcg_mutex do released. The code is
a bit hacky.

Moreover I still see race between blkg insertion in blkg_create()
which
still doesn't use ->blkcg_mutex and so list traversal in
bfq_end_wr_async()
may still race with blkg_create(), isn't it? I remember you once told
this will be handled in another series but I couldn't find that yet.

This is the set:

[PATCH 0/8] blk-cgroup: remove queue_lock nesting from blkcg paths - Yu
Kuai <https://lore.kernel.org/all/cover.1780621988.git.yukuai@xxxxxxx/>

Noted that set just make sure queue_lock is not nested under other
atomic
context, and that set do not acquire blkcg_mutex for blkg_create()
yet. Howerver,
with that set it'll be easy to convert all queue_lock to blkcg_mtuex
for blkg
protection, and together with lots of blk-cgroup code cleanups.

Okay, so are you planning to send a follow-up patchset that replaces
->queue_lock
with ->blkcg_mutex for protecting the blkg_list? If so, I'd still
prefer acquiring
->blkcg_mutex around blkg_create() in this patchset. That would
address the race
between blkg_create() and the blkg_list traversal in
bfq_end_wr_async(), while the
subsequent series can focus on cleaning up and removing the remaining
->queue_lock
usage for blkg protection.

Yes, there is a follow-up patchset. When this set was posted, blkg_create is still
called with queue_lock held, so I can't do that. However, not that the other set
is already applied, I can hold blkcg_mutex for blkg_create() now.

If you already have a follow-up patchset that replaces ->queue_lock with
->blkcg_mutex for protecting blkg_list, then I think it might make sense
to send that out first (if you haven't already) and hold off this series for
now. That way, the blkg_create() race can be addressed first, and this
series can build on top of those changes.

BTW, if that follow-up series is merged first, I suspect the first two patches in
this series may no longer be necessary, leaving only patches 3/4 and 4/4.

Thanks,
--Nilay