Re: [PATCH] blk-cgroup: check blkcg policy is enabled in blkg_create()

From: yukuai (C)
Date: Mon Oct 11 2021 - 21:09:15 EST


On 2021/10/11 23:23, Michal Koutný wrote:
Hello.

On Fri, Oct 08, 2021 at 03:27:20PM +0800, Yu Kuai <yukuai3@xxxxxxxxxx> wrote:
This is because blkg_alloc() is called from blkg_conf_prep() without
holding 'q->queue_lock', and elevator is exited before blkg_create():
IIUC the problematic interleaving is this one (I've noticed `blkg->pd[i]
= NULL` to thread 2 call trace):

The new blkg will not add to blkg_list untill pd_init_fn() is done in
blkg_create(), thus blkcg_deactivate_policy() can't access this blkg.

thread 1 thread 2
blkg_conf_prep
spin_lock_irq(&q->queue_lock);
blkg_lookup_check -> return NULL
spin_unlock_irq(&q->queue_lock);

blkg_alloc
blkcg_policy_enabled -> true
pd = ->pd_alloc_fn
blk_mq_exit_sched
bfq_exit_queue
blkcg_deactivate_policy
spin_lock_irq(&q->queue_lock);
__clear_bit(pol->plid, q->blkcg_pols);

pol->pd_free_fn(blkg->pd[i]);
blkg->pd[i] = NULL;

spin_unlock_irq(&q->queue_lock);
q->elevator = NULL;
blkg->pd[i] = pd
spin_lock_irq(&q->queue_lock);
blkg_create
if (blkg->pd[i])
->pd_init_fn -> q->elevator is NULL
spin_unlock_irq(&q->queue_lock);

In high-level terms, is this a race between (blk)io controller attribute
write and a device scheduler (elevator) switch?
If so, I'd add it to the commit message.

Fix the problem by checking that policy is still enabled in
blkg_create().

Is this sufficient wrt some other q->elevator users later?

@@ -252,6 +266,9 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
goto err_free_blkg;
}

I'd add a comment here like:

Re-check policies are still enabled, since the caller blkg_conf_prep()
temporarily drops q->queue_lock and we can race with
blk_mq_exit_sched() removing policies.

Thanks for your advice.

Best regards,
Kuai

+ if (new_blkg)
+ blkg_check_pd(q, new_blkg);
+

Thanks,
Michal
.