Re: [PATCH] sched/fair: Use rq->lock when checking cfs_rq list presence

From: Michal Koutný
Date: Wed Oct 13 2021 - 10:26:51 EST


On Wed, Oct 13, 2021 at 09:57:17AM +0200, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
> Furthermore, list_del_leaf_cfs_rq() starts with the same test of
> cfs_rq->on_list.

Yes, it is the same check, but there it is synchronized with rq->lock.

> The problem is that the cfs_rq can be added during or
> after the test. Removing it should not be enough because we do the
> same test under rq lock which only ensures that both the test and the
> add on the list will not happen simultaneously.

This is what I overlooked when I was looking for an explanation of the
UAF (use-after-free) on the leaf list.
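
To make the window concrete, here is a minimal single-threaded userspace
sketch that replays one problematic interleaving. It is not kernel code:
toy_rq, toy_cfs_rq, the toy_lock helpers and all toy_* functions are
hypothetical stand-ins, and the "lock" is just a flag marking where the
critical sections would be. It only shows that a check done under the
lock still leaves the entry on the list if the add happens afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, NOT the kernel's struct rq / struct cfs_rq. */
struct toy_rq {
	bool lock_held;   /* toy lock: only marks the critical sections */
	int  nr_leaf;     /* number of entries on the toy leaf list */
};

struct toy_cfs_rq {
	struct toy_rq *rq;
	bool on_list;
};

static void toy_lock(struct toy_rq *rq)
{
	assert(!rq->lock_held);
	rq->lock_held = true;
}

static void toy_unlock(struct toy_rq *rq)
{
	assert(rq->lock_held);
	rq->lock_held = false;
}

/* Add to the leaf list under the lock (idempotent). */
static void toy_add_leaf(struct toy_cfs_rq *c)
{
	toy_lock(c->rq);
	if (!c->on_list) {
		c->on_list = true;
		c->rq->nr_leaf++;
	}
	toy_unlock(c->rq);
}

/* Remove from the leaf list; the on_list test is done under the lock. */
static bool toy_del_leaf(struct toy_cfs_rq *c)
{
	bool removed = false;

	toy_lock(c->rq);
	if (c->on_list) {
		c->on_list = false;
		c->rq->nr_leaf--;
		removed = true;
	}
	toy_unlock(c->rq);
	return removed;
}

/*
 * Replay: teardown runs the locked check first, a late add follows,
 * and then the object would be freed. Returns true if the entry is
 * still on the list at that point, i.e. a dangling entry would remain.
 */
static bool toy_stale_after_teardown(void)
{
	struct toy_rq rq = { false, 0 };
	struct toy_cfs_rq c = { &rq, false };

	toy_del_leaf(&c);   /* not on the list yet, nothing to remove */
	toy_add_leaf(&c);   /* late add, after the test has passed */
	return c.on_list && rq.nr_leaf == 1;
}
```

In this replay the lock serializes the test and the add, but it does not
order them, so the entry survives the removal attempt.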

> This seems to close the race window in your case but this could still
> happen AFAICT.

You seem to be right.
Hopefully, I'll be able to collect more data to evaluate this.

> What about your patchset about adding a cfs in the list only when
> there is a runnable task ?

The patches I had sent previously [1] avoid adding a cfs_rq to the list
when it's under a throttled ancestor (namely patch 4/5). The runnable
condition is rather orthogonal to that. (I'm not sure it's the patchset
you were referring to.)
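
As a rough illustration of the "under a throttled ancestor" condition
only (not the actual patch from [1]), the idea can be modelled in
userspace as below. struct toy_group and both toy_* helpers are
hypothetical names made up for this sketch; the real code operates on
cfs_rq and task group structures and differs in detail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy hierarchy node, NOT the kernel's struct cfs_rq. */
struct toy_group {
	struct toy_group *parent;   /* NULL for the root */
	bool throttled;
	bool on_list;
};

/* True if any strict ancestor of g is throttled. */
static bool toy_under_throttled_ancestor(const struct toy_group *g)
{
	const struct toy_group *p;

	for (p = g->parent; p != NULL; p = p->parent)
		if (p->throttled)
			return true;
	return false;
}

/* Put g on the toy list only when no ancestor is throttled. */
static bool toy_maybe_add_leaf(struct toy_group *g)
{
	if (toy_under_throttled_ancestor(g))
		return false;
	g->on_list = true;
	return true;
}
```

Note that this condition says nothing about whether the group has
runnable tasks, which is why the two conditions are independent.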


> Wouldn't this fix the problem ?

FWIW, the only "reliable" fix so far is a revert of commit a7b359fc6a37
("sched/fair: Correctly insert cfs_rq's to list on unthrottle"). Hence
my hypothesis about a racy add from tg_unthrottle_up(), and that is why
I think the other patches won't affect the issue.

Thanks for your feedback. Let me examine the problem some more before
continuing with this patch.

Michal


[1] https://lore.kernel.org/all/20210819175034.4577-1-mkoutny@xxxxxxxx/