Re: [PATCH] sched/fair: Use rq->lock when checking cfs_rq list presence
From: Michal Koutný
Date: Wed Oct 13 2021 - 10:26:51 EST
On Wed, Oct 13, 2021 at 09:57:17AM +0200, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
> Furthermore, list_del_leaf_cfs_rq() starts with the same test on
> cfs_rq->on_list.
Yes, the same check but synchronized with rq->lock.
> The problem is that the cfs_rq can be added during or
> after the test. Removing it should not be enough because we do the
> same test under rq lock which only ensures that both the test and the
> add on the list will not happen simultaneously.
This is what I overlooked when I was looking for an explanation of the
UAF on the leaf list.
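To spell out the interleaving as I now understand it, here is a toy,
single-threaded replay. It is only a sketch: every toy_* name is made up
for illustration, the "leaf list" is a single slot, and none of this is
the kernel's actual code. It shows that holding the lock around the
on_list test does not help when the add happens after the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; NOT the kernel structures. */
struct toy_cfs_rq {
	bool on_list;
	bool freed;	/* flag instead of a real free, so it stays inspectable */
};

struct toy_rq {
	bool locked;
	struct toy_cfs_rq *leaf;	/* single-slot stand-in for the leaf list */
};

static void toy_lock(struct toy_rq *rq)
{
	assert(!rq->locked);	/* the toy lock is never taken recursively */
	rq->locked = true;
}

static void toy_unlock(struct toy_rq *rq)
{
	rq->locked = false;
}

/* Teardown side: test on_list under the lock, remove if present. */
static void toy_del_leaf(struct toy_rq *rq, struct toy_cfs_rq *cfs_rq)
{
	toy_lock(rq);
	if (cfs_rq->on_list) {
		rq->leaf = NULL;
		cfs_rq->on_list = false;
	}
	toy_unlock(rq);
}

/* Unthrottle-like side: add under the same lock. */
static void toy_add_leaf(struct toy_rq *rq, struct toy_cfs_rq *cfs_rq)
{
	toy_lock(rq);
	if (!cfs_rq->on_list) {
		rq->leaf = cfs_rq;
		cfs_rq->on_list = true;
	}
	toy_unlock(rq);
}

/*
 * Replay one ordering and report whether the list still references a
 * "freed" entry afterwards. add_first selects add-then-teardown;
 * otherwise the add lands after the locked test in the teardown.
 */
static bool toy_replay(bool add_first)
{
	struct toy_cfs_rq cfs_rq = { .on_list = false, .freed = false };
	struct toy_rq rq = { .locked = false, .leaf = NULL };

	if (add_first)
		toy_add_leaf(&rq, &cfs_rq);
	toy_del_leaf(&rq, &cfs_rq);	/* locked test (and removal, if listed) */
	if (!add_first)
		toy_add_leaf(&rq, &cfs_rq);	/* late add, after the test */
	cfs_rq.freed = true;		/* teardown goes on to "free" it */

	return rq.leaf != NULL && rq.leaf->freed;
}
```

In this toy, toy_replay(true) leaves the list clean, whereas
toy_replay(false) leaves a dangling entry even though both the test and
the add ran under the lock.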
> This seems to close the race window in your case but this could still
> happen AFAICT.
You seem to be right.
Hopefully, I'll be able to collect more data to evaluate this.
> What about your patchset about adding a cfs in the list only when
> there is a runnable task ?
The patches I had sent previously [1] avoid adding cfs_rq to the list
when it's under a throttled ancestor (namely 4/5). The runnable
condition is rather orthogonal. (Not sure it's the patchset you were
referring to.)
> Wouldn't this fix the problem ?
FWIW, the only "reliable" fix so far is a revert of commit a7b359fc6a37
("sched/fair: Correctly insert cfs_rq's to list on unthrottle"). Hence
my hypothesis about a racy add from tg_unthrottle_up(), which is also
why I think the other patches won't affect the issue.
Thanks for your feedback. Let me examine the problem some more before
continuing with this patch.
Michal
[1] https://lore.kernel.org/all/20210819175034.4577-1-mkoutny@xxxxxxxx/