Re: [PATCH] Revert "sched/fair: Fix O(nr_cgroups) in the load balancing path"

From: Peter Zijlstra
Date: Tue Oct 29 2019 - 11:36:47 EST

On Tue, Oct 29, 2019 at 07:55:26AM -0700, Doug Smythies wrote:

> I only know that the call to the intel_pstate driver doesn't
> happen, and that it is because cfs_rq_is_decayed returns TRUE.
> So, I am asserting that the request is not actually decayed, and
> should not have been deleted.

So what cfs_rq_is_decayed() does is allow a cgroup's cfs_rq to be
removed from the list.

Once it is removed, that cfs_rq will no longer be checked in the
update_blocked_averages() loop. Which means done has less chance of
getting false. Which in turn means that it's more likely
rq->has_blocked_load becomes 0.

Which all sounds good.

Can you please trace what keeps the CPU awake?

> Now, if we also look back at the comments for the original commit:
> "In an edge case where temporary cgroups were leaking, this
> caused the kernel to consume good several tens of percents of
> CPU cycles running update_blocked_averages(), each run taking
> multiple millisecs."
> To my way of thinking: Fix the leak, don't program around it; The
> commit breaks something else, so revert it.

The leak was fixed, but it still doesn't make sense to keep idle cgroups
on that list. Some people have a stupid amount of cgroups, most of which
are pointless and unused, so being able to remove them is good.

Which is why it got added back, once list management issues were sorted.