Re: [RFC PATCH 00/22] sched/fair: Defer CFS throttling to exit to user mode

From: Peter Zijlstra
Date: Thu Feb 20 2025 - 06:34:31 EST


On Thu, Feb 20, 2025 at 04:48:58PM +0530, K Prateek Nayak wrote:

> The rationale there was with growing number of tasks on cfs_rq, the
> throttle path has to perform a lot of dequeues and the unthrottle at
> distribution has to enqueue all the dequeued threads back.
>
> This is one way to keep all the tasks queued but allow pick to only
> select among those that are preempted in kernel mode.
>
> Since per-task throttling needs to tag, dequeue, and re-enqueue each
> task, I'm putting this out as an alternate approach that does not
> increase the complexities of tg_tree walks which Ben had noted on
> Valentin's series [1]. Instead we retain the per cfs_rq throttling
> at the cost of some stats tracking at enqueue and dequeue
> boundaries.
>
> If you have a strong feelings against any specific part, or the entirety
> of this approach, please do let me know, and I'll do my best to see if
> a tweaked approach or an alternate implementation can scale well with
> growing thread counts (or at least try to defend the bits in question if
> they hold merit still).
>
> Any and all feedback is appreciated :)

Pfff.. I hate it all :-)

So the dequeue approach puts the pain on the people actually using the
bandwidth crud, while this 'some extra accounting' crap has *everybody*
pay for this nonsense, right?

I'm not sure how bad this extra accounting is, but I do fear death by a
thousand cuts.