Re: [PATCH v3] sched: async unthrottling for cfs bandwidth

From: Peter Zijlstra
Date: Fri Nov 25 2022 - 04:13:10 EST


On Fri, Nov 25, 2022 at 09:59:23AM +0100, Peter Zijlstra wrote:
> On Fri, Nov 25, 2022 at 09:57:09AM +0100, Peter Zijlstra wrote:
> > On Tue, Nov 22, 2022 at 11:35:48AM +0100, Peter Zijlstra wrote:
> > > On Mon, Nov 21, 2022 at 11:37:14AM -0800, Josh Don wrote:
> > > > Yep, this tradeoff feels "best", but there are some edge cases where
> > > > this could potentially disrupt fairness. For example, if we have
> > > > non-trivial W, a lot of cpus to iterate through for dispatching remote
> > > > unthrottle, and quota is small. Doesn't help that the timer is pinned
> > > > so that this will continually hit the same cpu.
> > >
> > > We could -- if we wanted to -- manually rotate the timer around the
> > > relevant CPUs. Doing that sanely would require a bit of hrtimer surgery
> > > though I'm afraid.
> >
> > Here; something like so should enable us to cycle the bandwidth timer.
> > Just need to figure out a way to find another CPU or something.
>
> Some more preparation...

And then I think something like so. That migrates the timer to the CPU
of the first throttled entry -- possibly not the best heuristic, but it's
the simplest.

NOTE: none of this has seen a compiler up close.

---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5595,13 +5595,21 @@ static bool distribute_cfs_runtime(struc
  */
 static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun, unsigned long flags)
 {
-	int throttled;
+	struct cfs_rq *first_cfs_rq;
+	int throttled = 0;
+	int cpu;

 	/* no need to continue the timer with no bandwidth constraint */
 	if (cfs_b->quota == RUNTIME_INF)
 		goto out_deactivate;

-	throttled = !list_empty(&cfs_b->throttled_cfs_rq);
+	first_cfs_rq = list_first_entry_or_null(&cfs_b->throttled_cfs_rq,
+						struct cfs_rq, throttled_list);
+	if (first_cfs_rq) {
+		throttled = 1;
+		cpu = cpu_of(rq_of(first_cfs_rq));
+	}
+
 	cfs_b->nr_periods += overrun;

 	/* Refill extra burst quota even if cfs_b->idle */
@@ -5641,7 +5649,7 @@ static int do_sched_cfs_period_timer(str
 	 */
 	cfs_b->idle = 0;

-	return HRTIMER_RESTART;
+	return HRTIMER_RESTART_MIGRATE + cpu;

 out_deactivate:
 	return HRTIMER_NORESTART;
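
The return value in the last hunk folds the target CPU into the restart
code. A minimal userspace sketch of that encoding follows. The SKETCH_*
values and both helper names are hypothetical stand-ins, not the
definitions from the preparatory hrtimer patches. The fallback to a plain
restart when nothing is throttled is also an assumption: in the diff
above, cpu is only assigned when a throttled entry exists.

```c
/*
 * Hypothetical stand-ins for the hrtimer restart codes. The only
 * property assumed here is that the MIGRATE base is larger than every
 * plain code, so "base + cpu" never collides with them.
 */
enum sketch_restart {
	SKETCH_NORESTART = 0,		/* timer is not restarted */
	SKETCH_RESTART = 1,		/* restart on the current CPU */
	SKETCH_RESTART_MIGRATE = 2,	/* base; target CPU is added on top */
};

/*
 * Callback side: pick the return value. A negative throttled_cpu means
 * "no throttled entry was found", in which case the sketch falls back
 * to a plain restart (an assumption, see the note above).
 */
static int sketch_encode(int throttled_cpu)
{
	if (throttled_cpu < 0)
		return SKETCH_RESTART;
	return SKETCH_RESTART_MIGRATE + throttled_cpu;
}

/*
 * Timer-core side: recover the requested CPU from the return value,
 * or -1 when no migration was requested.
 */
static int sketch_decode_cpu(int ret)
{
	if (ret < SKETCH_RESTART_MIGRATE)
		return -1;
	return ret - SKETCH_RESTART_MIGRATE;
}
```

With these helpers, sketch_decode_cpu(sketch_encode(3)) yields 3, while
sketch_decode_cpu(SKETCH_RESTART) yields -1, so a caller can tell a
migration request apart from an ordinary restart.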