Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to nr_running
From: Mike Galbraith
Date: Mon Sep 20 2021 - 23:53:10 EST
On Mon, 2021-09-20 at 15:26 +0100, Mel Gorman wrote:
>
> This patch scales the wakeup granularity based on the number of running
> tasks on the CPU up to a max of 8ms by default. The intent is to
> allow tasks to run for longer while overloaded so that some tasks may
> complete faster and reduce the degree to which a domain is overloaded.
> Note that the TuneD throughput-performance profile allows up to 15ms,
> but there is no explanation why such a long period was necessary, so
> this patch is conservative and uses the value that check_preempt_wakeup()
> also takes into account. An internet search for instances where this
> parameter is tuned to high values turns up either no explanation or a
> broken one.
>
> This improved hackbench on a range of machines when communicating via
> pipes (sockets show little to no difference). For a 2-socket CascadeLake
> machine, the results were
Twiddling wakeup preemption based upon the performance of a fugly fork
bomb seems like a pretty bad idea to me.
Preemption does rapidly run into diminishing returns as load climbs for
a lot of loads, but as you know, it's a rather sticky wicket because
even when over-committed, preventing light control threads from slicing
through (what can be a load's own work crew of) hogs can seriously
injure performance.
<snip>
> @@ -7044,10 +7045,22 @@ balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> }
> #endif /* CONFIG_SMP */
>
> -static unsigned long wakeup_gran(struct sched_entity *se)
> +static unsigned long
> +wakeup_gran(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> 	unsigned long gran = sysctl_sched_wakeup_granularity;
>
> +	/*
> +	 * If rq is specified, scale the granularity relative to the number
> +	 * of running tasks but no more than 8ms with default
> +	 * sysctl_sched_wakeup_granularity settings. The wakeup gran
> +	 * reduces over-scheduling but if tasks are stacked then the
> +	 * domain is likely overloaded and over-scheduling may
> +	 * prolong the overloaded state.
> +	 */
> +	if (cfs_rq && cfs_rq->nr_running > 1)
> +		gran *= min(cfs_rq->nr_running >> 1, sched_nr_latency);
> +
Maybe things have changed while I wasn't watching closely, but...
The scaled up tweakables on my little quad desktop box:
sched_nr_latency = 8
sched_wakeup_granularity = 4ms
sched_latency = 24ms
Due to the GENTLE_FAIR_SLEEPERS feature, a task can only receive a max of
sched_latency/2 sleep credit, i.e. the delta between waking sleeper and
current is clipped to a max of 12 virtual ms, so the instant our
preempt threshold reaches 12.000ms, by human booboo or now 3 runnable
tasks with this change, wakeup preemption is completely disabled, or?
-Mike