Re: [PATCH V2 1/2] sched: Reduce the default slice to avoid tasks getting an extra tick

From: Vincent Guittot
Date: Fri Feb 07 2025 - 08:46:36 EST


On Fri, 7 Feb 2025 at 07:15, zihan zhou <15645113830zzh@xxxxxxxxx> wrote:
>
> Reduce the default slice, add a comment explaining why this modification
> was made.
>
> Signed-off-by: zihan zhou <15645113830zzh@xxxxxxxxx>
> ---
> kernel/sched/fair.c | 47 +++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 43 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 26958431deb7..754b0785eaa0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -71,10 +71,49 @@ unsigned int sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
> /*
> * Minimal preemption granularity for CPU-bound tasks:
> *
> - * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseconds)
> - */
> -unsigned int sysctl_sched_base_slice = 750000ULL;
> -static unsigned int normalized_sysctl_sched_base_slice = 750000ULL;
> + * (default: 0.70 msec * (1 + ilog(ncpus)), units: nanoseconds)
> + *
> + * The old default value for slice is 0.75 msec * (1 + ilog(ncpus)) which
> + * means that we have a default slice of
> + * 0.75 for 1 cpu
> + * 1.50 up to 3 cpus
> + * 2.25 up to 7 cpus
> + * 3.00 for 8 cpus and above.
> + *
> + * For HZ=250 and HZ=100, because of the tick accuracy, the runtime of tasks
> + * is far higher than their slice.
> + * For HZ=1000 with 8 cpus or more, the accuracy of tick is already
> + * satisfactory, but there is still an issue that tasks will get an extra
> + * tick because the tick often arrives a little faster than expected. In this
> + * case, the task can only wait until the next tick to consider that it has
> + * reached its deadline, and will run 1ms longer.
> + *
> + *   vruntime + sysctl_sched_base_slice = deadline
> + *   |-----------|-----------|-----------|-----------|
> + *        1ms         1ms         1ms         1ms
> + *               ^           ^           ^           ^
> + *             tick1       tick2       tick3       tick4(nearly 4ms)
> + *
> + * There are two reasons for tick error: clockevent precision and
> + * CONFIG_IRQ_TIME_ACCOUNTING/CONFIG_PARAVIRT_TIME_ACCOUNTING.
> + * With CONFIG_IRQ_TIME_ACCOUNTING every tick will be less than 1ms, but even
> + * without it, because of clockevent precision, the tick is still often less
> + * than 1ms.
> + *
> + * In order to make scheduling more precise, we changed 0.75 to 0.70.
> + * Using 0.70 instead of 0.75 should not change much for other configs
> + * and would fix this issue:
> + * 0.70 for 1 cpu
> + * 1.40 up to 3 cpus
> + * 2.10 up to 7 cpus
> + * 2.80 for 8 cpus and above.
> + *
> + * This does not guarantee that tasks can run the slice time accurately every
> + * time, but occasionally running an extra tick has little impact.

The explanation above of why you change sysctl_sched_base_slice
from 0.75 to 0.70 should go in the commit message, not in the code.

With this change:

Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>

> + *
> + */
> +unsigned int sysctl_sched_base_slice = 700000ULL;
> +static unsigned int normalized_sysctl_sched_base_slice = 700000ULL;
>
> const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
>
> --
> 2.33.0
>