Re: [PATCH v4 7/7] sched/fair: de-entropy for SIS filter

From: Gautham R. Shenoy
Date: Wed Jul 20 2022 - 13:09:17 EST


Hello Abel,

On Sun, Jun 19, 2022 at 08:04:51PM +0800, Abel Wu wrote:
> Now when updating core state, there are two main problems that could
> pollute the SIS filter:
>
> - The updating is before task migration, so if dst_cpu is
> selected to be propagated which might be fed with tasks
> soon, the efforts we paid is no more than setting a busy
> cpu into the SIS filter. While on the other hand it is
> important that we update as early as possible to keep the
> filter fresh, so it's not wise to delay the update to the
> end of load balancing.
>
> - False negative propagation hurts performance since some
> idle cpus could be out of reach. So in general we will
> aggressively propagate idle cpus but allow false positive
> continue to exist for a while, which may lead to filter
> being fully polluted.

Ok, so the false positive case is being addressed in this patch.

>
> Pains can be relieved by a force correction when false positive
> continuously detected.
>
[..snip..]

> @@ -111,6 +117,7 @@ struct sched_group;
> enum sd_state {
> sd_has_icores,
> sd_has_icpus,
> + sd_may_idle,
> sd_is_busy
> };
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d55fdcedf2c0..9713d183d35e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c

[...snip..]

> @@ -9321,7 +9327,7 @@ static void sd_update_state(struct lb_env *env, struct sd_lb_stats *sds)
> {
> struct sched_domain_shared *sd_smt_shared = env->sd->shared;
> enum sd_state new = sds->sd_state;
> - int this = env->dst_cpu;
> + int icpu = sds->idle_cpu, this = env->dst_cpu;
>
> /*
> * Parallel updating can hardly contribute accuracy to
> @@ -9331,6 +9337,22 @@ static void sd_update_state(struct lb_env *env, struct sd_lb_stats *sds)
> if (cmpxchg(&sd_smt_shared->updating, 0, 1))
> return;
>
> + /*
> + * The dst_cpu is likely to be fed with tasks soon.
> + * If it is the only unoccupied cpu in this domain,
> + * we still handle it the same way as as_has_icpus
^^^^^^^^^^^^^
Nit: sd_has_icpus

> + * but turn the SMT into the unstable state, rather
> + * than waiting to the end of load balancing since
> + * it's also important that update the filter as
> + * early as possible to keep it fresh.
> + */
> + if (new == sd_is_busy) {
> + if (idle_cpu(this) || sched_idle_cpu(this)) {
> + new = sd_may_idle;
> + icpu = this;
> + }
> + }
> +
> /*
> * There is at least one unoccupied cpu available, so
> * propagate it to the filter to avoid false negative
> @@ -9338,6 +9360,12 @@ static void sd_update_state(struct lb_env *env, struct sd_lb_stats *sds)
> * idle cpus thus throughupt downgraded.
> */
> if (new != sd_is_busy) {
> + /*
> + * The sd_may_idle state is taken into
> + * consideration as well because from
> + * here we couldn't actually know task
> + * migrations would happen or not.
> + */
> if (!test_idle_cpus(this))
> set_idle_cpus(this, true);
> } else {
> @@ -9347,9 +9375,26 @@ static void sd_update_state(struct lb_env *env, struct sd_lb_stats *sds)
> */
> if (sd_smt_shared->state == sd_is_busy)
> goto out;
> +
> + /*
> + * Allow false positive to exist for some time
> + * to make a tradeoff of accuracy of the filter
> + * for relieving cache traffic.
> + */

I can understand allowing the false positive to exist when there are
no other idle CPUs in this SMT domain other than this CPU, which is
handled by the case where new != sd_is_busy in the current
load-balance round and will be handled by the "else" clause in the
subsequent round if env->dst_cpu ends up becoming busy.


However, when we know that new == sd_is_busy and the previous state of
this SMT domain was sd_has_icpus, should we not immediately clear this
core's cpumask from the LLCs icpus mask? What is the need for the
intermediate sd_may_idle state transition between sd_has_icpus and
sd_is_busy in this case ?



> + if (sd_smt_shared->state == sd_has_icpus) {
> + new = sd_may_idle;
> + goto update;
> + }
> +
> + /*
> + * If the false positive issue has already been
> + * there for a while, a correction of the filter
> + * is needed.
> + */
> }
>
> sd_update_icpus(this, sds->idle_cpu);
^^^^^^^^^^^^^^

I think you meant to use icpu here ? sds->idle_cpu == -1 in the case
when new == sd_may_idle. Which will end up clearing this core's
cpumask from this LLC's icpus mask. This defeats the
"allow-false-positive" heuristic.




> +update:
> sd_smt_shared->state = new;
> out:
> xchg(&sd_smt_shared->updating, 0);
> --
> 2.31.1
>

--
Thanks and Regards
gautham.