Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
From: Valentin Schneider
Date: Wed Apr 21 2021 - 06:52:45 EST
On 20/04/21 16:33, Vincent Guittot wrote:
> On Mon, 19 Apr 2021 at 19:13, Valentin Schneider
> <valentin.schneider@xxxxxxx> wrote:
>>
>> On 16/04/21 15:51, Vincent Guittot wrote:
>> > On Thursday 15 Apr 2021 at 18:58:46 (+0100), Valentin Schneider wrote:
>> >> +
>> >> +/*
>> >> + * What does migrating this task do to our capacity-aware scheduling criterion?
>> >> + *
>> >> + * Returns 1 if the task needs more capacity than the dst CPU can provide.
>> >> + * Returns 0 if the task needs the extra capacity provided by the dst CPU.
>> >> + * Returns -1 if the task isn't impacted by the migration wrt capacity.
>> >> + */
>> >> +static int migrate_degrades_capacity(struct task_struct *p, struct lb_env *env)
>> >> +{
>> >> + if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
>> >> + return -1;
>> >> +
>> >> + if (!task_fits_capacity(p, capacity_of(env->src_cpu))) {
>> >> + if (cpu_capacity_greater(env->dst_cpu, env->src_cpu))
>> >> + return 0;
>> >> + else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu))
>> >> + return 1;
>> >> + else
>> >> + return -1;
>> >> + }
>> >
>> > Reaching this point means the task fits src_cpu's capacity, so why test p against dst_cpu?
>> >
>>
>> Because if p fits on src_cpu, we don't want to move it to a dst_cpu on
>> which it *doesn't* fit.
>
> OK. I was confused because I thought this was only meant to force
> migration in the group_misfit_task case, but you've tried to extend it to
> other cases... I'm not convinced you've covered them all.
>
> Also, I find this function returning 3 values a bit disturbing.
> IIUC you tried to align it with migrate_degrades_locality(), but you'd
> have been better off aligning with task_hot() and returning only 0 or 1;
> -1 is never used.
>
Ack, will do.
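Something like the below, perhaps (untested sketch mirroring task_hot()'s
0/1 convention; task_fits_capacity() and capacity_of() are the existing
helpers, cpu_capacity_greater() is the one introduced by this series):

/*
 * Returns 1 if migrating @p to env->dst_cpu leaves it worse off
 * capacity-wise than on env->src_cpu, 0 otherwise.
 */
static int migrate_degrades_capacity(struct task_struct *p, struct lb_env *env)
{
	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
		return 0;

	/* p doesn't fit on src_cpu: only moving to a smaller CPU hurts */
	if (!task_fits_capacity(p, capacity_of(env->src_cpu)))
		return cpu_capacity_greater(env->src_cpu, env->dst_cpu);

	/* p fits on src_cpu: don't move it somewhere it doesn't fit */
	return !task_fits_capacity(p, capacity_of(env->dst_cpu));
}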
>> >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>> >> if (tsk_cache_hot == -1)
>> >> tsk_cache_hot = task_hot(p, env);
>> >>
>> >> + /*
>> >> + * On a (sane) asymmetric CPU capacity system, the increase in compute
>> >> + * capacity should offset any potential performance hit caused by a
>> >> + * migration.
>> >> + */
>> >> + if ((env->dst_grp_type == group_has_spare) &&
>> >
>> > Shouldn't it be env->src_grp_type == group_misfit_task, to only handle the
>> > misfit task case as stated in $subject?
>> >
>>
>> Previously this was env->idle != CPU_NOT_IDLE, but I figured dst_grp_type
>> could give us a better picture. Staring at this some more, that doesn't
>> hold up once the group size goes up - there's no guarantee dst_cpu is the
>> one with spare cycles, and the other CPUs might not be able to grant the
>> capacity uplift that dst_cpu can.
>
> Yeah, you have to keep checking env->idle != CPU_NOT_IDLE.
>
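Right, so the capacity-uplift term would become something like the below
(sketch; the rest of the condition stays as in the patch):

	/*
	 * On a (sane) asymmetric CPU capacity system, the increase in compute
	 * capacity should offset any potential performance hit caused by a
	 * migration. Only consider this when dst_cpu is idle, as that's the
	 * only CPU we know has spare cycles here.
	 */
	if ((env->idle != CPU_NOT_IDLE) &&
	    ...)
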
>>
>> As for not using src_grp_type == group_misfit_task, this is pretty much the
>> same as [1]. A CPU-bound (misfit) task + some other task on the same rq
>> implies a group_overloaded classification when balancing at MC level (no SMT,
>> so one group per CPU, and group_overloaded takes precedence over
>> group_misfit_task in group_classify()).
>
> Is it something that happens often, or just a sporadic/transient state?
> I mean, is it really worth the extra complexity, and do you see a
> performance improvement?
>
"Unfortunately" yes, this is a relatively common scenario when running "1
big task per CPU" types of workloads. The expected behaviour for big.LITTLE
systems is to upmigrate tasks stuck on the LITTLE CPUs as soon as a big CPU
becomes free, usually via newidle balance (which, since they process work
faster than the LITTLEs, is bound to happen), and an extra task being
enqueued at "the wrong time" can prevent this from happening.
This usually means a misfit task can take a few dozen extra ms than it
should to be migrated - in the tests I run (which are pretty much this 1
hog per CPU workload) this happens about ~20% of the time.
> You'd be better off focusing on fixing the simple group_misfit_task case.
> The other cases look far more complex, with lots of corner cases.
>
>>
>> [1]: http://lore.kernel.org/r/jhjblcuv2mo.mognet@xxxxxxx