Re: [PATCH v2] sched/fair: sanitize vruntime of entity being migrated

From: Vincent Guittot
Date: Wed Mar 15 2023 - 06:22:29 EST


On Wed, 15 Mar 2023 at 11:15, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>
> On 15/03/2023 09:42, Vincent Guittot wrote:
> > On Wed, 15 Mar 2023 at 08:18, Vincent Guittot
> > <vincent.guittot@xxxxxxxxxx> wrote:
> >>
> >> On Tue, 14 Mar 2023 at 18:16, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >>>
> >>> On Tue, Mar 14, 2023 at 02:24:37PM +0100, Vincent Guittot wrote:
> >>>
> >>>>> @@ -7632,11 +7646,8 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> >>>>> * min_vruntime -- the latter is done by enqueue_entity() when placing
> >>>>> * the task on the new runqueue.
> >>>>> */
> >>>>> - if (READ_ONCE(p->__state) == TASK_WAKING) {
> >>>>> - struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >>>>> -
> >>>>> + if (READ_ONCE(p->__state) == TASK_WAKING || reset_vruntime(cfs_rq, se))
> >>>>
> >>>> That's somehow what was proposed in one of the previous proposals but
> >>>> we can't call rq_clock_task(rq_of(cfs_rq)) because rq lock might not
> >>>> be hold and rq task clock has not been updated before being used
> >>>
> >>> Argh indeed. I spend a lot of time ensuring we didn't take the old rq
> >>> lock on wakeup -- and then a lot of time cursing about how we don't :-)
> >>>
> >>> Now, if we could rely on the rq-clock being no more than 1 tick behind
> >>> current, this would still be entirely sufficient to catch the long sleep
> >>> case.
> >>
> >> We should also take care when loading rq_clock_task that we are not
> >> racing with an update especially for a 32bits system like pelt
> >> last_update_time
> >
> > We still have this possibility:
> > https://lore.kernel.org/lkml/ZAiFxWLSb9HDazSI@vingu-book/
> >
> > which uses pelt last_update_time when migrating and keep using
> > rq_clock_task in place_entity
>
> Isn't there an issue with this approach on asymmetric CPU capacity systems?
>
> We do a sync_entity_load_avg() in select_task_rq_fair()
> (find_energy_efficient_cpu() for EAS and select_idle_sibling() for CAS)
> to sync cfs_rq and se.

ah yes, I forgot this point.
That being said, is it a valid problem for EAS based system ? I mean
we are trying to fix a vruntime comparison overflow that can happen
with a very long sleeping task (around 200 days) while only a very low
weight entity is always running during those 200 days.

>
> [...]
>