Re: [PATCH v8 -tip 08/26] sched/fair: Snapshot the min_vruntime of CPUs on force idle

From: Joel Fernandes
Date: Thu Oct 29 2020 - 22:42:44 EST


On Thu, Oct 29, 2020 at 10:36 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:

> > > > +bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
> > > > +{
> > > > + struct rq *rq = task_rq(a);
> > > > + struct sched_entity *sea = &a->se;
> > > > + struct sched_entity *seb = &b->se;
> > > > + struct cfs_rq *cfs_rqa;
> > > > + struct cfs_rq *cfs_rqb;
> > > > + s64 delta;
> > > > +
> > > > + SCHED_WARN_ON(task_rq(b)->core != rq->core);
> > > > +
> > > > + while (sea->cfs_rq->tg != seb->cfs_rq->tg) {
> > > > + int sea_depth = sea->depth;
> > > > + int seb_depth = seb->depth;
> > > > +
> > > > + if (sea_depth >= seb_depth)
> > > > + sea = parent_entity(sea);
> > > > + if (sea_depth <= seb_depth)
> > > > + seb = parent_entity(seb);
> > > > + }
> > > > +
> > > > + if (rq->core->core_forceidle) {
> > > > + se_fi_update(sea, rq->core->core_forceidle_seq, true);
> > > > + se_fi_update(seb, rq->core->core_forceidle_seq, true);
> > > > + }
> > >
> > > As we chatted on IRC you mentioned the reason for the sync here is:
> > >
> > > say we have 2 cgroups (a,b) under root, and we go force-idle in a, then we
> > > update a and root. Then we pick and end up in b, but b hasn't been updated
> > > yet.
> > >
> > > One thing I was wondering about that was, if the pick of 'b' happens much
> > > later than 'a', then the snapshot might be happening too late right?
> >
> > No, since this is the first pick in b since fi, it cannot have advanced.
> > So by updating to fi_seq before picking, we guarantee it is unchanged
> > since we went fi.
>
> Makes complete sense.
>
> I got it to a point where the latencies are much lower, but still not
> at a point where it's as good as the initial patch I posted.
>
> There could be more bugs. At the moment, the only one I corrected in
> your patch is making the truth table do !(!fib && fi). But there is
> still something else going on.

Forgot to ask, do you also need to do the task_vruntime_update() for
the unconstrained pick?

That's in line with what you mentioned: That you still need to do the
update if fi_before == false and fi_now == false.

So something like this?
@@ -4209,6 +4209,10 @@ pick_next_task(struct rq *rq, struct
task_struct *prev, struct rq_flags *rf)
next = p;
trace_printk("unconstrained pick: %s/%d %lx\n",
next->comm, next->pid,
next->core_cookie);
+
+ WARN_ON_ONCE(fi_before);
+ task_vruntime_update(rq_i, p);
+
goto done;
}

Quoting the truth table:

> > fib fi X
> >
> > 0 0 1
> > 0 1 0
> > 1 0 1
> > 1 1 1
> >

thanks,

- Joel