Re: [PATCH] sched/core: forced idle accounting

From: Hao Luo
Date: Thu Oct 14 2021 - 19:58:38 EST

In reply to: Josh Don: "Re: [PATCH] sched/core: forced idle accounting"

On Thu, Oct 14, 2021 at 4:29 PM Josh Don <joshdon@xxxxxxxxxx> wrote:
>
> On Thu, Oct 14, 2021 at 10:58 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
> >
> > On Mon, Oct 11, 2021 at 5:31 PM Josh Don <joshdon@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 11, 2021 at 10:33 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Oct 7, 2021 at 5:08 PM Josh Don <joshdon@xxxxxxxxxx> wrote:
> > > > > -void sched_core_dequeue(struct rq *rq, struct task_struct *p)
> > > > > +void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags)
> > > > >  {
> > > > >         rq->core->core_task_seq++;
> > > > >
> > > > > -       if (!sched_core_enqueued(p))
> > > > > -               return;
> > > > > +       if (sched_core_enqueued(p)) {
> > > > > +               rb_erase(&p->core_node, &rq->core_tree);
> > > > > +               RB_CLEAR_NODE(&p->core_node);
> > > > > +       }
> > > > >
> > > > > -       rb_erase(&p->core_node, &rq->core_tree);
> > > > > -       RB_CLEAR_NODE(&p->core_node);
> > > > > +       /*
> > > > > +        * Migrating the last task off the cpu, with the cpu in forced idle
> > > > > +        * state. Reschedule to create an accounting edge for forced idle,
> > > > > +        * and re-examine whether the core is still in forced idle state.
> > > > > +        */
> > > > > +       if (!(flags & DEQUEUE_SAVE) && rq->nr_running == 1 &&
> > > > > +           rq->core->core_forceidle && rq->curr == rq->idle)
> > > > > +               resched_curr(rq);
> > > >
> > > > Resched_curr is probably an unwanted side effect of dequeue. Maybe we
> > > > could extract the check and resched_curr out into a function, and call
> > > > the function outside of sched_core_dequeue(). In that way, the
> > > > interface of dequeue doesn't need to change.
> > >
> > > This resched is an atypical case; normal load balancing won't steal
> > > the last runnable task off a cpu. The main reasons this resched could
> > > trigger are: migration due to affinity change, and migration due to
> > > sched core doing a cookie_steal. Could bubble this up to
> > > deactivate_task(), but seems less brittle to keep this in dequeue()
> > > with the check against DEQUEUE_SAVE (since this creates an important
> > > accounting edge). Thoughts?
> > >
> >
> > I prefer bubbling it up to deactivate_task(). Depending on how many
> > callers of deactivate_task() need this resched, IMHO it is even fine
> > to put it in deactivate_task's caller. Wrapping it in a function may
> > help clarify its purpose.
>
> I'd argue against bubbling up above deactivate_task(); makes things
> much more brittle if a new use for deactivate_task() is added in the
> future.
>
> Tried both ways; IMO it seems slightly better to leave it in dequeue() vs
> deactivate(): it is less confusing to have one hook instead of two for
> coresched to handle dequeuing a task.
>

Ack. No problem. I don't have strong objections here.
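
(For reference, a minimal user-space sketch of the "wrap it in a function" idea while still keeping the single hook in dequeue. Everything below is a mock-up and an assumption, not kernel code: `struct mock_rq`, `MOCK_DEQUEUE_SAVE`, `forceidle_resched_needed()` and `mock_sched_core_dequeue()` are hypothetical names, and the `resched_requested` flag only stands in for the effect of resched_curr().)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock flag for this sketch only; not taken from kernel headers. */
#define MOCK_DEQUEUE_SAVE 0x02

struct mock_task {
	int id;
};

/* Heavily simplified stand-in for struct rq. */
struct mock_rq {
	unsigned int nr_running;   /* runnable tasks, counted before the dequeue */
	bool core_forceidle;       /* stands in for rq->core->core_forceidle */
	struct mock_task *curr;    /* task currently on the cpu */
	struct mock_task *idle;    /* this cpu's idle task */
	bool resched_requested;    /* stands in for the effect of resched_curr() */
};

/*
 * Hypothetical helper: true when the last runnable task is leaving a cpu
 * that is sitting in forced idle, so a reschedule is needed to create the
 * forced idle accounting edge. Save/restore style dequeues are skipped.
 */
static bool forceidle_resched_needed(const struct mock_rq *rq, int flags)
{
	assert(rq != NULL);

	if (flags & MOCK_DEQUEUE_SAVE)
		return false;

	return rq->nr_running == 1 && rq->core_forceidle &&
	       rq->curr == rq->idle;
}

/* The single dequeue hook; the rb-tree bookkeeping is omitted here. */
static void mock_sched_core_dequeue(struct mock_rq *rq, int flags)
{
	if (forceidle_resched_needed(rq, flags))
		rq->resched_requested = true;
}
```

In this model the named helper documents the purpose of the check, and the dequeue interface still carries the flags argument, which matches the direction agreed on above.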

> > > > > /*
> > > > > @@ -5765,7 +5782,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> > > > >         for_each_cpu_wrap(i, smt_mask, cpu) {
> > > > >                 rq_i = cpu_rq(i);
> > > > >
> > > > > -               if (i != cpu)
> > > > > +               if (i != cpu && (rq_i != rq->core || !core_clock_updated))
> > > > >                         update_rq_clock(rq_i);
> > > >
> > > > Do you mean (rq_i != rq->core && !core_clock_updated)? I thought
> > > > rq->core has core_clock updated always.
> > >
> > > rq->clock is updated on entry to pick_next_task(). rq->core is only
> > > updated if rq == rq->core, or if we've done the clock update for
> > > rq->core above.
> >
> > I meant 'if (i != cpu && rq_i != rq->core)'. Because at this point,
> > core_clock should already have been updated, is that not the case?
> > Anyway, the tracking of clock updates here is too confusing to me.
>
> Added a comment here, but the logic flow is:
> - rq->clock is always updated on entry to pick_next_task()
> - rq->core->clock _may_ be updated by the time we get to this part of
> pick_next_task(). We have to be careful to avoid a double update,
> hence the need for the core_clock_updated var.

Yeah. Sync'ed offline and that cleared my confusion. Thanks.
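
(To spell out the logic flow quoted above, here is a small user-space model of the clock tracking. It is only a sketch under assumptions: `struct mock_clock_rq`, `mock_update_rq_clock()` and `mock_pick_update_clocks()` are hypothetical names, runqueues are plain array indices, and the `early_core_update` parameter stands in for whatever earlier path in pick_next_task() may have updated rq->core's clock.)

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified runqueue that only counts how often its clock was updated. */
struct mock_clock_rq {
	int clock_updates;
};

static void mock_update_rq_clock(struct mock_clock_rq *rq)
{
	rq->clock_updates++;
}

/*
 * Model of the update pattern: 'cpu' is the runqueue doing the pick and
 * 'core' is the index of the core-wide runqueue (rq->core).
 */
static void mock_pick_update_clocks(struct mock_clock_rq *rqs, int nr_cpus,
				    int cpu, int core, bool early_core_update)
{
	/* The entry update below already covers rq->core when rq == rq->core. */
	bool core_clock_updated = (cpu == core);
	int i;

	assert(cpu >= 0 && cpu < nr_cpus && core >= 0 && core < nr_cpus);

	/* rq->clock is always updated on entry. */
	mock_update_rq_clock(&rqs[cpu]);

	/* rq->core's clock may be updated before the sibling loop runs. */
	if (early_core_update && !core_clock_updated) {
		mock_update_rq_clock(&rqs[core]);
		core_clock_updated = true;
	}

	/* Sibling loop: skip ourselves, and skip rq->core if already done. */
	for (i = 0; i < nr_cpus; i++) {
		if (i != cpu && (i != core || !core_clock_updated))
			mock_update_rq_clock(&rqs[i]);
	}
}
```

In this model every runqueue ends up with exactly one update whether or not the early core update ran. With the simpler condition `i != cpu && i != core`, the core runqueue would get no update at all when `cpu != core` and the early update was skipped, which is the case the core_clock_updated variable covers.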

Hao
