Re: [PATCH] RFC: sched: Rework task_sched_runtime to avoid calling update_rq_clock

From: Qais Yousef
Date: Tue Jun 18 2024 - 11:25:18 EST


On 06/18/24 09:04, Peter Zijlstra wrote:
> On Sun, Jun 16, 2024 at 11:36:16PM +0100, Qais Yousef wrote:
>
> > > Which then gets me something like the (completely untested) below..
> > >
> > > Hmm?
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 0935f9d4bb7b..36aed99d6a6c 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -724,7 +724,6 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
> > >
> > > rq->prev_irq_time += irq_delta;
> > > delta -= irq_delta;
> > > - psi_account_irqtime(rq->curr, irq_delta);
> > > delayacct_irq(rq->curr, irq_delta);
> > > #endif
> > > #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> > > @@ -5459,6 +5458,8 @@ void sched_tick(void)
> > >
> > > sched_clock_tick();
> > >
> > > + psi_account_irqtime(curr, NULL, &rq->psi_irq_time);
> > > +
> > > rq_lock(rq, &rf);
> > >
> > > update_rq_clock(rq);
> > > @@ -6521,6 +6524,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
> > > ++*switch_count;
> > >
> > > migrate_disable_switch(rq, prev);
> > > + psi_account_irqtime(prev, next, &rq->psi_irq_time);
> >
> > Hmm are prev and next swapped here? next == curr in my view if there's no
> > subtly I missed
>
> This is before context_switch() so prev == current at this point.
> However, more importantly, the PSI thing accounts to its 'curr' group
> and that should very much be the outgoing task's group in this case.
>
> That is, we need to make sure the outgoing group is up-to-date before
> switching to a new group.
>
> Makes sense?

Yes, thanks! After reviewing the patch though I wondered if we care to handle
when the group changes while we're RUNNING. There will be no __schedule() then.
But it's a matter of how accurate this is supposed to be. In sched_move_task()
we dequeue with DEQUEUE_NOCLOCK which IIUC would have not caused
psi_account_irqtime() to be called on the move already. So we're doing the
same.

We can still maybe move this to sched_change_group() (keeping the psi group
change check you added)? This still LGTM anyway, but thought I'll spill it out.

>
> > > psi_sched_switch(prev, next, !task_on_rq_queued(prev));
> > >
> > > trace_sched_switch(sched_mode & SM_MASK_PREEMPT, prev, next, prev_state);
> > > diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> > > index 146baa91d104..65bba162408f 100644
> > > --- a/kernel/sched/psi.c
> > > +++ b/kernel/sched/psi.c
> > > @@ -991,22 +991,31 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
> > > }
> > >
> > > #ifdef CONFIG_IRQ_TIME_ACCOUNTING
> > > -void psi_account_irqtime(struct task_struct *task, u32 delta)
> > > +void psi_account_irqtime(struct task_struct *curr, struct task_struct *prev, u64 *time)
> > > {
> > > - int cpu = task_cpu(task);
> > > + int cpu = task_cpu(curr);
> > > struct psi_group *group;
> > > struct psi_group_cpu *groupc;
> > > - u64 now;
> > > + u64 now, irq;
> > > + s64 delta;
> > >
> > > if (static_branch_likely(&psi_disabled))
> > > return;
> > >
> > > - if (!task->pid)
> > > + if (!curr->pid)
> > > + return;
> > > +
> > > + group = task_psi_group(curr);
> > > + if( prev && task_psi_group(prev) == group)
> >
> > nit: whitespace misplaced
>
> Ha!, is that's all and it all works in one go it's awesome :-)
>
> I'm still trying to learn to type again after switching keyboards. I've
> used a thinkpad keyboard (either on an actual laptop or the travel
> version on my desktop for nearly 20 years... Now I've picked up a split
> keyboard out of necessity (UHK 60 v2 for those interested) and muscle
> memory is still cursing me every single day.

I failed to use those split keyboards..

>
> As a result, I now also cannot type on my laptop anymore, so lose-lose I
> suppose ... urgh.

You mean you're not using AI yet?! :-O

>
> > LGTM otherwise.
> >
> > Reviewed-by: Qais Yousef <qyousef@xxxxxxxxxxx>
>
> Thanks!
>
> John, can you write up a changelog with some pretty numbers and all
> that? Also, when you re-post, can you make sure to Cc the PSI folks
> (johannes and suren iirc, get_maintainers.pl seems to find them).