Re: [PATCH v6 01/20] sched: Unify runtime accounting across classes

From: John Stultz
Date: Mon Dec 18 2023 - 15:23:45 EST


On Sun, Dec 17, 2023 at 8:19 AM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> On 11/06/23 19:34, John Stultz wrote:
> > From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >
> > All classes use sched_entity::exec_start to track runtime and have
> > copies of the exact same code around to compute runtime.
> >
> > Collapse all that.
> >
...
> Looks like this actually got merged into tip via the deadline server work :-)

Oh! That's great to see! The patch has been floating around for a while.

> Though not sure if I caught a bug here
>
> > diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
> > index 85590599b4d6..7595494ceb6d 100644
> > --- a/kernel/sched/stop_task.c
> > +++ b/kernel/sched/stop_task.c
> > @@ -70,18 +70,7 @@ static void yield_task_stop(struct rq *rq)
> >
> > static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
> > {
> > - struct task_struct *curr = rq->curr;
> > - u64 now, delta_exec;
> > -
> > - now = rq_clock_task(rq);
> > - delta_exec = now - curr->se.exec_start;
> > - if (unlikely((s64)delta_exec < 0))
> > - delta_exec = 0;
>
> If negative instead of returning for stopper task; we set delta_exec to 0
>
> > -
> > - schedstat_set(curr->stats.exec_max,
> > - max(curr->stats.exec_max, delta_exec));
> > -
> > - update_current_exec_runtime(curr, now, delta_exec);
>
> And curry on to do time accounting
>
> > + update_curr_common(rq);
>
> But the new function will return early without doing accounting. Wouldn't this
> re-introrduce 8f6189684eb4 ("sched: Fix migration thread runtime bogosity")?

Hrm. So first, good eye for catching this!
Looking through the code, much of the accounting logic we end up
skipping doesn't have much effect when delta_exec = 0, so it seems
mostly harmless to return early without the accounting.

Though, there is one side-effect that does get skipped, which is the
removed update_current_exec_runtime() unconditionally sets:
curr->se.exec_start = now;

Which basically resets the accounting window.

>From the commit, It's unclear how intentional this side-effect is for
the edge case where the interval is negative.

I can't say I've really wrapped my head around the cases where the
se.exec_start would get ahead of the rq_clock_task(), so it's not
clear in which cases we would want to reset the accounting window vs
wait for the rq_clock_task() to catch up. But as this is getting
called from put_prev_task_stop(), it seems we're closing the
accounting window here anyway, and later set_next_task_stop() would be
called (which sets se.exec_start, resetting the accounting) to start
the accounting window again.

So you are right that there is a practical change in behavior, but I
don't think I see it having an effect.

But I've added Mike and Daniel to the CC in case I'm missing something.

thanks
-john