Re: [PATCH v3] sched/cputime: let ktimers align with ksoftirqd in accounting CPUTIME_SOFTIRQ

From: Steven Rostedt
Date: Thu Dec 07 2023 - 13:17:46 EST


On Thu, 7 Dec 2023 12:19:28 -0500
Yuanhan Zhang <zyhtheonly@xxxxxxxxx> wrote:

> Hi,
>
> Steven Rostedt <rostedt@xxxxxxxxxxx> 于2023年12月7日周四 10:35写道:
> >
> > On Thu, 7 Dec 2023 18:43:47 +0800
> > Yuanhan Zhang <zyhtheonly@xxxxxxxxx> wrote:
> >
> > > It results in if we do not handle ksoftirqd like this, we will have a
> > > bigger SYSTEM and less SOFTIRQ.
> >
> > And honestly that's what we want. Interrupts and softirqs that execute in
> > interrupts and softirq context take away from the system. That is, if they
> > are not explicitly blocked (local_irq_disable/local_bh_disable) they
> > interrupt the current task and take up the time of the current task. We
> > need to differentiate this because this context has no "task" context to
> > measure.
> >
> > We do not want to add ksoftirqd or threaded interrupt handlers / softirqs
> > to this measurement. Sure, they are handling interrupt and softirq code,
> > but they have their own context that can be measured like any other task.
> >
> > If we blur this with real irqs and softirqs, then we will not know what
> > those real irqs and softirqs are measuring.
>
> Yes you say clearly enough and it makes some sense to me!
> So my understanding is that in PREEMPT_RT, it is better to put ksoftirqd's time
> into SYSTEM since they are just in their "task" context.
>
> If my understanding is right, how about we just exclude ksoftirqd
> like what I do in the last email?
> (something like
> else if (in_serving_softirq() &&
> (IS_ENABLED(CONFIG_PREEMPT_RT) || curr != this_cpu_ksoftirqd()))
>
> If this looks okay to you, I'm happy to send a new patch for this :)

Hmm, looking at the code in mainline, I may have misspoken. Although,
honestly, I don't know why we want this.

In irqtime_account_process_tick() there's:

if (this_cpu_ksoftirqd() == p) {
/*
* ksoftirqd time do not get accounted in cpu_softirq_time.
* So, we have to handle it separately here.
* Also, p->stime needs to be updated for ksoftirqd.
*/
account_system_index_time(p, cputime, CPUTIME_SOFTIRQ);

Which to me looks like it is counting ksoftirqd for SOFTIRQ time. But
honestly, why do we care about that? What's the difference if ksoftirqd
were to run or softirqd were to pass work off to a workqueue?

ksoftirqd runs in vanilla Linux as SCHED_OTHER. The work it does doesn't
interrupt processes any more than any other kernel thread. I don't know why
we make it "special".

I guess the better question I need to ask is, what is this information used
for? I thought it was how much time was take away from tasks. As current
would be a task, and we do care if a real softirq is running, as we do not
want to add that to the current task accounting.

But for ksoftirqd, that's not the case, and I don't really care if it's
running a softirq or not. As that time isn't interrupting actual tasks. Not
to mention, one could simply look at the ksoftirqd tasks to see how much
time they take up.

>
> >
> > > So my point is if we do not align ktimers, ktimers would act like
> > > **observation on *not-excluded ksoftirq patched* kernel** part in the
> > > above example,
> > > and this might make SOFTIRQ less than expected, /proc/stat less accurate.
> >
> > No it does not. When a softirq kicks off it's work to a thread (ksoftirq,
> > threaded softirqd, or simply a workqueue), it's no longer running in
> > softirq context, and should not be measured as such.
> >
> > The measurement is not about how much work the softirq is doing (otherwise
> > we need to add workqueues started by softirqs too), it's about measuring
> > the actual irq and softirq context. In PREEMPT_RT, we try to eliminate that
> > context as much as possible.
>
> Thanks anyway, I think I begin to learn a bit more about PREEMPT_RT...

Well, I may have been wrong in this information.

I would like to know who is using this information and for what purpose.
Ideally, I would love to make ksoftirqd *not* report its time as handling
SOFTIRQ.

-- Steve