Re: [PATCH v3] sched/cputime: let ktimers align with ksoftirqd in accounting CPUTIME_SOFTIRQ

From: Yuanhan Zhang
Date: Thu Dec 07 2023 - 12:19:48 EST


Hi,

Steven Rostedt <rostedt@xxxxxxxxxxx> 于2023年12月7日周四 10:35写道:
>
> On Thu, 7 Dec 2023 18:43:47 +0800
> Yuanhan Zhang <zyhtheonly@xxxxxxxxx> wrote:
>
> > It results in if we do not handle ksoftirqd like this, we will have a
> > bigger SYSTEM and less SOFTIRQ.
>
> And honestly that's what we want. Interrupts and softirqs that execute in
> interrupts and softirq context take away from the system. That is, if they
> are not explicitly blocked (local_irq_disable/local_bh_disable) they
> interrupt the current task and take up the time of the current task. We
> need to differentiate this because this context has no "task" context to
> measure.
>
> We do not want to add ksoftirqd or threaded interrupt handlers / softirqs
> to this measurement. Sure, they are handling interrupt and softirq code,
> but they have their own context that can be measured like any other task.
>
> If we blur this with real irqs and softirqs, then we will not know what
> those real irqs and softirqs are measuring.

Yes you say clearly enough and it makes some sense to me!
So my understanding is that in PREEMPT_RT, it is better to put ksoftirqd's time
into SYSTEM since they are just in their "task" context.

If my understanding is right, how about we just exclude ksoftirqd
like what I do in the last email?
(something like
else if (in_serving_softirq() &&
(IS_ENABLED(CONFIG_PREEMPT_RT) || curr != this_cpu_ksoftirqd()))

If this looks okay to you, I'm happy to send a new patch for this :)

>
> > So my point is if we do not align ktimers, ktimers would act like
> > **observation on *not-excluded ksoftirq patched* kernel** part in the
> > above example,
> > and this might make SOFTIRQ less than expected, /proc/stat less accurate.
>
> No it does not. When a softirq kicks off it's work to a thread (ksoftirq,
> threaded softirqd, or simply a workqueue), it's no longer running in
> softirq context, and should not be measured as such.
>
> The measurement is not about how much work the softirq is doing (otherwise
> we need to add workqueues started by softirqs too), it's about measuring
> the actual irq and softirq context. In PREEMPT_RT, we try to eliminate that
> context as much as possible.

Thanks anyway, I think I begin to learn a bit more about PREEMPT_RT...

>
> So seeing less is a feature not a bug!
>
> -- Steve