Re: [PATCH v3] sched/cputime: let ktimers align with ksoftirqd in accounting CPUTIME_SOFTIRQ

From: Yuanhan Zhang
Date: Thu Dec 07 2023 - 05:44:07 EST


Hi,

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> 于2023年12月5日周二 23:31写道:
>
> On 2023-12-02 05:28:15 [-0500], Yuanhan Zhang wrote:
> > Hi,
> Hi,
>
> > Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> 于2023年12月1日周五 11:16写道:
> > >
> > > On 2023-12-01 16:05:41 [+0800], tiozhang wrote:
> > > > In CONFIG_PREEMPT_RT kernel, ktimers also calls __do_softirq,
> > > > so when accounting CPUTIME_SOFTIRQ, ktimers need to be accounted the same
> > > > as ksoftirqd.
> > >
> > > I still don't understand why this is a good thing and why want to align
> > > it with ksoftirqd and what breaks if we don't.
> >
> > My motivation of doing this is to keep CPUTIME_SOFTIRQ in /proc/stat
> > remaining more accurate in PREEPT_RT kernel.
> >
> > If we dont align, ktimers' cpu time is added to CPUTIME_SYSTEM when
> > CONFIG_IRQ_TIME_ACCOUNTING is enabled, make our stats less accurate..
>
> So it is only SYSTEM? There is no additional SOFTIRQ that is used?

Yes, there is some SOFTIRQ, but less than expected.
Here is my little test to explain this, in a reverse way:

I patch my kernel with ksoftirqd not-excluded, like this:

/**/
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0796f938c4f0..6de9eccd094b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c

@@ -67,7 +67,7 @@ void irqtime_account_irq(struct task_struct *curr)
*/
if (hardirq_count())
irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
- else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+ else if (in_serving_softirq())
irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
}
EXPORT_SYMBOL_GPL(irqtime_account_irq);

@@ -371,14 +371,7 @@ static void irqtime_account_process_tick(struct
task_struct *p, int user_tick,

cputime -= other;

- if (this_cpu_ksoftirqd() == p) {
- /*
- * ksoftirqd time do not get accounted in cpu_softirq_time.
- * So, we have to handle it separately here.
- * Also, p->stime needs to be updated for ksoftirqd.
- */
- account_system_index_time(p, cputime, CPUTIME_SOFTIRQ);
- } else if (user_tick) {
+ if (user_tick) {
account_user_time(p, cputime);
} else if (p == rq->idle) {
account_idle_time(cputime);
/**/

Then I do the following test between patched and unpatched kernels on
the same machine:
1. set loopback rps&xps on cpu0
2. test it with iperf like: `iperf -c 127.0.0.1 -P 1 -t 999 -i 1`, making some
softirqs happening on [ksoftirqd/0].
3. observe delta per second on /proc/stat

observation on *not-excluded ksoftirq patched* kernel:
------------------------------------------------------------------------
// SOFTIRQ stats
root@ubuntu-1804-live-x86-tiozhang:~#
while true;do sleep 1;curr=`cat /proc/stat | grep cpu0 | awk '{print
$8}'`;echo $((curr-last));last=$curr;done
...
13
14
13
13
13
13
14
13
13
13
...
// SYSTEM stats
root@ubuntu-1804-live-x86-tiozhang:~#
while true;do sleep 1;curr=`cat /proc/stat | grep cpu0 | awk '{print
$4}'`;echo $((curr-last));last=$curr;done
...
75
78
76
79
77
77
78
76
76
77
78
77
...

===================================================================================

observation on *normal* kernel:
------------------------------------------------------------------------
// SOFTIRQ stats
root@ubuntu-1804-live-x86-tiozhang-2:~#
while true;do sleep 1;curr=`cat /proc/stat | grep cpu0 | awk '{print
$8}'`;echo $((curr-last));last=$curr;done
...
18
20
22
17
23
18
18
20
17
20
18
18
22
18
22
...
// SYSTEM stats
root@ubuntu-1804-live-x86-tiozhang-2:~#
while true;do sleep 1;curr=`cat /proc/stat | grep cpu0 | awk '{print
$4}'`;echo $((curr-last));last=$curr;done
...
72
74
73
74
72
69
74
76
73
73
74
73
72
73
73
...

It results in if we do not handle ksoftirqd like this, we will have a
bigger SYSTEM and less SOFTIRQ.
So my point is if we do not align ktimers, ktimers would act like
**observation on *not-excluded ksoftirq patched* kernel** part in the
above example,
and this might make SOFTIRQ less than expected, /proc/stat less accurate.

>
> > The main diff between ksoftirqd and force-threaded interrupt is that ksoftirq
> > is in SOFTIRQ_OFFSET (serving softirqs) while force-threaded is in
> > SOFTIRQ_DISABLE_OFFSET (by using local_disable_bh).
>
> This depends. If you look at a network driver, SOFTIRQ_DISABLE_OFFSET is
> used during the driver routine doing almost only schedule a softirq.
> Then the main part happens during SOFTIRQ_OFFSET where the driver pulls
> the packets and passes them to the network stack.
>
> > CPUTIME_SOFTIRQ serves for time in SOFTIRQ_OFFSET (processing softirqs).
> > See
> > https://lore.kernel.org/all/1285619753-10892-1-git-send-email-venki@xxxxxxxxxx/
>
> Let me look at this later…
>
> > So this leads to ksoftirqd is counted into CPUTIME_SOFTIRQ but irq-threads
> > into CPUTIME_SYSTEM.
> >
> > Since ktimers is also in SOFTIRQ_OFFSET, align it with ksoftirq will
> > put it into CPUTIME_SOFTIRQ, making /proc/stat more accurate.
>
> But this only works if it is observed during the TICK interrupt, right?

This may take a long time to read, thanks again for your time and patience.

>
> Sebastian