Re: [PATCH for v4.9-stable] sched: fix softirq time accounting
From: Greg KH
Date: Wed Dec 13 2017 - 05:54:58 EST
On Wed, Dec 13, 2017 at 11:11:16AM +0100, Rabin Vincent wrote:
> From: Rabin Vincent <rabinv@xxxxxxxx>
>
> softirq time accounting is broken on v4.9.x if ksoftirqd runs.
>
> With
> CONFIG_IRQ_TIME_ACCOUNTING=y
> # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
>
> this test code:
>
> struct tasklet_struct tasklet;
>
> static void delay_tasklet(unsigned long data)
> {
> 	udelay(10);
> 	tasklet_schedule(&tasklet);
> }
>
> tasklet_init(&tasklet, delay_tasklet, 0);
> tasklet_schedule(&tasklet);
>
> results in:
>
> $ while :; do grep cpu0 /proc/stat; done
> cpu0 5 0 80 25 16 107 1 0 0 0
> cpu0 5 0 80 25 16 107 0 0 0 0
> cpu0 5 0 80 25 16 107 0 0 0 0
> cpu0 5 0 80 25 16 107 0 0 0 0
> cpu0 5 0 81 25 16 107 0 0 0 0
> cpu0 5 0 81 25 16 107 1 0 0 0
> cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> cpu0 5 0 81 25 16 108 0 0 0 0
> cpu0 6 0 81 25 16 108 0 0 0 0
> cpu0 6 0 81 25 16 108 0 0 0 0
>
> As can be seen, the softirq numbers are totally bogus.
>
> When ksoftirqd is running, irqtime_account_process_tick() increments
> cpustat[CPUSTAT_SOFTIRQ]. This causes the "nsecs_to_cputime64(irqtime)
> - cpustat[CPUSTAT_SOFTIRQ]" calculation in irqtime_account_update() to
> underflow the next time a softirq is handled, leading to the above
> values.
>
> The underflow bug was added by 57430218317e5b280 ("sched/cputime: Count
> actually elapsed irq & softirq time").
>
> But ksoftirqd accounting was wrong even in earlier kernels. In earlier
> kernels, after a ksoftirqd run, the kernel would simply stop accounting
> softirq time spent outside of ksoftirqd until that (accumulated) time
> exceeded the time for which ksoftirqd previously had run.
>
> Fix both the underflow and the wrong accounting by using a counter
> specifically for the non-ksoftirqd softirq case.
>
> This code has been fixed in current mainline by a499a5a14db
> ("sched/cputime: Increment kcpustat directly on irqtime account") [note
> also the followup 25e2d8c1b9e327e ("sched/cputime: Fix ksoftirqd cputime
> accounting regression")], but that patch is part of the many changes
> for eliminating cputime_t, so it does not seem suitable for backport.

I _really_ only want to take the exact upstream patches, as every time
we do something like what you are proposing to do here, we get it wrong.
Seriously, our track record is horrible. Like 90% wrong.

Can you try just those two patches instead?

thanks,

greg k-h