Re: [PATCH] interrupt: discover and disable very frequent interrupts
From: Marc Zyngier
Date: Fri Sep 30 2022 - 05:24:10 EST
On Fri, 30 Sep 2022 07:40:42 +0100,
Zhang Xincheng <zhangxincheng@xxxxxxxxxxxxx> wrote:
>
> From: zhangxincheng <zhangxincheng@xxxxxxxxxxxxx>
>
> In some cases, a peripheral's interrupt will be triggered frequently,
> which will keep the CPU processing the interrupt and eventually cause
> the RCU to report rcu_sched self-detected stall on the CPU.
>
> [ 838.131628] rcu: INFO: rcu_sched self-detected stall on CPU
> [ 838.137189] rcu: 0-....: (194839 ticks this GP) idle=f02/1/0x4000000000000004 softirq=9993/9993 fqs=97428
> [ 838.146912] rcu: (t=195015 jiffies g=6773 q=0)
> [ 838.151516] Task dump for CPU 0:
> [ 838.154730] systemd-sleep R running task 0 3445 1 0x0000000a
>
> Signed-off-by: zhangxincheng <zhangxincheng@xxxxxxxxxxxxx>
> Change-Id: I9c92146f2772eae383c16c8c10de028b91e07150
Irrespective of the patch itself, I would really like to understand
why you consider it a better course of action to kill a device
(and potentially the whole machine) than to let the storm eventually
calm down. A frequent interrupt is not necessarily the sign of
something going wrong; it is the sign of a busy system. I prefer my
systems busy rather than dead.
Furthermore, I see no rationale here for the number of interrupts
that *you* consider "too many", nor for the period of time over which
they are counted (both parameters seem to be firmly hardcoded).
Something like this should be limited to a debug feature. It would
also be a lot more useful if it were built as an interrupt *limiting*
feature, rather than killing the interrupt forever (which is IMHO a
ludicrous thing to do).
Thanks,
M.
--
Without deviation from the norm, progress is not possible.