Re: [PATCH] net: enetc: fix sirq-storm by clearing IDR registers

From: Vladimir Oltean

Date: Mon Feb 23 2026 - 11:36:58 EST


Hi Zefir,

On Fri, Feb 20, 2026 at 02:29:30PM +0100, Zefir Kurtisi wrote:
> From: Zefir Kurtisi <zefir.kurtisi@xxxxxxxxxxxx>
>
> The fsl_enetc driver experiences soft-IRQ storms on LS1028A systems
> where up to 500k interrupts/sec are generated, completely saturating
> one CPU core. When running with a single core, this causes watchdog
> timeouts and system reboots.
>
> Root cause:
> The driver was writing to SITXIDR/SIRXIDR (Station Interface summary
> registers) to acknowledge interrupts, but these are W1C registers that
> only provide a summary view. According to the LS1028A Reference Manual
> (Rev. 0, Chapter 16.3):
>
> - TBaIDR/RBaIDR (per-ring, offset 0xa4): RO, "Reading will
> automatically clear all events"
> - SITXIDR/SIRXIDR (summary, offset 0xa18/0xa28): W1C, "provides a
> non-destructive read access"
>
> The actual interrupt sources are the per-ring TBaIDR/RBaIDR registers.
> The summary registers merely reflect their combined state. Writing to
> SITXIDR/SIRXIDR does not clear the underlying per-ring sources, causing
> the hardware to immediately re-assert the interrupt.
>
> Fix:
> 1. Point ring->idr to per-ring TBaIDR/RBaIDR instead of summary
> registers
> 2. Remove per-packet writes to SITXIDR/SIRXIDR from packet processing
> 3. Read TBaIDR/RBaIDR once per NAPI poll (in enetc_poll) before
> re-enabling interrupts
>
> This properly acknowledges interrupts at the hardware level and
> eliminates the interrupt storm. The optimization of clearing once per
> NAPI poll rather than per packet also reduces register access overhead.
>
> Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
> Tested-on: LS1028A (NXP Layerscape), Linux 6.6.93
> Signed-off-by: Zefir Kurtisi <zefir.kurtisi@xxxxxxxxxxxx>
> ---

Thank you for your patch and for debugging.

I am not sure whether your interpretation of the documentation is
correct. I have asked a colleague familiar with the hardware design and
will come back when I am 100% sure.

At first glance, I believe you may have mixed up the documentation for
SITXIDR/SIRXIDR with that for PSIIDR/VSIIDR. For the latter, the manual
does indeed say "Summary of detected interrupts for all transmit rings
belonging to the SI (...) Read only, clear using SITXIDR."

I wonder whether it's possible you are looking at a different issue
instead, completely unrelated to hardirq masking. I notice that stable
tag v6.6.93 is lacking this commit:
https://github.com/torvalds/linux/commit/50bd33f6b392
which is high on my list of suspiciously similar issues in terms of behaviour.

(note: when submitting a patch to mainline net.git main branch, it's a
good idea to also test *on* the net.git main branch, aka
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/)

I also note that I have added prints at each point where the driver
clears the interrupts by writing to SITXIDR/SIRXIDR, and with various
workloads on eno0/eno2/eno3, not once have I seen the interrupt still
pending in TBaIDR/RBaIDR.
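
To make the two competing readings of the manual concrete, here is a
small user-space model of the check described above. It is only a
sketch: struct model_si, the w1c_propagates flag and the helper names
are hypothetical, and nothing here reflects the real ENETC register
layout or driver code. The flag selects which hypothesis is modeled:
a W1C write to the summary register either clears the per-ring source
or leaves it set.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_RINGS 4

/* Hypothetical software model, not the ENETC register layout. */
struct model_si {
	/* Per-ring event bits (stand-in for TBaIDR/RBaIDR). */
	uint32_t ring_idr[MODEL_NUM_RINGS];
	/* Assumption under test: does a summary W1C clear the ring source? */
	bool w1c_propagates;
};

/* Non-destructive summary read: bit i is set while ring i has an event. */
static uint32_t model_summary_read(const struct model_si *si)
{
	uint32_t v = 0;

	for (int i = 0; i < MODEL_NUM_RINGS; i++)
		if (si->ring_idr[i])
			v |= 1u << i;

	return v;
}

/* Write-1-to-clear on the summary register (stand-in for SITXIDR/SIRXIDR). */
static void model_summary_w1c(struct model_si *si, uint32_t mask)
{
	if (!si->w1c_propagates)
		return; /* hypothesis from the patch: ring source stays set */

	for (int i = 0; i < MODEL_NUM_RINGS; i++)
		if (mask & (1u << i))
			si->ring_idr[i] = 0;
}

/* The debug check: acknowledge via the summary, then look at the ring. */
static bool model_still_pending_after_ack(struct model_si *si, int ring)
{
	model_summary_w1c(si, 1u << ring);

	return si->ring_idr[ring] != 0;
}
```

With w1c_propagates set, model_still_pending_after_ack() reports no
pending event after the acknowledge, which matches what my prints show
on hardware. With it cleared, the event stays pending and the summary
keeps reporting it, which is the behaviour the patch description
assumes.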

Is there something special about your setup? What interfaces and traffic
pattern are you using?

This patch should be put on hold until it is clear to everybody what is
going on.