Re: [PATCH] FIXUP: genirq: defuse spurious-irq timebomb
From: Thomas Gleixner
Date: Sun Jul 07 2024 - 14:39:42 EST
Pete!
On Fri, Jun 14 2024 at 21:42, Pete Swain wrote:
> The flapping-irq detector still has a timebomb.
>
> A pathological workload, or test script,
> can arm the spurious-irq timebomb described in
> 4f27c00bf80f ("Improve behaviour of spurious IRQ detect")
>
> This leads to irqs being moved to the much slower polled mode,
> despite the actual unhandled-irq rate being well under the
> 99.9k/100k threshold that the code appears to check.
>
> How?
> - Queued completion handler, like nvme, servicing events
> as they appear in the queue, even if the irq corresponding
> to the event has not yet been seen.
>
> - Queues are frequently empty, so "spurious" irqs are seen
> whenever a threaded handler's
> while (events_queued()) process_them();
> loop consumes the final events while their irqs are still being posted.
> In this case the while() has consumed the last event(s),
> so the next handler says IRQ_NONE.
>
> - In each run of "unhandled" irqs, exactly one IRQ_NONE response
> is promoted from IRQ_NONE to IRQ_HANDLED, by note_interrupt()'s
> SPURIOUS_DEFERRED logic.
>
> - Any 2+ unhandled-irq runs will increment irqs_unhandled.
> The time_after() check in note_interrupt() resets irqs_unhandled
> to 1 after an idle period, but if irqs are never spaced more
> than HZ/10 apart, irqs_unhandled keeps growing.
>
> - During processing of long completion queues, the non-threaded
> handlers will return IRQ_WAKE_THREAD, for potentially thousands
> of per-event irqs. These bypass note_interrupt()'s irq_count++ logic,
> so do not count as handled, and do not invoke the flapping-irq
> logic.
>
> - When the _counted_ irq_count reaches the 100k threshold,
> it's possible for irqs_unhandled > 99.9k to force a move
> to polling mode, even though many millions of _WAKE_THREAD
> irqs have been handled without being counted.
>
> Solution: include IRQ_WAKE_THREAD events in irq_count.
> Only when IRQ_NONE responses outweigh (IRQ_HANDLED + IRQ_WAKE_THREAD)
> by the old 99:1 ratio will an irq be moved to polling mode.
Nice detective work. Though I'm not entirely sure whether that's the
correct approach as it might misjudge the situation where
IRQ_WAKE_THREAD is issued but the thread does not make progress at all.
Let me think about it some more.
Thanks,
tglx
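
The accounting described in the quoted changelog can be illustrated with
a small userspace model. This is only a sketch under stated assumptions,
not the kernel's note_interrupt(): the names toy_desc, toy_note_interrupt()
and toy_run() are hypothetical, the HZ/10 reset and the SPURIOUS_DEFERRED
promotion are left out (irqs are assumed never to be spaced far enough
apart to reset the counter), and the workload of 100 IRQ_WAKE_THREAD
returns followed by one IRQ_NONE is invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the irqreturn_t values named in the changelog. */
enum toy_ret { TOY_NONE, TOY_HANDLED, TOY_WAKE_THREAD };

/* Toy counterpart of the two counters the changelog talks about. */
struct toy_desc {
	unsigned int irq_count;      /* counted irqs in the current window */
	unsigned int irqs_unhandled; /* IRQ_NONE responses in the window */
	bool polled;                 /* set once the detector trips */
};

#define TOY_WINDOW 100000u /* the "100k" threshold from the changelog */
#define TOY_LIMIT   99900u /* the "99.9k" threshold from the changelog */

/*
 * count_wake_thread == false models the behaviour the changelog complains
 * about: WAKE_THREAD returns bypass the counting entirely.
 * count_wake_thread == true models the proposed change.
 */
static void toy_note_interrupt(struct toy_desc *d, enum toy_ret ret,
			       bool count_wake_thread)
{
	if (ret == TOY_WAKE_THREAD && !count_wake_thread)
		return; /* handled work that never reaches irq_count */

	if (ret == TOY_NONE)
		d->irqs_unhandled++;

	d->irq_count++;
	if (d->irq_count < TOY_WINDOW)
		return;

	/* Window complete: judge it, then start a new one. */
	d->irq_count = 0;
	if (d->irqs_unhandled > TOY_LIMIT)
		d->polled = true;
	d->irqs_unhandled = 0;
}

/* Invented workload: bursts of 100 WAKE_THREAD returns, then one IRQ_NONE. */
static bool toy_run(bool count_wake_thread)
{
	struct toy_desc d = { 0, 0, false };

	for (unsigned int round = 0; round < 100000; round++) {
		for (unsigned int i = 0; i < 100; i++)
			toy_note_interrupt(&d, TOY_WAKE_THREAD, count_wake_thread);
		toy_note_interrupt(&d, TOY_NONE, count_wake_thread);
	}
	return d.polled;
}
```

In this model toy_run(false) ends up in polled mode, because only the
IRQ_NONE returns are ever counted, while toy_run(true) does not, because
each 100k window then holds fewer than 1000 unhandled irqs. The model says
nothing about the case where IRQ_WAKE_THREAD is issued but the thread makes
no progress.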