Re: [POSSIBLE BUG] behavior change in irq_can_handle_pm() introduced in 8d39d6ec4db5d
From: Luigi Rizzo
Date: Mon Nov 10 2025 - 19:26:48 EST
On Sat, Nov 8, 2025 at 10:30 PM Luigi Rizzo <lrizzo@xxxxxxxxxx> wrote:
>
> BACKGROUND (just to explain how I found the issue; it may exist regardless):
>
> I have some code (soon to be posted here) to implement interrupt moderation
> in software using per-CPU hrtimers. The basic logic is as follows:
>
> - if the system decides an irq needs moderation, it calls disable_irq_nosync(),
> adds the irq_desc to a per-CPU list, and keeps IRQD_IRQ_INPROGRESS set
> to prevent migration. The first desc inserted in the list also starts
> an hrtimer;
>
> - when the timer fires, the callback clears IRQD_IRQ_INPROGRESS and calls
> enable_irq() on all linked irq_descs.
>
> The relevant code is the following:
>
> @@ -207,x +208,x @@ irqreturn_t handle_irq_event(struct irq_desc *desc)
>
> raw_spin_lock(&desc->lock);
> + /* If moderation kicks in, call disable_irq_nosync() and set an
> +  * hrtimer. Keep the bit set to prevent migration. */
> + if (irq_moderation_has_started_timer_and_disabled_irq(desc))
> + return ret;
> irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
> return ret;
...
After further debugging, I found that the problem is that disable_irq_nosync()
operates lazily: it marks the interrupt as disabled but leaves it unmasked,
and acts on the chip only when the next interrupt arrives. With this change
8d39d6ec4db5d genirq: Prevent migration live lock in handle_edge_irq()
the next interrupt finds IRQD_IRQ_INPROGRESS set and blocks until the flag
is cleared. That can only happen if the hrtimer callback is allowed to run
on the same CPU, which it is not while that CPU is blocked.
I guess the problem can be avoided by calling
irq_set_status_flags(irq, IRQ_DISABLE_UNLAZY);
on the interrupts where I want to use my changes in handle_irq_event().
However, I still wonder whether the change in behavior is intentional or an
undesired side effect.
thanks
luigi