Re: [PATCH] net/mlx5: poll mlx5 eq during irq migration
From: Jason Gunthorpe
Date: Wed Mar 04 2026 - 15:13:21 EST
On Wed, Mar 04, 2026 at 04:17:04PM +0000, Praveen Kumar Kannoju wrote:
> A lost-interrupt scenario has been observed in multiple issues during IRQ
> migration due to CPU scaling activity. This in turn left unhandled EQEs
> pending, causing the corresponding Mellanox transmission queues to
> become full and time out. This patch overcomes this situation by
> polling the EQ associated with the IRQ which undergoes migration, to
> recover any unhandled EQEs and keep the transmission uninterrupted from
> the corresponding queue.
What? This does not seem like something we should do like this.
IRQ migration is not supposed to lose interrupts, this seems like an
IRQ layer bug to me. If it is buggy and losing interrupts it should
probably inject a spurious interrupt around these events so all
devices can enjoy the bug fix.
Basically you need to explain in a lot more detail why the IRQ was
lost, not just some hand-wavy "migration something something".
BTW there are known bugs in things like QEMU that can lose interrupts
around changes to the MSI (and worse than that too), but I thought
they were all fixed now?
Jason