Re: [PATCH] net/mlx5: poll mlx5 eq during irq migration

From: Jason Gunthorpe

Date: Thu Mar 05 2026 - 19:32:33 EST

On Thu, Mar 05, 2026 at 05:08:52PM +0000, Praveen Kannoju wrote:

> Regardless of the underlying causes, which may include IRQ loss
> or EQ re-arming failure, the TX queue becomes stuck, and the
> timeout handler is only triggered once the queue is declared
> full. In scenarios where only specialized packets, such as
> heartbeat packets, are sent through the queue, it takes
> significantly longer for the queue to fill and be identified as
> stuck. A proven solution for this issue is polling the EQ
> immediately after the corresponding IRQ migration, which allows
> for earlier recovery and prevents the transmission queue from
> becoming stuck.

I understand all of this, but for upstreaming we want the root cause,
not bodges like this.

There is no reason to do what this patch does. The IRQ system is not
supposed to lose interrupts on migration; if that is happening on
your systems, it is a serious bug that must be root caused.

Jason