Re: [PATCH] net/mlx5: poll mlx5 eq during irq migration

From: Jason Gunthorpe

Date: Wed Mar 11 2026 - 20:36:10 EST


On Sat, Mar 07, 2026 at 05:43:56AM +0000, Praveen Kannoju wrote:

> It has been very challenging to find the cause.
> We went through many live debug sessions with the Nvidia R&D team,
> but we couldn't root-cause it. That is why we eventually
> arrived at this mitigation: this issue is widespread
> and has been hurting many customers in the cloud.

It is almost certainly a qemu bug. If you cannot find it, then I
suggest you work around it by having qemu inject a spurious interrupt
around migration events.

But make sure you have the already known qemu and kernel bug fixes for
lost interrupts on MSI-X writes...

Jason