Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success

From: Salvatore Bonaccorso
Date: Wed Apr 27 2022 - 03:59:19 EST


Hi,

On Mon, Mar 14, 2022 at 09:29:53PM +0100, Jeremi Piotrowski wrote:
> On Mon, Mar 14, 2022 at 01:04:55PM -0400, Dusty Mabe wrote:
> >
> >
> > On 3/14/22 12:49, Stefan Roese wrote:
> >
> > > I've added Dusty to Cc, as he (and others) already have been dealing
> > > with this issue AFAICT.
> > >
> > > Dusty, could you perhaps chime in with the latest status? AFAIU, it's
> > > related to potential issues with the Xen version used on these systems?
> >
> > Thanks Stefan,
> >
> > Yes. My understanding is that the issue is because AWS is using older versions
> > of Xen. They are in the process of updating their fleet to a newer version of
> > Xen so the change introduced with Stefan's commit isn't an issue any longer.
> >
> > I think the changes are scheduled to be completed in the next 10-12 weeks. For
> > now we are carrying a revert in the Fedora Kernel.
> >
> > You can follow this Fedora CoreOS issue if you'd like to know more about when
> > the change lands in their backend. We work closely with one of their partner
> > engineers and he keeps us updated. https://github.com/coreos/fedora-coreos-tracker/issues/1066
> >
> > Dusty
>
> Thanks for the link and explanation. What a fun coincidence that we hit this in
> Flatcar Container Linux as well. We've reverted the commit in our kernels for
> the time being, and will track that issue.

Does anyone here know the current state of the AWS instances using
the older Xen version that causes the issues?

AFAIU upstream is not planning to revert 83dbf898a2d4 ("PCI/MSI: Mask
MSI-X vectors only on success"), as it fixed a bug. Several downstream
distros now carry a revert of this commit, which I believe is an
unfortunate situation. I wonder if this can be addressed upstream to
deal with the AWS m4.large instance issues.

Regards,
Salvatore