Re: [PATCH v3] xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled

From: Jason Andryuk
Date: Wed Nov 30 2022 - 14:40:22 EST


On Mon, Nov 28, 2022 at 8:44 AM Marek Marczykowski-Górecki
<marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Nov 21, 2022 at 05:16:37PM +0100, Marek Marczykowski-Górecki wrote:
> > On Mon, Nov 21, 2022 at 10:41:34AM -0500, Jason Andryuk wrote:
> > > On Sat, Nov 19, 2022 at 11:33 AM Marek Marczykowski-Górecki
> > > <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Sat, Nov 19, 2022 at 09:36:54AM -0500, Jason Andryuk wrote:
> > > > > Hi, Marek,
> > > > >
> > > > > On Fri, Nov 18, 2022 at 4:46 PM Marek Marczykowski-Górecki
> > > > > <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Nov 18, 2022 at 03:46:47PM -0500, Jason Andryuk wrote:
> > > > > > > I was trying to test your xen-pciback v3 patch, and I am having
> > > > > > > assignment fail consistently now. It is actually failing to
> > > > > > > quarantine to domIO in the first place, which matches the failure from
> > > > > > > the other day (when I more carefully read through the logs). It now
> > > > > > > consistently fails to quarantine on every boot unlike the other day
> > > > > > > where it happened once.
> > > > > >
> > > > > > Does this include the very first assignment too, or only after domain
> > > > > > reboot? If the latter, maybe some cleanup missed clearing MASKALL?
> > > > >
> > > > > It's the quarantine during dom0 boot that fails. Later assignment
> > > > > during VM boot fails. I tried warm reboots and cold boots and it
> > > > > happened both times.
> > > > >
> > > > > I also modified my initrd to halt in there and checked the config
> > > > > space. MASKALL wasn't set at that time. I need to double check -
> > > > > MASKALL may have been unset after dom0 booted in that case.
> > > > >
> > > > > I'll test more to figure when and how MASKALL is getting set.
> > >
> > > I'm testing with a laptop without a battery. It seems MASKALL remains
> > > set when rebooting or when left plugged in.
> > >
> > > From unplugged, a cold boot doesn't have MASKALL set and the network vm boots.
> > >
> > > After that, rebooting the laptop leaves MASKALL set on the NIC when
> > > the laptop reboots. NIC assignment fails.
> > >
> > > Shutdown and later boot while left plugged in keeps MASKALL set. NIC
> > > assignment fails. I have only tested this scenario for short periods
> > > of time, so I don't know if it would eventually clear after a longer
> > > time.
> >
> > That's interesting, seems like firmware is not resetting the device
> > properly. Maybe related to enabled wake on lan?
> >
> > Anyway, resetting the device at domain create/destroy is AFAIR normally
> > done by pciback (at the instruction by the toolstack). Should it maybe
> > be done when assigning to pciback initially too? Or maybe in this
> > specific case, device reset doesn't properly clear MASKALL, so pciback
> > should clear it explicitly (after ensuring the MSI-X enable is cleared
> > too)?
>
> Can you check if `echo 1 > /sys/bus/pci/devices/$SBDF/reset` clears
> MASKALL on this device?

`echo 1 > ..../reset` did not clear MASKALL.

After shutting down the domain with the iwlwifi card, lspci from dom0 shows:
MSI-X: Enable+ Count=16 Masked+
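
For reference, that lspci line is a decode of the MSI-X Message Control
register (offset 2 in the MSI-X capability, ID 0x11). A minimal sketch of
the same decode over a raw config-space dump follows. decode_msix() and
struct msix_state are hypothetical names made up for illustration, not
lspci or Xen code; the register offsets and bit masks follow the standard
PCI capability layout.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS              0x06
#define PCI_STATUS_CAP_LIST     0x10    /* device has a capability list */
#define PCI_CAPABILITY_LIST     0x34    /* pointer to first capability */
#define PCI_CAP_ID_MSIX         0x11
#define PCI_MSIX_FLAGS          2       /* Message Control offset in the cap */
#define PCI_MSIX_FLAGS_QSIZE    0x07ff  /* table size, encoded as N-1 */
#define PCI_MSIX_FLAGS_MASKALL  0x4000  /* Function Mask ("Masked+") */
#define PCI_MSIX_FLAGS_ENABLE   0x8000  /* MSI-X Enable ("Enable+") */

/* Hypothetical result type for this sketch. */
struct msix_state {
    int found;       /* 1 if an MSI-X capability was located */
    int enable;      /* Enable bit */
    int maskall;     /* Function Mask bit */
    unsigned count;  /* number of table entries */
};

/* Walk the capability list in a config-space dump and decode MSI-X. */
static struct msix_state decode_msix(const uint8_t *cfg, size_t len)
{
    struct msix_state st = { 0, 0, 0, 0 };
    uint8_t pos;
    int ttl;

    if (len < 0x40 || !(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return st;

    pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    /* ttl bounds the walk in case the list is malformed or loops. */
    for (ttl = 48; ttl > 0 && pos >= 0x40 && (size_t)pos + 3 < len; ttl--) {
        if (cfg[pos] == PCI_CAP_ID_MSIX) {
            uint16_t ctrl = (uint16_t)(cfg[pos + PCI_MSIX_FLAGS] |
                                       (cfg[pos + PCI_MSIX_FLAGS + 1] << 8));

            st.found = 1;
            st.enable = !!(ctrl & PCI_MSIX_FLAGS_ENABLE);
            st.maskall = !!(ctrl & PCI_MSIX_FLAGS_MASKALL);
            st.count = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1u;
            return st;
        }
        pos = cfg[pos + 1] & 0xfc;  /* follow the next-capability pointer */
    }
    return st;
}
```

A Message Control value of 0xc00f decodes to Enable+, Masked+ and a count
of 16, which corresponds to the lspci output above.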

Hmm, Xen logged:
(XEN) cannot disable IRQ 137: masking MSI-X on 0000:00:14.3

Oh, looking back, I see that was logged during my earlier testing of
this patch set, but I missed it.

It seems like Xen set Enable and Masked itself in __pci_disable_msix()
since memory decoding is not enabled.

I'm still investigating, but I wanted to give an update. It seems
like Xen should clear MASKALL when booting. Something like clearing
MASKALL in pdev_msi_init() when !ENABLE && MASKALL. However, I have
seen the system boot with both ENABLE and MASKALL set on the iwlwifi
NIC. Is it risky to just unilaterally clear both of those when
enumerating PCI devices? It doesn't seem appropriate to leave them
set without a driver controlling them.
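
To make the !ENABLE && MASKALL idea concrete, here is a rough, untested
sketch of the check. msix_ctrl_sanitize() is a hypothetical helper, not an
existing Xen function, and it only computes the new Message Control value;
the config space read and write around it are left out. The bit values are
the standard MSI-X Message Control bits (bit 15 Enable, bit 14 Function
Mask).

```c
#include <stdint.h>

#define PCI_MSIX_FLAGS_MASKALL  0x4000  /* Function Mask */
#define PCI_MSIX_FLAGS_ENABLE   0x8000  /* MSI-X Enable */

/*
 * Hypothetical helper: given the MSI-X Message Control value found at
 * enumeration time, return the value to write back.  A stale MASKALL is
 * cleared only when MSI-X is not enabled; every other combination,
 * including ENABLE and MASKALL both set, is returned unchanged.
 */
static uint16_t msix_ctrl_sanitize(uint16_t ctrl)
{
    if (!(ctrl & PCI_MSIX_FLAGS_ENABLE) && (ctrl & PCI_MSIX_FLAGS_MASKALL))
        ctrl &= (uint16_t)~PCI_MSIX_FLAGS_MASKALL;

    return ctrl;
}
```

Note that this deliberately leaves the ENABLE plus MASKALL case alone,
since whether it is safe to clear both bits is the open question above.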

-Jason