RE: [PATCH v2] PCI: hv: Fix hibernation in case interrupts are not re-created

From: Dexuan Cui
Date: Thu Oct 01 2020 - 16:53:17 EST


> From: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
> Sent: Thursday, October 1, 2020 3:13 AM
> > ...
> > I mean this is a Hyper-V specific problem, so IMO we should fix the
> > pci-hyperv driver rather than change the PCI device drivers, which
> > work perfectly on a physical machine and on other hypervisors.
> > Also it can be difficult or impossible to ask the authors of the
> > aforementioned PCI device drivers to destroy and re-create
> > MSI/MSI-X across hibernation, especially for the out-of-tree driver(s).
>
> Good, so why did you mention PCI drivers in the commit log if they
> are not related to the problem you are fixing ?

I mentioned the names of the PCI device drivers because IMO people
want to know how the issue can reproduce (i.e. which PCI devices
are affected and which are not), so they know how to test this patch.

I'll remove the names of the unaffected PCI device drivers from the
commit log, and only keep the name of the Nvidia GPU drivers (which
are so far the only drivers I have identified that are affected, when
Linux VM runs on Hyper-V and hibernates).

> > > Regardless, this commit log does not provide the information that
> > > it should.
> >
> > Hi Lozenzo, I'm happy to add more info. Can you please let me know
> > what extra info I should provide?
>
> s/Lozenzo/Lorenzo
Sorry! Will fix the typo.

> The info you describe properly below, namely what the _actual_ problem
> is.

I will send v3 with the below info.

> > Here if hv_irq_unmask does not call pci_msi_unmask_irq(), the
> > desc->masked remains "true", so later after hibernation, the MSI
> > interrupt line always reamins masked, which is incorrect.
> >
> > Here the slient failure of hv_irq_unmask() does not matter since all the
> > non-boot CPUs are being offlined (meaning all the devices have been
> > frozen). Note: the correct affinity info is still updated into the
> > irqdata data structure in migrate_one_irq() -> irq_do_set_affinity() ->
> > hv_set_affinity(), so when the VM resumes, hv_pci_resume() ->
> > hv_pci_restore_msi_state() is able to correctly restore the irqs with
> > the correct affinity.
> >
> > I hope the explanation can help clarify things. I understand this is
> > not as natual as tht case that Linux runs on a physical machine, but
> > due to the unique PCI pass-through implementation of Hyper-V, IMO this
> > is the only viable fix for the problem here. BTW, this patch is only
> > confined to the pci-hyperv driver and I believe it can no cause any
> > regression.
>
> Understood, write this in the commit log and I won't nag you any further.

Ok. I treat it as an opportunity to practise and improve my writing :-)

> Side note: this issue is there because the hypcall failure triggers
> an early return from hv_irq_unmask().

Yes.

> Is that early return really correct ?

Good question. IMO it's incorrect, because hv_irq_unmask() is called
when the interrupt is activated for the first time, and when the interrupt's
affinity is being changed. In these cases, we may as well call
pci_msi_unmask_irq() unconditionally, even if the hypercall fails.

BTW, AFAIK, in practice the hypercall only fails in 2 cases:
1. The device is removed when Linux VM has not finished the device's
initialization.
2. In hibernation, the device has been disabled while the generic
hibernation code tries to migrate the interrupt, as I explained.

In the 2 cases, the hypercall returns the same error code
HV_STATUS_INVALID_PARAMETER(0x5).

> Another possibility is just logging the error and let
> hv_irq_unmask() continue and call pci_msi_unmask_irq() in the exit
> path.

This is a good idea. I'll make this change in v3.

> Is there a hypcall return value that you can use to detect fatal vs
> non-fatal (ie hibernation) hypcall failures ?

Unluckily IMO there is not. The spec (v6.0b)'s section 10.5.4 (page 106)
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
does define some return values, but IMO they're not applicable here.

> I was confused by reading the patch since it seemed that you call
> pci_msi_unmask_irq() _only_ while hibernating, which was certainly
> a bug.
>
> Thank you for explaining.
>
> Lorenzo

Thanks for reviewing! I'll post v3. Looking forward to your new comments!

Thanks,
-- Dexuan