[RFC 2/3] vfio/pci: virtualize PME related registers bits and initialize to zero

From: abhsahu
Date: Mon Nov 15 2021 - 08:37:44 EST


From: Abhishek Sahu <abhsahu@xxxxxxxxxx>

If any PME event will be generated by PCI, then it will be mostly
handled in the host by the root port PME code. For example, in the case
of PCIe, the PME event will be sent to the root port and then the PME
interrupt will be generated. This will be handled in
drivers/pci/pcie/pme.c at the host side. Inside this, the
pci_check_pme_status() will be called where PME_Status and PME_En bits
will be cleared. So, the guest OS which is using vfio-pci device will
not come to know about this PME event.

To handle these PME events inside guests, we need some framework so
that if any PME events will happen, then it needs to be forwarded to
virtual machine monitor. We can virtualize PME related registers bits
and initialize these bits to zero so vfio-pci device user will assume
that it is not capable of asserting the PME# signal from any power state.

Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx>
---
drivers/vfio/pci/vfio_pci_config.c | 32 +++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 6e58b4bf7a60..fb3a503a5b99 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -738,12 +738,27 @@ static int __init init_pci_cap_pm_perm(struct perm_bits *perm)
*/
p_setb(perm, PCI_CAP_LIST_NEXT, (u8)ALL_VIRT, NO_WRITE);

+ /*
+ * The guests can't process PME events. If any PME event will be
+ * generated, then it will be mostly handled in the host and the
+ * host will clear the PME_STATUS. So virtualize PME_Support bits.
+ * It will be initialized to zero later on.
+ */
+ p_setw(perm, PCI_PM_PMC, PCI_PM_CAP_PME_MASK, NO_WRITE);
+
/*
* Power management is defined *per function*, so we can let
* the user change power state, but we trap and initiate the
* change ourselves, so the state bits are read-only.
+ *
+ * The guest can't process PME from D3cold so virtualize PME_Status
+ * and PME_En bits. It will be initialized to zero later on.
*/
- p_setd(perm, PCI_PM_CTRL, NO_VIRT, ~PCI_PM_CTRL_STATE_MASK);
+ p_setd(perm, PCI_PM_CTRL,
+ PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS,
+ ~(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS |
+ PCI_PM_CTRL_STATE_MASK));
+
return 0;
}

@@ -1412,6 +1427,18 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
return 0;
}

+static void vfio_update_pm_vconfig_bytes(struct vfio_pci_core_device *vdev,
+ int offset)
+{
+ /* initialize virtualized PME_Support bits to zero */
+ *(__le16 *)&vdev->vconfig[offset + PCI_PM_PMC] &=
+ ~cpu_to_le16(PCI_PM_CAP_PME_MASK);
+
+ /* initialize virtualized PME_Status and PME_En bits to zero */
+ *(__le16 *)&vdev->vconfig[offset + PCI_PM_CTRL] &=
+ ~cpu_to_le16(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS);
+}
+
static int vfio_fill_vconfig_bytes(struct vfio_pci_core_device *vdev,
int offset, int size)
{
@@ -1535,6 +1562,9 @@ static int vfio_cap_init(struct vfio_pci_core_device *vdev)
if (ret)
return ret;

+ if (cap == PCI_CAP_ID_PM)
+ vfio_update_pm_vconfig_bytes(vdev, pos);
+
prev = &vdev->vconfig[pos + PCI_CAP_LIST_NEXT];
pos = next;
caps++;
--
2.17.1