Re: [RFC 2/3] vfio/pci: virtualize PME related registers bits and initialize to zero

From: Alex Williamson
Date: Wed Nov 17 2021 - 12:53:17 EST


On Mon, 15 Nov 2021 19:06:39 +0530
<abhsahu@xxxxxxxxxx> wrote:

> From: Abhishek Sahu <abhsahu@xxxxxxxxxx>
>
> If any PME event will be generated by PCI, then it will be mostly
> handled in the host by the root port PME code. For example, in the case
> of PCIe, the PME event will be sent to the root port and then the PME
> interrupt will be generated. This will be handled in
> drivers/pci/pcie/pme.c at the host side. Inside this, the
> pci_check_pme_status() will be called where PME_Status and PME_En bits
> will be cleared. So, the guest OS which is using vfio-pci device will
> not come to know about this PME event.
>
> To handle these PME events inside guests, we need some framework so
> that if any PME events will happen, then it needs to be forwarded to
> virtual machine monitor. We can virtualize PME related registers bits
> and initialize these bits to zero so vfio-pci device user will assume
> that it is not capable of asserting the PME# signal from any power state.
>
> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx>
> ---
> drivers/vfio/pci/vfio_pci_config.c | 32 +++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index 6e58b4bf7a60..fb3a503a5b99 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -738,12 +738,27 @@ static int __init init_pci_cap_pm_perm(struct perm_bits *perm)
> */
> p_setb(perm, PCI_CAP_LIST_NEXT, (u8)ALL_VIRT, NO_WRITE);
>
> + /*
> + * The guests can't process PME events. If any PME event will be
> + * generated, then it will be mostly handled in the host and the
> + * host will clear the PME_STATUS. So virtualize PME_Support bits.
> + * It will be initialized to zero later on.
> + */
> + p_setw(perm, PCI_PM_PMC, PCI_PM_CAP_PME_MASK, NO_WRITE);
> +
> /*
> * Power management is defined *per function*, so we can let
> * the user change power state, but we trap and initiate the
> * change ourselves, so the state bits are read-only.
> + *
> + * The guest can't process PME from D3cold so virtualize PME_Status
> + * and PME_En bits. It will be initialized to zero later on.
> */
> - p_setd(perm, PCI_PM_CTRL, NO_VIRT, ~PCI_PM_CTRL_STATE_MASK);
> + p_setd(perm, PCI_PM_CTRL,
> + PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS,
> + ~(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS |
> + PCI_PM_CTRL_STATE_MASK));
> +
> return 0;
> }
>
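
For anyone following the perm change in the hunk above, here is a minimal
userspace sketch of what marking bits as virtualized means for a config
read, assuming the default read path merges the two sources as
(phys & ~virt) | (vconfig & virt). The helper name merged_config_read()
is hypothetical, and host-endian u16 values are used for brevity; this
is not the kernel implementation.

```c
#include <stdint.h>

/* Values mirror include/uapi/linux/pci_regs.h */
#define PCI_PM_CTRL_PME_ENABLE  0x0100
#define PCI_PM_CTRL_PME_STATUS  0x8000

/*
 * Hypothetical helper: bits set in 'virt' are taken from the emulated
 * copy (vconfig); all other bits come from the physical register.
 */
static uint16_t merged_config_read(uint16_t phys, uint16_t vconfig,
				   uint16_t virt)
{
	return (uint16_t)((phys & ~virt) | (vconfig & virt));
}
```

With PME_En and PME_Status in the virt mask and cleared in vconfig, the
user reads them as zero regardless of what the hardware reports, while
the power state bits still come from the device.
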
> @@ -1412,6 +1427,18 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
> return 0;
> }
>
> +static void vfio_update_pm_vconfig_bytes(struct vfio_pci_core_device *vdev,
> + int offset)
> +{
> + /* initialize virtualized PME_Support bits to zero */
> + *(__le16 *)&vdev->vconfig[offset + PCI_PM_PMC] &=
> + ~cpu_to_le16(PCI_PM_CAP_PME_MASK);
> +
> + /* initialize virtualized PME_Status and PME_En bits to zero */

^ Extra space in this comment and in the one above.


> + *(__le16 *)&vdev->vconfig[offset + PCI_PM_CTRL] &=
> + ~cpu_to_le16(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS);

This would perhaps be more readable, and more consistent with code elsewhere, as:

	__le16 *pmc = (__le16 *)&vdev->vconfig[offset + PCI_PM_PMC];
	__le16 *ctrl = (__le16 *)&vdev->vconfig[offset + PCI_PM_CTRL];

	/* Clear vconfig PME_Support, PME_Status, and PME_En bits */
	*pmc &= ~cpu_to_le16(PCI_PM_CAP_PME_MASK);
	*ctrl &= ~cpu_to_le16(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS);
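
To make the effect of that clearing concrete outside the kernel, a
minimal userspace sketch follows. It assumes vconfig is a plain byte
array holding little-endian config space, and the helpers le16_load(),
le16_store() and clear_pme_bits() are hypothetical stand-ins for the
__le16 pointer accesses above, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets and masks mirror include/uapi/linux/pci_regs.h */
#define PCI_PM_PMC              2
#define PCI_PM_CAP_PME_MASK     0xF800
#define PCI_PM_CTRL             4
#define PCI_PM_CTRL_PME_ENABLE  0x0100
#define PCI_PM_CTRL_PME_STATUS  0x8000

/* Read a 16-bit little-endian value regardless of host endianness. */
static uint16_t le16_load(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a 16-bit value in little-endian byte order. */
static void le16_store(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* Clear PME_Support, PME_En and PME_Status in the emulated PM capability. */
static void clear_pme_bits(uint8_t *vconfig, size_t pm_offset)
{
	uint8_t *pmc = &vconfig[pm_offset + PCI_PM_PMC];
	uint8_t *ctrl = &vconfig[pm_offset + PCI_PM_CTRL];

	le16_store(pmc, le16_load(pmc) & (uint16_t)~PCI_PM_CAP_PME_MASK);
	le16_store(ctrl, le16_load(ctrl) &
		   (uint16_t)~(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS));
}
```

Only the PME bits are touched; the version and power state fields in
the same registers keep whatever was copied from the device.
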

Thanks,
Alex

> +}
> +
> static int vfio_fill_vconfig_bytes(struct vfio_pci_core_device *vdev,
> int offset, int size)
> {
> @@ -1535,6 +1562,9 @@ static int vfio_cap_init(struct vfio_pci_core_device *vdev)
> if (ret)
> return ret;
>
> + if (cap == PCI_CAP_ID_PM)
> + vfio_update_pm_vconfig_bytes(vdev, pos);
> +
> prev = &vdev->vconfig[pos + PCI_CAP_LIST_NEXT];
> pos = next;
> caps++;