Re: [PATCH] PCI/IOV: update num_VFs earlier

From: Bjorn Helgaas
Date: Fri Apr 05 2019 - 18:33:18 EST


On Fri, Mar 29, 2019 at 09:00:58AM +0100, Pierre Crégut wrote:
> Ensure that iov->num_VFs is set before a netlink message is sent
> when the number of VFs is changed. Only the path for num_VFs > 0
> is affected. The path for num_VFs = 0 is already correct.
>
> Monitoring programs can rely on netlink messages to track interface
> changes and query their state in /sys. But when sriov_numvfs is set to a
> positive value, the netlink message is sent before the value is available
> in sysfs. The value read after the message is received is always zero.

Thanks, Pierre! Can you clue me in on where exactly the connection
from sriov_enable() to netlink is?

I see one side of the race is with sriov_numvfs_show(), but I don't
know where the netlink message is sent. Is that connected with the
kobject_uevent(KOBJ_CHANGE)?

One thing this would help with is figuring out exactly how *much*
earlier we need to set iov->num_VFs. It looks like the current patch
sets it before we actually enable the VFs, so a user could read
/sys/.../sriov_numvfs and get a value that doesn't match the hardware
yet. Of course, some mismatch is unavoidable; the question is whether
it's OK to return the new value *before* it actually takes effect, or
whether we want to return a stale value until after it takes effect.
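
To make the ordering question concrete, here is a rough userspace
model, not kernel code. Everything in it (struct model_iov,
model_listener_read(), model_enable()) is a hypothetical stand-in, and
it assumes the listener reads the value at the instant it is notified,
wherever that notification actually comes from. The real race is
timing-dependent; this only shows the worst case for each ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_sriov; only the field discussed
 * in this thread is modelled. */
struct model_iov {
	int num_VFs;
};

/* Stand-in for a monitor that reacts to the notification by reading
 * sriov_numvfs: it returns whatever is stored at that moment. */
int model_listener_read(const struct model_iov *iov)
{
	return iov->num_VFs;
}

/*
 * Model of the enable path.  "store_first" selects the ordering:
 *   false: notify, then store num_VFs (current code)
 *   true:  store num_VFs, then notify (proposed patch)
 * Returns the value the listener saw when it was notified.
 */
int model_enable(struct model_iov *iov, int nr_virtfn, bool store_first)
{
	int seen;

	assert(nr_virtfn > 0);
	iov->num_VFs = 0;	/* VFs start out disabled */

	if (store_first)
		iov->num_VFs = nr_virtfn;

	/* Notification is delivered here; the listener reads right away. */
	seen = model_listener_read(iov);

	/* In both orderings the final stored value is nr_virtfn. */
	iov->num_VFs = nr_virtfn;
	return seen;
}
```

With nr_virtfn = 4, model_enable(&iov, 4, false) returns 0 (the stale
value the bug report describes) and model_enable(&iov, 4, true)
returns 4. The model says nothing about a reader that polls sysfs
between the store and the point where the VFs are actually enabled,
which is the window I'm asking about above.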

> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202991
> Signed-off-by: Pierre Crégut <pierre.cregut@xxxxxxxxxx>
> ---
> note: the behaviour can be tested with the following shell script also
> available on the bugzilla (d being the phy device name):
>
> ip monitor dev $d | grep --line-buffered "^[0-9]*:" | \
> while read line; do cat /sys/class/net/$d/device/sriov_numvfs; done
>
> drivers/pci/iov.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 3aa115ed3a65..a9655c10e87f 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -351,6 +351,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  		goto err_pcibios;
>  	}
>
> +	iov->num_VFs = nr_virtfn;
>  	pci_iov_set_numvfs(dev, nr_virtfn);
>  	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
>  	pci_cfg_access_lock(dev);
> @@ -363,7 +364,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  		goto err_pcibios;
>
>  	kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
> -	iov->num_VFs = nr_virtfn;
>
>  	return 0;
>
> @@ -379,6 +379,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  	if (iov->link != dev->devfn)
>  		sysfs_remove_link(&dev->dev.kobj, "dep_link");
>
> +	iov->num_VFs = 0;
>  	pci_iov_set_numvfs(dev, 0);
>  	return rc;
>  }
> --
> 2.17.1
>