Re: [PATCH 0/1] net/mlx5: Call pci_disable_sriov() on remove

From: Niklas Schnelle
Date: Thu Apr 30 2020 - 16:25:26 EST

On 4/30/20 6:13 PM, Saeed Mahameed wrote:
> On Thu, 2020-04-30 at 14:03 +0200, Niklas Schnelle wrote:
>> Hello,
>>
>> I'm currently working on improvements in PF-VF handling on s390. One
>> thing that may be a bit special for us is that the s390 hotplug code
>> supports shutting down a single PF of a multi-function device such as
>> a ConnectX-5.
>> During testing I found that the mlx5_core driver does not call
>> pci_disable_sriov() in its remove handler which is called on the PF
>> via pci_stop_and_remove_bus_device() on an orderly hot unplug.
>
> Actually what happens if you call pci_disable_sriov() when there are
> VFs attached to VMs ?
So the scenario I looked at was that the admin does a

echo 0 > /sys/bus/pci/slots/<pfslot>/power

for the relevant PFs (note that on our systems the PFs of a single card
can be passed through to different LPARs) and then physically removes the
adapter or moves the PF(s) to a different LPAR.
In this scenario the orderly shutdown would hopefully allow QEMU/libvirt to
trigger whatever graceful teardown they can do, but that testing is still on
my todo list.
I assumed that, because the pci_disable_sriov() call is basically the only
thing the example shows for the remove case, it would trigger the intended
actions in the common code, including for vfio-pci, udev events and so on.
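For what it's worth, the slot power-off sequence can be sketched as below. This is a dry run against a throwaway mock of the sysfs layout so it is safe to execute anywhere; the slot number is made up, and on a real system SYSFS_ROOT would simply be empty:

```shell
# Mock sysfs tree so this sketch can run without real hardware.
SYSFS_ROOT=$(mktemp -d)
PFSLOT=5                      # hypothetical slot number
mkdir -p "$SYSFS_ROOT/sys/bus/pci/slots/$PFSLOT"
echo 1 > "$SYSFS_ROOT/sys/bus/pci/slots/$PFSLOT/power"

# Orderly hot unplug: on a real system writing 0 here ends up in
# pci_stop_and_remove_bus_device(), which invokes the PF driver's
# remove handler.
echo 0 > "$SYSFS_ROOT/sys/bus/pci/slots/$PFSLOT/power"

cat "$SYSFS_ROOT/sys/bus/pci/slots/$PFSLOT/power"   # prints 0
```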
>
>>
>> Following is a patch to fix this, I want to point out however that
>> I'm not entirely sure about the effect of clear_vfs that distinguishes
>> mlx5_sriov_detach() from mlx5_sriov_disable() only that the former is
>> called from the remove handler and it doesn't call pci_disable_sriov().
>> That however is required according to
>> Documentation/PCI/pci-iov-howto.rst
>>
>
> The Doc doesn't say "required", so the question is, is it really
> required ?
Yes, but as I wrote above it is about the only thing the example shows
for the removal procedure, so there is definitely a clash between what
the mlx5 driver does and what the documentation suggests.
So even though I agree my patch definitely wasn't ready, I think this is
surely worth thinking about for a second or two.
>
> because the way i see it, it is the admin responsibility to clear out
> vfs before shutting down the PF driver.
>
> as explained in the patch review, mlx5 vf driver is resilient of such
> behavior and once the device is back online the vf driver quickly
> recovers, so it is actually a feature and not a bug :)
>
> There are many reasons why the admin would want to do this:
>
> 1) Fast Firmware upgrade
> 2) Fast Hyper-visor upgrade/refresh
> 3) PF debugging
>
> So you can quickly reset the PF driver without the need to wait for all
> VFs or manually detach-attach them.
So the behavior I was seeing, which triggered looking into this, is that the
VF drivers weren't just spewing error messages: after some relatively small
timeout (maybe 20 seconds or so) the VFs would start to disappear, and in my
testing I didn't manage to reactivate them.
That might have been due to some error on my part though; I didn't try very
hard because I did not assume that this should work. So thank you for
pointing out that something like this is actually a scenario you have in
mind. Now that I know, I'll definitely look into it a bit more.

So thank you for your input. I'm still not sure what the right approach to
the shutdown really is, but this is certainly interesting.
>
>
>> I've only tested this on top of my currently still internal PF-VF
>> rework so
>> I might also be totally missing something here in that case excuse my
>> ignorance.
>>
>> Best regards,
>> Niklas Schnelle
>>
>> Niklas Schnelle (1):
>> net/mlx5: Call pci_disable_sriov() on remove
>>
>> drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>