RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

From: Zytaruk, Kelly
Date: Sat May 20 2017 - 10:30:04 EST


Collins,

Okay, good to know.
Is there a common solution that can handle all cases?

Thanks,
Kelly

>-----Original Message-----
>From: Cheng, Collins
>Sent: Saturday, May 20, 2017 6:38 AM
>To: Zytaruk, Kelly; Alexander Duyck; Alex Williamson
>Cc: Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>Deucher, Alexander; Yinghai Lu
>Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>incapable platform
>
>Hi Kelly,
>
>This issue also happens in "not SR-IOV capable" SBIOS. It seems some "not SR-IOV
>capable" SBIOS will directly report error in system BIOS boot stage and doesn't
>boot to OS. But other "not SR-IOV capable" SBIOS would not report error and
>boot to Linux.
>
>-Collins Cheng
>
>
>-----Original Message-----
>From: Zytaruk, Kelly
>Sent: Saturday, May 20, 2017 6:28 PM
>To: Cheng, Collins <Collins.Cheng@xxxxxxx>; Alexander Duyck
><alexander.duyck@xxxxxxxxx>; Alex Williamson <alex.williamson@xxxxxxxxxx>
>Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; linux-
>kernel@xxxxxxxxxxxxxxx; Deucher, Alexander <Alexander.Deucher@xxxxxxx>;
>Yinghai Lu <yinghai@xxxxxxxxxx>
>Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>incapable platform
>
>
>
>>-----Original Message-----
>>From: Cheng, Collins
>>Sent: Saturday, May 20, 2017 12:53 AM
>>To: Alexander Duyck; Alex Williamson
>>Cc: Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx;
>>linux-kernel@xxxxxxxxxxxxxxx; Deucher, Alexander; Zytaruk, Kelly;
>>Yinghai Lu
>>Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>>incapable platform
>>
>>Hi Alex,
>>
>>Yes, I hope kernel can disable SR-IOV and related VF resource
>>allocation if the system BIOS is not SR-IOV capable.
>>
>>Adding the parameter "pci=nosriov" sounds a doable solution, but it
>>would need user to add this parameter manually, right? I think an
>>automatic detection would be better. My patch is trying to auto detect and
>bypass VF resource allocation.
>>
>>
>>-Collins Cheng
>>
>
>Collins, be careful about this. I don't think that this is what we want. If you add
>"pci=nosriov" then you are globally disabling SRIOV for all devices. This is not the
>solution that we are looking for.
>Remember that there are 3 types of SBIOS; "not SR-IOV capable", "SR-IOV
>capable but does not support large resources", "Complete SR-IOV support".
>
>The problem is that we are trying to find a fix for "broken" SBIOS that does
>support SR-IOV but does not support the full SR-IOV capabilities that devices with
>large resources require.
>
>Thanks,
>Kelly
>
>>
>>-----Original Message-----
>>From: Alexander Duyck [mailto:alexander.duyck@xxxxxxxxx]
>>Sent: Friday, May 19, 2017 11:44 PM
>>To: Alex Williamson <alex.williamson@xxxxxxxxxx>
>>Cc: Cheng, Collins <Collins.Cheng@xxxxxxx>; Bjorn Helgaas
>><bhelgaas@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; linux-
>>kernel@xxxxxxxxxxxxxxx; Deucher, Alexander <Alexander.Deucher@xxxxxxx>;
>>Zytaruk, Kelly <Kelly.Zytaruk@xxxxxxx>; Yinghai Lu <yinghai@xxxxxxxxxx>
>>Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>>incapable platform
>>
>>On Mon, May 15, 2017 at 10:53 AM, Alex Williamson
>><alex.williamson@xxxxxxxxxx> wrote:
>>> On Mon, 15 May 2017 08:19:28 +0000
>>> "Cheng, Collins" <Collins.Cheng@xxxxxxx> wrote:
>>>
>>>> Hi Williamson,
>>>>
>>>> We cannot assume BIOS supports SR-IOV, actually only newer server
>>motherboard BIOS supports SR-IOV. Normal desktop motherboard BIOS or
>>older server motherboard BIOS doesn't support SR-IOV. This issue would
>>happen if an user plugs our AMD SR-IOV capable GPU card to a normal desktop
>motherboard.
>>>
>>> Servers should be supporting SR-IOV for a long time now. What really
>>> is there to a BIOS supporting SR-IOV anyway, it's simply reserving
>>> sufficient bus number and MMIO resources such that we can enable the
>>> VFs. This process isn't exclusively reserved for the BIOS. Some
>>> platforms may choose to only initialize boot devices, leaving the
>>> rest for the OS to program. The initial proposal here to disable
>>> SR-IOV if not programmed at OS hand-off disables even the possibility
>>> of the OS reallocating resources for this device.
>>
>>There are differences between supporting SR-IOV and supporting SR-IOV
>>on devices with massive resources. I know I have seen NICs that will
>>keep a system from completing POST if SR-IOV is enabled, and MMIO
>>beyond 4G is not. My guess would be that the issues being seen are
>>probably that they disable SR-IOV in the BIOS in such a setup and end
>>up running into issues when they try to boot into the Linux kernel as
>>it goes through and tries to allocate resources for SR-IOV even though it was
>disabled in the BIOS.
>>
>>It might make sense to add a kernel parameter something like a "pci=nosriov"
>>that would allow for disabling SR-IOV and related resource allocation
>>if that is what we are talking about. That way you could plug in these
>>types of devices into a system with a legacy bios or that doesn't wan
>>to allocate addresses above 32b for MMIO, and this parameter would be
>>all that is needed to disable SR-IOV so you could plug in a NIC that has SR-IOV
>associated with it.
>>
>>>> I agree that failure to allocate VF resources should leave the
>>>> device in no
>>worse condition than before it tried. I hope kernel could allocate PF
>>device resource before allocating VF device resource, and keep PF
>>device resource valid and functional if failed to allocate VF device resource.
>>>>
>>>> I will send out dmesg log lspci info tomorrow. Thanks.
>>>
>>> Thanks,
>>> Alex
>>>
>>>> -----Original Message-----
>>>> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
>>>> Sent: Friday, May 12, 2017 10:43 PM
>>>> To: Cheng, Collins <Collins.Cheng@xxxxxxx>
>>>> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx;
>>>> linux-kernel@xxxxxxxxxxxxxxx; Deucher, Alexander
>>>> <Alexander.Deucher@xxxxxxx>; Zytaruk, Kelly <Kelly.Zytaruk@xxxxxxx>;
>>>> Yinghai Lu <yinghai@xxxxxxxxxx>
>>>> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>>>> SR-IOV incapable platform
>>>>
>>>> On Fri, 12 May 2017 04:51:43 +0000
>>>> "Cheng, Collins" <Collins.Cheng@xxxxxxx> wrote:
>>>>
>>>> > Hi Williamson,
>>>> >
>>>> > I verified the patch is working for both AMD SR-IOV GPU and Intel
>>>> > SR-IOV
>>NIC. I don't think it is redundant to check the VF BAR valid before
>>call sriov_init(), it is safe and saving boot time, also there is no a
>>better method to know if system BIOS has correctly initialized the SR-IOV
>capability or not.
>>>>
>>>> It also masks an underlying bug and creates a maintenance issue that
>>>> we won't
>>know when it's safe to remove this workaround. I don't think faster
>>boot is valid rationale, in one case SR-IOV is completely disabled, the
>>other we attempt to allocate the resources the BIOS failed to provide.
>>I expect this is also a corner case, the BIOS should typically support
>>SR-IOV, therefore this situation should be an exception.
>>>>
>>>> > I did not try to fix the issue from the kernel resource allocation
>>>> > perspective,
>>it is because:
>>>> > 1. I am not very familiar with the PCI resource allocation scheme in
>kernel.
>>For example, in sriov_init(), kernel will re-assign the PCI resource
>>for both VF and PF. I don't understand why kernel allocates resource
>>for VF firstly, then PF. If it is PF firstly, then this issue could be avoided.
>>>> > 2. I am not sure if kernel has error handler if PCI resource allocation
>failed.
>>In this case, kernel cannot allocate enough resource to PF. It should
>>trigger some error handler to either just keep original BAR values set
>>by system BIOS, or disable this device and log errors.
>>>>
>>>> I think these are the issues we should be trying to solve and I'm
>>>> sure folks on the linux-pci list can help us identify the bug.
>>>> Minimally, failure to allocate VF resources should leave the device
>>>> in no worse condition than before it tried. Perhaps you could post
>>>> more details about the issue, boot with pci=earlydump, post dmesg of
>>>> a boot where the PF resources are incorrectly re-allocated, and
>>>> include lspci -vvv for the SR-IOV device. Also, please test with
>>>> the latest upstream kernel, upstream only patches old kernels
>>>> through stable backports of commits to the latest kernel. Adding
>>>> Yinghai as a resource allocation expert. Thanks,
>>>>
>>>> Alex
>>>>
>>>> > -----Original Message-----
>>>> > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
>>>> > Sent: Friday, May 12, 2017 12:01 PM
>>>> > To: Cheng, Collins <Collins.Cheng@xxxxxxx>
>>>> > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>;
>>>> > linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Deucher,
>>>> > Alexander <Alexander.Deucher@xxxxxxx>; Zytaruk, Kelly
>>>> > <Kelly.Zytaruk@xxxxxxx>
>>>> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>>>> > SR-IOV incapable platform
>>>> >
>>>> > On Fri, 12 May 2017 03:42:46 +0000 "Cheng, Collins"
>>>> > <Collins.Cheng@xxxxxxx> wrote:
>>>> >
>>>> > > Hi Williamson,
>>>> > >
>>>> > > GPU card needs more BAR aperture resource than other PCI
>>>> > > devices. For
>>example, Intel SR-IOV network card only require 512KB memory resource
>>for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory
>>resource for frame buffer BAR aperture.
>>>> > >
>>>> > > If the system BIOS supports SR-IOV, it will reserve enough
>>>> > > resource for all
>>VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate
>>the enough resource for VF BARs, only PF BAR will be assigned and VF
>>BARs are empty. Then system boots to Linux kernel and kernel doesn't
>>check if the VF BARs are empty or valid. Kernel will re-assign the BAR
>>resources for PF and all VFs. The problem I saw is that kernel will
>>fail to allocate PF BAR resource because some resources are assigned to
>>VF, this is not expected. So kernel might need to do some check before
>>re-assign the PF/VF resource, so that PF device will be correctly assigned BAR
>resource and user can use PF device.
>>>> >
>>>> > So the problem is that something bad happens when the kernel is
>>>> > trying to reallocate resources in order to fulfill the
>>>> > requirements of the VFs, leaving the PF resources incorrectly
>>>> > programmed? Why not just fix that bug rather than creating
>>>> > special handling for this vendor/class of device which disables
>>>> > any attempt to fixup resources for SR-IOV? IOW, this patch just
>>>> > avoids the problem for your devices rather than fixing the bug.
>>>> > I'd suggest fixing the bug such that the PF is left in a
>>>> > functional state if the kernel is unable to allocate sufficient
>>>> > resources for the VFs. Thanks,
>>>> >
>>>> > Alex
>>>> >
>>>> > > -----Original Message-----
>>>> > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
>>>> > > Sent: Friday, May 12, 2017 11:21 AM
>>>> > > To: Cheng, Collins <Collins.Cheng@xxxxxxx>
>>>> > > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>;
>>>> > > linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>>>> > > Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zytaruk, Kelly
>>>> > > <Kelly.Zytaruk@xxxxxxx>
>>>> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>>>> > > SR-IOV incapable platform
>>>> > >
>>>> > > On Fri, 12 May 2017 02:50:32 +0000 "Cheng, Collins"
>>>> > > <Collins.Cheng@xxxxxxx> wrote:
>>>> > >
>>>> > > > Hi Helgaas,
>>>> > > >
>>>> > > > Some AMD GPUs have hardware support for graphics SR-IOV.
>>>> > > > If the SR-IOV capable GPU is plugged into the SR-IOV incapable
>>>> > > > platform. It would cause a problem on PCI resource allocation
>>>> > > > in current Linux kernel.
>>>> > > >
>>>> > > > Therefore in order to allow the PF (Physical Function) device
>>>> > > > of SR-IOV capable GPU to work on the SR-IOV incapable
>>>> > > > platform, it is required to verify conditions for initializing
>>>> > > > BAR resources on AMD SR-IOV capable GPUs.
>>>> > > >
>>>> > > > If the device is an AMD graphics device and it supports SR-IOV
>>>> > > > it will require a large amount of resources.
>>>> > > > Before calling sriov_init() must ensure that the system BIOS
>>>> > > > also supports SR-IOV and that system BIOS has been able to
>>>> > > > allocate enough resources.
>>>> > > > If the VF BARs are zero then the system BIOS does not support
>>>> > > > SR-IOV or it could not allocate the resources and this
>>>> > > > platform will not support AMD graphics SR-IOV.
>>>> > > > Therefore do not call sriov_init().
>>>> > > > If the system BIOS does support SR-IOV then the VF BARs will
>>>> > > > be properly initialized to non-zero values.
>>>> > > >
>>>> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.
>>>> > > >
>>>> > > > I checked the drivers/pci/quirks.c, it looks the
>>>> > > > workarounds/fixes in quirks.c are for specific devices and one
>>>> > > > or more device ID are defined for the specific devices.
>>>> > > > However my patch is for all AMD SR-IOV capable GPUs, that
>>>> > > > includes all existing
>>and future AMD server GPUs.
>>>> > > > So it doesn't seem like a good fit to put the fix in quirks.c.
>>>> > >
>>>> > >
>>>> > > Why is an AMD graphics card unique here? Doesn't sriov_init()
>>>> > > always need to be able to deal with devices of any type where
>>>> > > the BIOS hasn't initialized the SR-IOV capability? Some SR-IOV
>>>> > > devices can fit their VFs within a minimum bridge aperture, most
>>>> > > cannot. I don't understand why the VF resource requirements
>>>> > > being exceptionally large dictates that they receive special handling.
>>>> > > Thanks,
>>>> > >
>>>> > > Alex
>>>> > >
>>>> > > > Signed-off-by: Collins Cheng <collins.cheng@xxxxxxx>
>>>> > > > ---
>>>> > > > drivers/pci/iov.c | 63
>>>> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>> > > > 1 file changed, 60 insertions(+), 3 deletions(-)
>>>> > > >
>>>> > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index
>>>> > > > e30f05c..e4f1405 100644
>>>> > > > --- a/drivers/pci/iov.c
>>>> > > > +++ b/drivers/pci/iov.c
>>>> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct
>>>> > > > pci_dev
>>*dev)
>>>> > > > msleep(100);
>>>> > > > }
>>>> > > >
>>>> > > > +/*
>>>> > > > + * pci_vf_bar_valid - check if VF BARs have resource
>>>> > > > +allocated
>>>> > > > + * @dev: the PCI device
>>>> > > > + * @pos: register offset of SR-IOV capability in PCI config
>>>> > > > +space
>>>> > > > + * Returns true any VF BAR has resource allocated, false
>>>> > > > + * if all VF BARs are empty.
>>>> > > > + */
>>>> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {
>>>> > > > + int i;
>>>> > > > + u32 bar_value;
>>>> > > > + u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
>>>> > > > + PCI_BASE_ADDRESS_MEM_TYPE_64 |
>>>> > > > + PCI_BASE_ADDRESS_MEM_PREFETCH);
>>>> > > > +
>>>> > > > + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>>> > > > + pci_read_config_dword(dev, pos + PCI_SRIOV_BAR
>>>> > > > + + i * 4,
>>&bar_value);
>>>> > > > + if (bar_value & bar_size_mask)
>>>> > > > + return true;
>>>> > > > + }
>>>> > > > +
>>>> > > > + return false;
>>>> > > > +}
>>>> > > > +
>>>> > > > +/*
>>>> > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU
>>>> > > > +device
>>>> > > > + * @dev: the PCI device
>>>> > > > + *
>>>> > > > + * Returns true if device is an AMD/ATI display adapter,
>>>> > > > + * otherwise return false.
>>>> > > > + */
>>>> > > > +
>>>> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {
>>>> > > > + return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
>>>> > > > + (dev->vendor == PCI_VENDOR_ID_ATI ||
>>>> > > > + dev->vendor == PCI_VENDOR_ID_AMD)); }
>>>> > > > +
>>>> > > > /**
>>>> > > > * pci_iov_init - initialize the IOV capability
>>>> > > > * @dev: the PCI device
>>>> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
>>>> > > > return -ENODEV;
>>>> > > >
>>>> > > > pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>>>> > > > - if (pos)
>>>> > > > - return sriov_init(dev, pos);
>>>> > > > -
>>>> > > > + if (pos) {
>>>> > > > + /*
>>>> > > > + * If the device is an AMD graphics device and it supports
>>>> > > > + * SR-IOV it will require a large amount of resources.
>>>> > > > + * Before calling sriov_init() must ensure that the system
>>>> > > > + * BIOS also supports SR-IOV and that system BIOS has been
>>>> > > > + * able to allocate enough resources.
>>>> > > > + * If the VF BARs are zero then the system BIOS does not
>>>> > > > + * support SR-IOV or it could not allocate the resources
>>>> > > > + * and this platform will not support AMD graphics SR-IOV.
>>>> > > > + * Therefore do not call sriov_init().
>>>> > > > + * If the system BIOS does support SR-IOV then the VF BARs
>>>> > > > + * will be properly initialized to non-zero values.
>>>> > > > + */
>>>> > > > + if (is_amd_display_adapter(dev)) {
>>>> > > > + if (pci_vf_bar_valid(dev, pos))
>>>> > > > + return sriov_init(dev, pos);
>>>> > > > + } else {
>>>> > > > + return sriov_init(dev, pos);
>>>> > > > + }
>>>> > > > + }
>>>> > > > return -ENODEV;
>>>> > > > }
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>