Re: [PATCH v4 2/2] PCI: Enable NO_BUS_RESET quirk for Nvidia GPUs

From: Shanker R Donthineni
Date: Wed May 05 2021 - 11:36:22 EST


Hi Pali,

On 5/5/21 7:15 AM, Pali Rohár wrote:
> Hello! If I understood this "reset" issue correctly, it means that the
> affected PCIe GPU devices cannot be reset via PCI Secondary Bus Reset
> (PCIe Warm Reset) and that a special, platform-specific reset type needs
> to be issued instead.
>
> And the code for this platform-specific reset is included in the ACPI
> DSDT table.
Yes, correct.
> But because the ACPI DSDT table is part of the BIOS/firmware and not part
> of the PCIe GPU device itself, this kind of reset is available to the
> Linux kernel only when the motherboard vendor (or whoever burns the
> BIOS/firmware into the motherboard EEPROM) includes this specific code.
> Am I right?
The ACPI specification provides a standard mechanism for function-level reset,
the _RST method, and it should work for any OSPM, not just Linux.

https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/resetting-and-recovering-a-device
ACPI firmware: Function-level reset
To support function-level device reset, there must be an _RST method defined inside the Device scope. If present, this method will override the bus driver's implementation of function-level device reset (if present) for that device. When executed, the _RST method must reset only that device, and must not affect other devices. In addition, the device must stay connected on the bus.
> So if this PCIe GPU device is connected to another motherboard or another
> system, then this special platform reset in the ACPI DSDT is not available.
PCI hardware resets won't work; the only way to reset the device is through platform-specific code.
> What does the default ACPI _RST() method do on motherboards without this
> special platform reset hook? It probably would not be able to reset
> these PCIe GPU devices if standard SBR cannot reset them.
Yes, the BIOS/firmware has to provide support wherever these affected GPU devices are attached.
These GPU devices are not plug-in PCIe cards; they exist only on server baseboards and are
directly attached to the PCIe fabric.
> Would it not be better to include "native" Linux code for resetting
> these PCIe devices?
It requires a complicated code sequence and has to access many platform-specific
registers. We're taking advantage of the OS-independent, standard ACPI _RST reset
mechanism for resetting the GPU device.
> Please correct me if I'm wrong in my assumption or if I understood this
> issue incorrectly.
The GPU has side effects after SBR is triggered: it requires a system reboot to
bring the device back to an operational state. This workaround prevents SBR from being issued to these devices.