Re: [PATCH] PCI: pciehp: Do not turn off slot if presence comes up after link

From: Alex_Gagniuc
Date: Mon Feb 11 2019 - 18:48:43 EST


On 2/9/19 5:58 AM, Lukas Wunner wrote:
>
> [EXTERNAL EMAIL]
>
> On Tue, Feb 05, 2019 at 03:06:56PM -0600, Alexandru Gagniuc wrote:
>> According to PCIe 3.0, the presence detect state is a logical OR of
>> in-band and out-of-band presence.
>
> Since Bjorn asked for a spec reference:
>
> PCIe r4.0 sec 7.8.11: "Presence Detect State - This bit indicates the
> presence of an adapter in the slot, reflected by the logical OR of the
> Physical Layer in-band presence detect mechanism and, if present, any
> out-of-band presence detect mechanism defined for the slot's
> corresponding form factor."

ECN [1] was approved last November, so it's normative spec text. Sorry
if the Ukranians didn't get ahold of it yet. I'll try to explain the delta.
There's an in-band PD supported/disable set of bits. If 'supported' is
set, then you can set 'disable', which removes the 'logical OR" and now
PD is only out-of-band presence.

> Table 4-14 in sec 4.2.6 is also important because it shows the difference
> between Link Up and in-band presence.

So, we'd expect PD to come up at the same time or before DLL?

>> With this, we'd expect the presence
>> state to always be asserted when the link comes up.
>>
>> Not all hardware follows this, and it is possible for the presence to
>> come up after the link. In this case, the PCIe device would be
>> erroneously disabled and re-probed. It is possible to distinguish
>> between a delayed presence and a card swap by looking at the DLL state
>> changed bit -- The link has to come down if the card is removed.
>
> So in reality, PDC and DLLSC events rarely come in simultaneously and
> we do allow some leeway in pcie_wait_for_link(), it's just not enough
> in your particular case due to the way presence is detected by the
> platform.
>
> I think in this case instead of changing the behavior for everyone,
> it's more appropriate to add a quirk for your hardware. Two ideas:
>
> * Amend pcie_init() to set slot_cap |= PCI_EXP_SLTCAP_ABP. Insert
> this as a quirk right below the existing Thunderbolt quirk and
> use PCI vendor/device IDs and/or DMI checks to identify your
> particular hardware. If the ABP bit is set, PDC events are not
> enabled by pcie_enable_notification(), so you will solely rely on
> DLLSC events. Problem solved. (Hopefully.)
>
> * Alternatively, add a DECLARE_PCI_FIXUP_FINAL() quirk which sets
> pdev->link_active_reporting = false. This causes pcie_wait_for_link()
> to wait 1100 msec before the hot-added device's config space is
> accessed for the first time.

These are both hacks right? We're declaring something untrue about the
hardware because we want to obtain a specific side-effect. BUt the
side-effects are not guaranteed not to change.

> Would this work for you?

I'm certain it's doable. In theory one could use the PCI ID of the DP,
and the vendor and machine names to filter the quirk. But what happens
in situations where the same PCI ID switch is used in different parts of
the system? You want the quirk here or not there. It quickly becomes a
shartstorm.

I'd like a generic solution. I suspect supporting 'in-band PD disabled'
and quirking for that would be a saner way to do things. If we support
it, then we need to handle PDSC after DLLSC scenarios anyway.

Alex

[1] PCI-SIG ENGINEERING CHANGE NOTICE
Async Hot-Plug Updates
Nov 29, 2018