Re: [PATCH v2 2/2] PCI: pciehp: Clear the optional capabilities in DEVCTL2 on a hot-plug
From: Lukas Wunner
Date: Fri Jun 16 2023 - 14:24:17 EST
Hi Smita,
My apologies for the delay!
On Mon, May 22, 2023 at 03:23:31PM -0700, Smita Koralahalli wrote:
> On 5/11/2023 4:19 AM, Lukas Wunner wrote:
> > On Tue, Apr 18, 2023 at 09:05:26PM +0000, Smita Koralahalli wrote:
> > > Clear all capabilities in Device Control 2 register as they are optional
> > > and it is not determined whether the next device inserted will support
> > > these capabilities. Moreover, Section 6.13 of the PCIe Base
> > > Specification [1], recommends clearing the ARI Forwarding Enable bit on
> > > a hot-plug event as its not guaranteed that the newly added component
> > > is in fact an ARI device.
> >
> > Clearing ARI Forwarding Enable sounds reasonable, but I don't see
> > why all the other bits in the Device Control 2 register need to be
> > cleared. If there isn't a reason to clear them, I'd be in favor of
> > leaving them alone.
>
> I understand. The SPEC doesn't "clearly" state to clear the other bits
> except ARI on a hot-plug event.
>
> But, we came across issues when a device with 10-bit tags was removed and
> replaced with a device that didn't support 10-bit tags. The device became
> inaccessible and the port was not able to be recovered without a system
> reset. So, we thought it would be better to cherry pick all bits that were
> negotiated between endpoint and root port and decided that we should clear
> them all on removal. Hence, my first revision of this patch series had aimed
> to clear only ARI, AtomicOp Requestor and 10 bit tags as these were the
> negotiated settings between endpoint and root port. Ideally, these settings
> should be re-negotiated and set up for optimal operation on a hot add.
Makes total sense. I like the approach of clearing only these three
bits, as you did in v1 of the patch. I also appreciate the detailed
explanation that you've provided. Much of your e-mail can be copy-pasted
to the commit message, in my opinion it's valuable information to any
reviewer and future reader of the commit.
> We had some internal discussions to understand if SPEC has it documented
> somewhere. And we could see in Section 2.2.6.2, it implies that:
> [i] If a Requester sends a 10-Bit Tag Request to a Completer that lacks
> 10-Bit Completer capability, the returned Completion(s) will have Tags with
> Tag[9:8] equal to 00b. Since the Requester is forbidden to generate these
> Tag values for 10-Bit Tags, such Completions will be handled as Unexpected
> Completions, which by default are Advisory Non-Fatal Errors. The Requester
> must follow standard PCI Express error handling requirements.
> [ii] In configurations where a Requester with 10-Bit Tag Requester
> capability needs to target multiple Completers, one needs to ensure that the
> Requester sends 10-Bit Tag Requests only to Completers that have 10-Bit Tag
> Completer capability.
>
> Now, we might wonder, why clear them (especially 10-bit tags and AtomicOps)
> if Linux hasn't enabled them at all as the "10-Bit Tag Requester Enable" bit
> is not defined in Linux currently. But, these features might be enabled by
> Platform FW for "performance reasons" if the endpoint supports and now it is
> the responsibility of the operating system to disable it on a hot remove
> event.
Again, makes total sense.
> According to implementation notes in 2.2.6.2: "For platforms where the RC
> supports 10-Bit Tag Completer capability, it is highly recommended for
> platform firmware or operating software that configures PCIe hierarchies to
> Set the 10-Bit Tag Requester Enable bit automatically in Endpoints with
> 10-Bit Tag Requester capability. This enables the important class of 10-Bit
> Tag capable adapters that send Memory Read Requests only to host memory." So
> if the endpoint and root port both support 10-bit tags BIOS is enabling it
> at boot time..
>
> I ran a quick check to see how DEV_CTL2 registers for root port look on a
> 10-bit tag supported NVMe device.
>
> pci 0000:80:05.1: DEVCTL2 0x1726 (Bit 12: 10-bit tag is enabled..)
> pci 0000:80:05.1: DEVCAP2 0x7f19ff
>
> So, it seems like BIOS has enabled 10-bit tags at boot time even though
> Linux hasn't enabled it.
>
> Some couple of ways we think could be:
> [1] Check if these bits are enabled by Platform at boot time, clear them
> only it is set during hotplug flow.
> [2] Clear them unconditionally as I did..
> [3] Enable 10-bits tags in Linux when a device is probed just like how we do
> for ARI..
>
> Similarly call pci_enable_atomic_ops_to_root() during a hot add..
Personally I'm fine with option [2]. If you or Bjorn prefer option [3],
I'm fine with that as well.
> > As for clearing ARI Forwarding Enable, it seems commit b0cc6020e1cc
> > ("PCI: Enable ARI if dev and upstream bridge support it; disable
> > otherwise") already solved this problem. Quoth its commit message:
[...]
> > My superficial understanding of that patch is that we do find function 0,
> > while enumerating it we clear the ARI Forwarding Enable bit and thus the
> > remaining functions become accessible and are subsequently enumerated.
> >
> > Are you seeing issues when removing an ARI-capable endpoint from a
> > hotplug slot and replacing it with a non-ARI-capable device?
> > If you do, the question is why the above-quoted commit doesn't avoid them.
>
> Yeah! Sorry I missed this. ARI is already checked and enabled during device
> initialization.
It doesn't hurt to additionally clear on hot-removal.
Thanks,
Lukas