Re: [PATCH v8 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
From: Ding Tianhong
Date: Sat Aug 05 2017 - 02:42:12 EST
On 2017/8/5 5:06, Casey Leedom wrote:
> | From: Ding Tianhong <dingtianhong@xxxxxxxxxx>
> | Sent: Thursday, August 3, 2017 6:44 AM
> |
> | diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> | index 6967c6b..1e1cdbe 100644
> | --- a/drivers/pci/quirks.c
> | +++ b/drivers/pci/quirks.c
> | @@ -4016,6 +4016,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
> | quirk_tw686x_class);
> |
> | /*
> | + * Some devices have problems with Transaction Layer Packets with the Relaxed
> | + * Ordering Attribute set. Such devices should mark themselves and other
> | + * Device Drivers should check before sending TLPs with RO set.
> | + */
> | +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> | +{
> | +	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> | +}
> | +
> | +/*
> | + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> | + * cause performance problems with Upstream Transaction Layer Packets with
> | + * Relaxed Ordering set.
> | + */
> | +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> | +			      quirk_relaxedordering_disable);
> | +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> | +			      quirk_relaxedordering_disable);
> | +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> | +			      quirk_relaxedordering_disable);
> | + ...
>
> It looks like this is missing the set of Root Complex IDs that were noted in
> the document to which Patrick Cramer sent us a reference:
>
> https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
>
> In section 3.9.1 we have:
>
> 3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
> and Toward MMIO Regions (P2P)
>
> In order to maximize performance for PCIe devices in the processors
> listed in Table 3-6 below, the software should determine whether the
> accesses are toward coherent memory (system memory) or toward MMIO
> regions (P2P access to other devices). If the access is toward MMIO
> region, then software can command HW to set the RO bit in the TLP
> header, as this would allow hardware to achieve maximum throughput for
> these types of accesses. For accesses toward coherent memory, software
> can command HW to clear the RO bit in the TLP header (no RO), as this
> would allow hardware to achieve maximum throughput for these types of
> accesses.
>
> Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
> PCIe Performance
>
> Processor                            CPU RP Device IDs
>
> Intel Xeon processors based on       6F01H-6F0EH
> Broadwell microarchitecture
>
> Intel Xeon processors based on       2F01H-2F0EH
> Haswell microarchitecture
>
> The PCI Device IDs you have there are the first ones that I guessed at
> having the performance problem with Relaxed Ordering. We now apparently
> have a complete list from Intel.
>
> I don't want to phrase this as a "NAK" because you've gone around the
> mulberry bush a bunch of times already. So maybe just go with what you've
> got in version 8 of your patch and then do a follow on patch to complete the
> table?
>
Casey:
Thanks for the good catch. I found that Ashok noticed this three months ago; I am sorry that I
missed it. It has been a really long discussion of this problem, but don't worry, it is not much work to fix.
I will send a v9 version. :)
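
For reference, here is a minimal standalone sketch of the ID check implied by Table 3-6 quoted above. It is not the actual follow-up patch: the helper name intel_rp_in_table_3_6() is hypothetical, and the only data it uses are the two device ID ranges from the table (6F01H-6F0EH for Broadwell, 2F01H-2F0EH for Haswell).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): returns true if a PCI
 * device ID falls inside one of the CPU Root Port ranges listed in
 * Table 3-6 of the Intel optimization manual.
 */
static bool intel_rp_in_table_3_6(uint16_t device)
{
	/* Xeon, Broadwell microarchitecture: 6F01H-6F0EH */
	if (device >= 0x6f01 && device <= 0x6f0e)
		return true;

	/* Xeon, Haswell microarchitecture: 2F01H-2F0EH */
	if (device >= 0x2f01 && device <= 0x2f0e)
		return true;

	return false;
}
```

The three IDs in the v8 patch (0x6f02, 0x6f04, 0x6f08) all fall inside the Broadwell range, so a full table would cover them as well as the Haswell Root Ports that v8 does not list.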
Ding
> Casey
> .
>