Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

From: Bjorn Helgaas
Date: Tue Aug 08 2017 - 22:22:52 EST


On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> When bit4 is set in the PCIe Device Control register, it indicates
> whether the device is permitted to use relaxed ordering.
> On some platforms using relaxed ordering can have performance issues or
> due to erratum can cause data-corruption. In such cases devices must avoid
> using relaxed ordering.
>
> This patch checks if there is any node in the hierarchy that indicates that
> using relaxed ordering is not safe.

I think you only check the devices between the root port and the
target device. For example, you don't check siblings or cousins of
the target device.

> In such cases the patch turns off the
> relaxed ordering by clearing the eapability for this device.

s/eapability/capability/

> And if the
> device is probably running in a guest machine, we should do nothing.

I don't know what this sentence means. "Probably running in a guest
machine" doesn't really make sense, and there's nothing in your patch
that explicitly checks for being in a guest machine.

> Signed-off-by: Ding Tianhong <dingtianhong@xxxxxxxxxx>
> Acked-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxx>
> Acked-by: Ashok Raj <ashok.raj@xxxxxxxxx>
> ---
> drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++
> drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
> include/linux/pci.h | 2 ++
> 3 files changed, 68 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc34..4f9d7c1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
> EXPORT_SYMBOL(pcie_set_mps);
>
> /**
> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible clear relaxed ordering

Why "If possible"? The bit is required to be RW or hardwired to zero,
so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.

> + */
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
> +{
> + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
> + PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);

The current series doesn't add any callers of this except
pci_configure_relaxed_ordering(), so it doesn't need to be exported to
modules.

I think I would put both of these functions in drivers/pci/probe.c.
Then this one could be static and you'd only have to add
pcie_relaxed_ordering_supported() to include/linux/pci.h.

> +
> +/**
> + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support

s/relexed/relaxed/

> + * @dev: PCI device to query
> + *
> + * Returns true if the device support relaxed ordering attribute.
> + */
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
> +{
> + u16 v;
> +
> + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
> +
> + return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);

This is misnamed. This doesn't tell us anything about whether the
device *supports* relaxed ordering. It only tells us whether the
device is *permitted* to use it.

When a device initiates a transaction, the hardware should set the RO
bit in the TLP with logic something like this:

RO = <this Function supports relaxed ordering> &&
<this transaction doesn't require strong write ordering> &&
<PCI_EXP_DEVCTL_RELAX_EN is set>

The issue you're fixing is that some Completers don't handle RO
correctly. The determining factor is not the Requester, but the
Completer (for this series, a Root Port). So I think this should be
something like:

int pcie_relaxed_ordering_broken(struct pci_dev *completer)
{
if (!completer)
return 0;

return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
}

and the caller should do something like this:

if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
adapter->flags |= ROOT_NO_RELAXED_ORDERING;

That way it's obvious where the issue is, and it's obvious that the
answer might be different for peer-to-peer transactions than it is for
transactions to the root port, i.e., to coherent memory.

> +
> +/**
> * pcie_get_minimum_link - determine minimum link settings of a PCI device
> * @dev: PCI device to query
> * @speed: storage for minimum speed
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index c31310d..48df012 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
> PCI_EXP_DEVCTL_EXT_TAG);
> }
>
> +/**
> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device
> + * should disable the relaxed ordering attribute.
> + * @dev: PCI device
> + *
> + * Return true if any of the PCI devices above us do not support
> + * relaxed ordering.
> + */
> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
> +{
> + while (dev) {
> + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
> + return true;
> +
> + dev = dev->bus->self;
> + }
> +
> + return false;
> +}
> +
> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> + /* We should not alter the relaxed ordering bit for the VF */
> + if (dev->is_virtfn)
> + return;
> +
> + /* If the releaxed ordering enable bit is not set, do nothing. */

s/releaxed/relaxed/

> + if (!pcie_relaxed_ordering_supported(dev))
> + return;
> +
> + if (pci_dev_should_disable_relaxed_ordering(dev)) {
> + pcie_clear_relaxed_ordering(dev);
> + dev_info(&dev->dev, "Disable Relaxed Ordering\n");

This associates the message with the Requester that may potentially
use relaxed ordering. But there's nothing wrong or unusual about the
Requester; the issue is with the *Completer*, so I think the message
should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
Maybe it should be both places; I dunno.

This implementation assumes the device only initiates transactions to
coherent memory, i.e., it assumes the device never does peer-to-peer
DMA. I guess we'll have to wait and see if we trip over any
peer-to-peer issues, then figure out how to handle them.

> + }
> +}
> +
> static void pci_configure_device(struct pci_dev *dev)
> {
> struct hotplug_params hpp;
> @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
>
> pci_configure_mps(dev);
> pci_configure_extended_tags(dev);
> + pci_configure_relaxed_ordering(dev);
>
> memset(&hpp, 0, sizeof(hpp));
> ret = pci_get_hp_params(dev, &hpp);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 412ec1c..3aa23a2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
> void pci_pme_wakeup_bus(struct pci_bus *bus);
> void pci_d3cold_enable(struct pci_dev *dev);
> void pci_d3cold_disable(struct pci_dev *dev);
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
>
> /* PCI Virtual Channel */
> int pci_save_vc_state(struct pci_dev *dev);
> --
> 1.8.3.1
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel