Re: [PATCH V3] PCI: pciehp: Disable ACS Source Validation during hot-remove

From: Dan Williams
Date: Tue Aug 01 2023 - 20:19:55 EST


[ add linux-cxl ]

Hi Vidya, Lukas highlighted this thread to me as we, in linux-cxl land,
are also seeing conflicts between ACS source validation and flows like
CXL PM.

Lukas Wunner wrote:
> On Mon, Jul 31, 2023 at 01:32:27AM +0530, Vidya Sagar wrote:
> > On 7/31/2023 1:10 AM, Lukas Wunner wrote:
> > > On Mon, Jul 31, 2023 at 12:45:19AM +0530, Vidya Sagar wrote:
> > > > PCIe 6.0, 6.12.1.1 specifies that downstream devices are permitted to
> > > > send upstream messages before they have been assigned a bus number and
> > > > such messages have a Requester ID with Bus number set to 00h.
> > > > If the Downstream port has ACS Source Validation enabled, these messages
> > > > will be detected as ACS violation error.
> > > >
> > > > Hence, disable ACS Source Validation in the bridge device during
> > > > hot-remove operation and re-enable it after enumeration of the
> > > > downstream hierarchy but before binding the respective device drivers.
> > >
> > > What are these messages that are sent before assignment of a bus number?
> >
> > One example is the DRS (Device Readiness Status) message.
>
> Please mention that in the commit message.
>
>
> > > What's the user-visible issue that occurs when they're blocked?
> >
> > I'm not sure about the issue one can observe when they are blocked, but, we
> > have seen one issue when they are not blocked. When an endpoint sends a DRS
> > message and an ACS violation is raised for it, the system can trigger DPC
> > (Downstream Port Containment) if it is configured to do so for ACS
> > violations. Once the DPC is released after handling it, system would go for
> > link-up again, which results in root port receiving DRS once again from the
> > endpoint and the cycle continues.
>
> As an alternative to disabling ACS, have you explored masking ACS
> Violations (PCI_ERR_UNC_ACSV) upon de-enumeration of a device and
> unmasking them after assignment of a bus number?

The problem is that masking the error still prevents things like CXL PM
negotiation from completing.

The conflict for CXL PM can hopefully be fixed in the spec and future
devices, but that still leaves at least a full generation of CXL devices
that will fail to handle hotplug and secondary-bus resets.

One proposal I had for this was to require that the Downstream Port
clear bus-master-enable and force P2P requests to redirect upstream
whenever source validation is turned off. Then, when the device
re-establishes the link and is re-enabled, source validation can be
turned back on before downstream-port bus-master-enable is set, so that
there is no window to launch memory-cycle attacks while source
validation is turned off.
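
To make the ordering concrete, here is a rough userspace model of that
sequence. It is only a sketch, not kernel code: struct dsp_model and the
dsp_*() helpers are hypothetical names made up for illustration, and
only the bit values are taken from include/uapi/linux/pci_regs.h. A real
implementation would go through config-space accessors on the port and
would have to fit into the pciehp flows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_COMMAND_MASTER 0x0004 /* Command: Bus Master Enable */
#define PCI_ACS_SV         0x0001 /* ACS Control: Source Validation */
#define PCI_ACS_RR         0x0004 /* ACS Control: P2P Request Redirect */

/* Hypothetical model of a Downstream Port; not a kernel structure. */
struct dsp_model {
	uint16_t command;  /* stands in for the PCI Command register */
	uint16_t acs_ctrl; /* stands in for the ACS Control register */
};

/* The state to avoid: source validation off while bus mastering is on. */
static bool dsp_window_open(const struct dsp_model *p)
{
	return !(p->acs_ctrl & PCI_ACS_SV) &&
	       (p->command & PCI_COMMAND_MASTER);
}

/* Hot-remove / reset side: close the port first, then relax SV. */
static void dsp_prepare_remove(struct dsp_model *p)
{
	p->command &= (uint16_t)~PCI_COMMAND_MASTER; /* stop upstream memory cycles */
	assert(!dsp_window_open(p));
	p->acs_ctrl |= PCI_ACS_RR;                   /* redirect P2P requests upstream */
	p->acs_ctrl &= (uint16_t)~PCI_ACS_SV;        /* only now drop source validation */
	assert(!dsp_window_open(p));
}

/* Re-enumeration side: restore SV first, then reopen the port. */
static void dsp_finish_add(struct dsp_model *p)
{
	p->acs_ctrl |= PCI_ACS_SV;         /* validation back on ... */
	assert(!dsp_window_open(p));
	p->command |= PCI_COMMAND_MASTER;  /* ... before bus mastering */
	assert(!dsp_window_open(p));
}
```

Calling dsp_prepare_remove() and then dsp_finish_add() on a port that
starts with both bus mastering and source validation enabled never
trips the assertions, which is the property the ordering is meant to
guarantee. Swapping the two writes in either helper would trip them.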

Is that something you would be willing to investigate for the next round
of this patch, Vidya?