Re: [RFC PATCH 3/6] PCI/AER: Enable RCEC to report internal error for CXL root port

From: Dan Williams
Date: Mon Apr 22 2024 - 19:10:09 EST


Li, Ming wrote:
> On 4/18/2024 10:57 PM, Dan Williams wrote:
> > Li, Ming wrote:
> >> On 4/16/2024 10:46 PM, Terry Bowman wrote:
> >>> The driver support is much simpler if RCEC does not handle VH protocol errors. Is there
> >>> a reason to forward root port VH mode protocol errors to an RCEC rather than consume
> >>> in the root port's AER driver and forward to CXL error handler?
> >>>
> >> I agree that is simpler if only root port handle VH protocol errors,
> >> but I think that software has no chance to choose if VH protocol
> >> errors reported to RCEC or root port, it depends on platform
> >> implementation. So I think we should support both cases.
> >
> > The question is whether the CXL spec RDPAS behavior causes any problems
> > for platforms that follow PCIe rather than CXL reporting flows for
> > root-port errors. I.e. does it cause problems if Linux starts scanning
> > root ports on RCEC notifications?
> >
> > I do think the lookup needs to change to be based on CXL host-bridge
> > detection and not CXL-type-3 endpoint detection, but otherwise it looks
> > like CXL spec wants to invalidate PCIe spec expectations.
>
> Hi Dan, if my understanding is correct, the CXL host-bridge detection
> you mentioned is that iterating all root ports under RCEC associated
> bus range for RCEC reported VH protocol errors case, and the
> CXL-type-3 detection is that iterating all CXL-type-3 endpoint under
> RCEC associated bus range. is it right?

I think this error checking needs to be tightly scoped to only scan for
CXL.cachemem errors and not CXL.io or typical PCIe errors. That way we
are not technically running afoul of the PCIe expectations that *PCIe*
root-port errors are only reported by their local AER block and not an
RCEC.

So the scanning should be limited to just the root-ports that have
negotiated a CXL.cachemem link.
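
To make the scoping concrete, here is a rough, untested sketch of the filter
I have in mind. It is a standalone model, not kernel code: the structs,
the cxl_cachemem_link and cachemem_error flags, and
rcec_scan_cxl_root_ports() are all hypothetical names made up for
illustration, and how the negotiated link state is actually discovered is
left open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a root port; not a kernel structure. */
struct model_root_port {
	unsigned char bus;       /* bus number the port sits on */
	bool cxl_cachemem_link;  /* assumed: link negotiated CXL.cachemem */
	bool cachemem_error;     /* assumed: CXL.cachemem error is pending */
};

/* Hypothetical model of the bus range associated with an RCEC. */
struct model_rcec {
	unsigned char first_bus;
	unsigned char last_bus;
};

/*
 * On an RCEC notification, count the root ports that need CXL.cachemem
 * error handling. Ports outside the RCEC's associated bus range, and
 * ports that never negotiated CXL.cachemem (plain PCIe), are skipped so
 * their errors stay with the local AER block.
 */
static size_t rcec_scan_cxl_root_ports(const struct model_rcec *rcec,
				       const struct model_root_port *ports,
				       size_t nr_ports)
{
	size_t i, handled = 0;

	for (i = 0; i < nr_ports; i++) {
		const struct model_root_port *rp = &ports[i];

		if (rp->bus < rcec->first_bus || rp->bus > rcec->last_bus)
			continue;	/* not associated with this RCEC */
		if (!rp->cxl_cachemem_link)
			continue;	/* PCIe-only port: leave to its own AER */
		if (rp->cachemem_error)
			handled++;	/* would hand off to the CXL error handler */
	}

	return handled;
}
```

The point of the second check is that a PCIe-only root port is never
touched on the RCEC path, even if it happens to fall inside the RCEC's
associated bus range.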