Re: [RFC PATCH 1/6] PCI/RCEC: Introduce pcie_walk_rcec_all()

From: Li, Ming
Date: Mon Apr 22 2024 - 22:33:48 EST


On 4/23/2024 7:03 AM, Dan Williams wrote:
> Terry Bowman wrote:
> [..]
>>> Hi Terry,
>>>
>>> This patchset is responding to the implications of the implementation
>>> note in 9.18.1.5 RCEC Downstream Port Association Structure (RDPAS).
>>> That says that CXL.io and CXL.cachemem errors in Root Ports may indeed
>>> be signaled to an RCEC. Do you expect that implementation note to cause
>>> any issues on platforms that do not follow that CXL spec behavior?
>>>
>>> My expectation is that it may just cause extra polling for errors, but
>>> not cause any harm.
>>
>> AMD platforms in RCH/RCD mode consume protocol errors in the RCEC's AER driver. AMD
>> platforms in VH mode consume protocol errors (including root port errors) in the
>> root port's AER driver. The exception is the VH mode host with CXL1.1 endpoint and
>> RCH downstream errors. CXL1.1 endpoint and RCH downstream errors in a VH host are
>> consumed in the RCEC.
>
> I agree that's the most compatible path for existing software.
>
>> I don't believe these patchset changes would affect this behavior. But, I will need
>> to test to confirm.
>
> As I wrote to Li Ming, I think any potential conflict can further be
> limited by the fact that this extra scanning is limited to CXL.cachemem,
> not typical PCI AER flows.

Agree with Dan, but I think software has no way to know whether an error is a CXL.cachemem error without RDPAS (it only knows that an uncor_internal_error/cor_internal_error was reported by the RCEC). Maybe we can limit this extra scanning to the RPs operating in CXL mode?
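
To illustrate the filtering idea, here is a minimal standalone sketch. It is not kernel code: struct mock_root_port, its cxl_mode flag, and both functions are hypothetical stand-ins invented for this example, and how a real RP would be identified as running in CXL mode is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a root port; not the kernel's struct pci_dev. */
struct mock_root_port {
	const char *name;
	bool cxl_mode;	/* assumed flag: the port is operating in CXL mode */
};

typedef void (*mock_rp_cb)(struct mock_root_port *rp, void *data);

/*
 * Walk an array of root ports and invoke cb only for those in CXL mode.
 * Ports in plain PCIe mode are skipped, so they see no extra scanning.
 * Returns the number of ports that passed the filter.
 */
static size_t mock_walk_cxl_root_ports(struct mock_root_port *ports, size_t n,
				       mock_rp_cb cb, void *data)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (!ports[i].cxl_mode)
			continue;
		if (cb)
			cb(&ports[i], data);
		visited++;
	}

	return visited;
}

/* Example callback: counts the ports it is asked to scan. */
static void mock_count_scan(struct mock_root_port *rp, void *data)
{
	(void)rp;
	(*(size_t *)data)++;
}
```

With three mock ports where only the first and third have cxl_mode set, the walk would call the callback twice and skip the PCIe-only port.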