Re: [PATCH v16 10/10] cxl: Enable CXL protocol error reporting

From: Bowman, Terry

Date: Tue Mar 31 2026 - 17:18:13 EST


On 3/31/2026 2:16 PM, Dan Williams wrote:
> Bowman, Terry wrote:
>> On 3/29/2026 8:41 PM, Dan Williams wrote:
>>> Terry Bowman wrote:
>>>> CXL protocol errors are not enabled for all CXL devices after boot. These
>>>> must be enabled in order to process CXL protocol errors.
>>>>
>>>> Introduce cxl_unmask_proto_interrupts() to call pci_aer_unmask_internal_errors().
>>>> pci_aer_unmask_internal_errors() expects pdev->aer_cap to be initialized,
>>>> but pdev->aer_cap is not initialized for CXL Upstream Switch Ports and CXL
>>>> Downstream Switch Ports. Initialize pdev->aer_cap if necessary. Enable AER
>>>> correctable internal errors and uncorrectable internal errors for all CXL
>>>> devices.
>>>>
>>>> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
>>>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
>>>> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
>>>> Reviewed-by: Ben Cheatham <benjamin.cheatham@xxxxxxx>
>>>>
>>>> ---
>>>> drivers/cxl/core/port.c | 2 ++
>>>> drivers/cxl/core/ras.c | 22 ++++++++++++++++++++++
>>>> drivers/cxl/cxlpci.h | 4 ++++
>>>> 3 files changed, 28 insertions(+)
>>>>
>>>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>>>> index 27271402915f..c33d58fb7264 100644
>>>> --- a/drivers/cxl/core/port.c
>>>> +++ b/drivers/cxl/core/port.c
>>>> @@ -1852,6 +1852,8 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
>>>>
>>>> rc = cxl_add_ep(dport, &cxlmd->dev);
>>>>
>>>> + cxl_unmask_proto_interrupts(cxlmd->cxlds->dev);
>>>> +
>>>
>>> Why here? devm_cxl_port_ras_setup() will just redo it, right?
>>
>> No, I found this change is needed; otherwise injection fails.
>
> Sounds like something worth fixing rather than sprinkling an out of
> place workaround. Port resource acquisition should stay logically
> grouped. I am also missing where this masking is restored on exit?
>
> I have asked about your test scripts in the past [1]. Those scripts need
> to be integrated into cxl_test proper, or at least contributed on the
> side such that anyone can clone the tests and run them. It needs to be
> the case that incremental refactoring work can move with confidence by
> simply running the tests.
>
> [1]: http://lore.kernel.org/68815a66459e4_134cc710012@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch

Hi Dan,

I sent a tgz in response here but the Intel server blocked it. I emailed it to
you and DaveJ.

- Terry