Re: [PATCH v16 10/10] cxl: Enable CXL protocol error reporting
From: Dan Williams
Date: Tue Mar 31 2026 - 15:17:47 EST
Bowman, Terry wrote:
> On 3/29/2026 8:41 PM, Dan Williams wrote:
> > Terry Bowman wrote:
> >> CXL protocol errors are not enabled for all CXL devices after boot. These
> >> must be enabled in order to process CXL protocol errors.
> >>
> >> Introduce cxl_unmask_proto_interrupts() to call pci_aer_unmask_internal_errors().
> >> pci_aer_unmask_internal_errors() expects pdev->aer_cap to be initialized,
> >> but dev->aer_cap is not initialized for CXL Upstream Switch Ports and CXL
> >> Downstream Switch Ports. Initialize dev->aer_cap if necessary. Enable AER
> >> correctable internal errors and uncorrectable internal errors for all CXL
> >> devices.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> >> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> >> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> >> Reviewed-by: Ben Cheatham <benjamin.cheatham@xxxxxxx>
> >>
> >> ---
> >> drivers/cxl/core/port.c | 2 ++
> >> drivers/cxl/core/ras.c | 22 ++++++++++++++++++++++
> >> drivers/cxl/cxlpci.h | 4 ++++
> >> 3 files changed, 28 insertions(+)
> >>
> >> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> >> index 27271402915f..c33d58fb7264 100644
> >> --- a/drivers/cxl/core/port.c
> >> +++ b/drivers/cxl/core/port.c
> >> @@ -1852,6 +1852,8 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> >>
> >> rc = cxl_add_ep(dport, &cxlmd->dev);
> >>
> >> + cxl_unmask_proto_interrupts(cxlmd->cxlds->dev);
> >> +
> >
> > Why here? devm_cxl_port_ras_setup() will just redo it, right?
>
> No, I found this change is needed otherwise injection fails.

Sounds like something worth fixing rather than sprinkling an out-of-place
workaround. Port resource acquisition should stay logically grouped.
Also, where is this masking restored on exit?
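
To make the restore-on-exit question concrete, here is a minimal
user-space model, not kernel code. The two bit values mirror
PCI_ERR_UNC_INTN and PCI_ERR_COR_INTERNAL from
include/uapi/linux/pci_regs.h; struct fake_aer, struct aer_mask_snapshot,
and both helper functions are hypothetical names invented for this
sketch and are not part of the patch or of the PCI core. The idea is
only that whoever unmasks also records the prior state so that teardown
can put it back:

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

/* Hypothetical stand-in for a device's AER mask registers. */
struct fake_aer {
	uint32_t uncor_mask;
	uint32_t cor_mask;
};

/* Hypothetical snapshot of the mask state taken before unmasking. */
struct aer_mask_snapshot {
	uint32_t uncor_mask;
	uint32_t cor_mask;
};

/* Clear the internal-error mask bits and return the previous state. */
static struct aer_mask_snapshot unmask_internal_errors(struct fake_aer *aer)
{
	struct aer_mask_snapshot saved;

	assert(aer);
	saved.uncor_mask = aer->uncor_mask;
	saved.cor_mask = aer->cor_mask;

	aer->uncor_mask &= ~PCI_ERR_UNC_INTN;
	aer->cor_mask &= ~PCI_ERR_COR_INTERNAL;
	return saved;
}

/* Put back only the two bits that unmask_internal_errors() touched. */
static void restore_internal_errors(struct fake_aer *aer,
				    const struct aer_mask_snapshot *saved)
{
	assert(aer && saved);
	aer->uncor_mask = (aer->uncor_mask & ~PCI_ERR_UNC_INTN) |
			  (saved->uncor_mask & PCI_ERR_UNC_INTN);
	aer->cor_mask = (aer->cor_mask & ~PCI_ERR_COR_INTERNAL) |
			(saved->cor_mask & PCI_ERR_COR_INTERNAL);
}
```

In a driver, the restore half would presumably hang off whatever
teardown path already pairs with the port RAS setup, so that the unmask
and its undo live in one place.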

I have asked about your test scripts in the past [1]. Those scripts need
to be integrated into cxl_test proper, or at least contributed on the
side so that anyone can clone the tests and run them. Incremental
refactoring work needs to be able to proceed with confidence simply by
running the tests.

[1]: http://lore.kernel.org/68815a66459e4_134cc710012@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch