Re: [PATCH v16 08/10] cxl: Update Endpoint AER uncorrectable handler
From: Bowman, Terry
Date: Wed Mar 11 2026 - 12:00:57 EST
On 3/9/2026 9:12 AM, Jonathan Cameron wrote:
> On Mon, 2 Mar 2026 14:36:46 -0600
> Terry Bowman <terry.bowman@xxxxxxx> wrote:
>
>> CXL drivers now implement protocol RAS support. PCI protocol errors,
>> however, continue to be reported via the AER capability and must still be
>> handled by a PCI error recovery callback.
>>
>> Replace the existing cxl_error_detected() callback in cxl/pci.c with a
>> new cxl_pci_error_detected() implementation that handles uncorrectable
>> AER PCI protocol errors. Changes for PCI Correctable protocol errors will
>> be added in a future patch.
>>
>> Introduce function cxl_uncor_aer_present() to handle and log the CXL
>> Endpoint's AER errors. Endpoint fatal AER errors are not currently logged by
>> the AER driver and require logging here with a call to pci_print_aer().
>>
>> This cleanly separates CXL protocol error handling from PCI AER handling
>> and ensures that each subsystem processes only the errors it is
>> responsible for.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
>> Assisted-by: Azure:gpt4.1-nano-key
> One question inline.
>
>>
>> ---
>>
>> Changes in v15->v16:
>> - Update commit message (DaveJ)
>> - s/cxl_handle_aer()/cxl_uncor_aer_present()/g (Jonathan)
>> - cxl_uncor_aer_present(): Leave original result calculation based on
>> if a UCE is present and the provided state (Terry)
>> - Add call to pci_print_aer(). AER fails to log because it is an
>> upstream link (Terry)
>>
>> Changes in v14->v15:
>> - Update commit message and title. Added Bjorn's ack.
>> - Move CE and UCE handling logic here
>>
>> Changes in v13->v14:
>> - Add Dave Jiang's review-by
>> - Update commit message & headline (Bjorn)
>> - Refactor cxl_port_error_detected()/cxl_port_cor_error_detected() to
>> one line (Jonathan)
>> - Remove cxl_walk_port() (Dan)
>> - Remove cxl_pci_drv_bound(). Check for 'is_cxl' parent port is
>> sufficient (Dan)
>> - Remove device_lock_if()
>> - Combined CE and UCE here (Terry)
>>
>> Changes in v12->v13:
>> - Move get_pci_cxl_host_dev() and cxl_handle_proto_error() to Dequeue
>> patch (Terry)
>> - Remove EP case in cxl_get_ras_base(), not used. (Terry)
>> - Remove check for dport->dport_dev (Dave)
>> - Remove whitespace (Terry)
>>
>> Changes in v11->v12:
>> - Add call to cxl_pci_drv_bound() in cxl_handle_proto_error() and
>> pci_to_cxl_dev()
>> - Change cxl_error_detected() -> cxl_cor_error_detected()
>> - Remove NULL variable assignments
>> - Replace bus_find_device() with find_cxl_port_by_uport() for upstream
>> port searches.
>>
>> Changes in v10->v11:
>> - None
>> ---
>> drivers/cxl/core/ras.c | 57 ++++++++++++++++++++++++------------------
>> drivers/cxl/cxlpci.h | 9 +++----
>> drivers/cxl/pci.c | 6 ++---
>> 3 files changed, 39 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index 254144d19764..884e40c66638 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
> ...
>
>
>> +pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
>> + pci_channel_state_t state)
>> +{
>> + bool ue = cxl_uncor_aer_present(pdev);
>> + struct cxl_port *port = get_cxl_port(pdev);
>
> This got a reference that wasn't (I think) previously taken.
> I'm not spotting where that is released. If it is somewhere beyond
> this function, good to add a comment saying where.
>
>
This should be using scope-based cleanup. I will change it. Thanks.
-Terry
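
For readers unfamiliar with the scope-based cleanup mentioned above, here is a minimal user-space sketch of the idea, not the actual patch. It relies on the GCC/Clang `cleanup` variable attribute, which the kernel's helpers in linux/cleanup.h also build on. Every `demo_*` name and the `DEMO_FREE_PORT` macro are hypothetical stand-ins invented for illustration; the real fix would use the kernel's own helpers around the get_cxl_port() reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as struct cxl_port. */
struct demo_port {
	int refs;
};

/* One object with a single long-lived reference held elsewhere. */
static struct demo_port demo_port_instance = { .refs = 1 };

/* Hypothetical lookup that takes a reference on behalf of the caller. */
struct demo_port *demo_get_port(void)
{
	demo_port_instance.refs++;
	return &demo_port_instance;
}

/* Drop a reference previously taken with demo_get_port(). */
void demo_put_port(struct demo_port *port)
{
	assert(port->refs > 0);
	port->refs--;
}

/* Cleanup handler: the compiler passes a pointer to the annotated variable. */
void demo_put_port_cleanup(struct demo_port **portp)
{
	if (*portp)
		demo_put_port(*portp);
}

/* Hypothetical annotation: run the handler when the variable leaves scope. */
#define DEMO_FREE_PORT __attribute__((cleanup(demo_put_port_cleanup)))

/*
 * Sketch of an error handler. Returns true when recovery is needed.
 * The reference is dropped on every return path without explicit puts.
 */
bool demo_error_detected(bool ue, bool io_frozen)
{
	struct demo_port *port DEMO_FREE_PORT = demo_get_port();

	if (io_frozen)
		return true;	/* early return: cleanup still runs */

	return ue;
}

/* Helper for inspecting the current reference count. */
int demo_port_refs(void)
{
	return demo_port_instance.refs;
}
```

Under these assumptions, the reference count seen by demo_port_refs() is the same before and after each call to demo_error_detected(), whichever return path is taken. That is the property the review comment asks for: the release is tied to the variable's scope instead of to a put that a later change could forget.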