Re: [RFC PATCH 9/9] cxl/pci: Enable interrupts for CXL PCIe ports' AER internal errors

From: Terry Bowman
Date: Mon Jun 24 2024 - 12:46:16 EST


Hi Jonathan,

I added responses inline below.

On 6/20/24 08:15, Jonathan Cameron wrote:
> On Mon, 17 Jun 2024 15:04:11 -0500
> Terry Bowman <terry.bowman@xxxxxxx> wrote:
>
>> CXL RAS errors are reported through AER interrupts using the AER status:
>> correctbale internal errors (CIE) and AER uncorrectable internal errors
>
> correctable
>

Thanks.

>> (UIE).[1] But, the AER CIE/UIE are disabled by default preventing
>> notification of CXL RAS errors.[2]
>>
>> Enable CXL PCIe port RAS notification by unmasking the ports' AER CIE
>> and UIE errors.
>>
>> [1] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and Upstream
>> Switch Ports
>> [2] PCI6.0 - 7.8.4.3 Uncorrectable Error Mask Register (Offset 08h),
>> 7.8.4.6 Correctable Error Mask Register (Offset 14h)
>>
>> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
>
> I'm not sure doing this from a driver other than the one handling the
> errors makes sense. It is doing a couple of RMW without any locking
> or guarantees that the driver bound to the PCI port might care about
> this changing.
>

I think this could fit into the helper function mentioned in our earlier
discussion. When the portdrv's notifier enabler is called it could also
enable the UIE/CIE.

> I'd like more info on why we don't just turn this on in general
> and hence avoid the need to control it from the 'wrong' place.
>
> Jonathan
>

I was trying to enable only where needed given the one case is not a
pattern, yet. At this point it is only for CXL RCH downstream port
and CXL VH ports (portdrv).

Would you like for the UIE/CIE unmask added to the AER driver init ?

>
>
>> ---
>> drivers/cxl/core/pci.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
>> index e630eccb733d..73637d39df0a 100644
>> --- a/drivers/cxl/core/pci.c
>> +++ b/drivers/cxl/core/pci.c
>> @@ -861,6 +861,12 @@ void cxl_setup_parent_uport(struct device *host, struct cxl_port *port)
>> struct device *uport_dev = port->uport_dev;
>>
>> cxl_port_map_regs(uport_dev, map, regs);
>> +
>> + if (dev_is_pci(uport_dev)) {
>> + struct pci_dev *pdev = to_pci_dev(uport_dev);
>> +
>> + pci_aer_unmask_internal_errors(pdev);
>
> I'd skip the local variable for conciseness.
>
>> + }
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_setup_parent_uport, CXL);
>>
>> @@ -878,6 +884,12 @@ void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport)
>>
>> if (dport->rch)
>> cxl_disable_rch_root_ints(dport);
>> +
>> + if (dev_is_pci(dport_dev)) {
>> + struct pci_dev *pdev = to_pci_dev(dport_dev);
>> +
>> + pci_aer_unmask_internal_errors(pdev);
>
> likewise.
>

Got it.

Regards,
Terry

>> + }
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_setup_parent_dport, CXL);
>>
>