Re: [PATCH 01/15] cxl/aer/pci: Add CXL PCIe port error handler callbacks in AER service driver

From: Terry Bowman
Date: Tue Oct 22 2024 - 14:40:40 EST


Hi Dan,

On 10/22/24 12:09, Dan Williams wrote:
> Terry Bowman wrote:
> [..]
>>> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
>>> index 6af5e0425872..42db26195bda 100644
>>> --- a/drivers/pci/pcie/portdrv.c
>>> +++ b/drivers/pci/pcie/portdrv.c
>>> @@ -793,6 +793,7 @@ static struct pci_driver pcie_portdriver = {
>>> .shutdown = pcie_portdrv_shutdown,
>>>
>>> .err_handler = &pcie_portdrv_err_handler,
>>> + .cxl_err_handler = &cxl_portdrv_err_handler,
>>>
>>> .driver_managed_dma = true,
>>
>> Ok. I'm thinking to add a definition for 'pci_dev::cxl_err_handler' of type
>> 'struct pci_error_handler'.
>>
>> 'struct pci_error_handler' contains a slot reset(), resume(), and mmio_enabled() fn
>> pointers that are used in PCIe recovery if available. The plan is for CXL devices to
>> call panic for UCE fatal and non-fatal but it might be good to use the
>> 'struct pci_error_handler' type in case there are needs for the other handlers in
>> the future. It also makes the logic to access and use the error handlers common,
>> requiring less code.
>
> Can you give an example where CXL can reuse 'struct pci_error_handlers`
> infrastructure? The PCI error handlers are built around the idea that
> operations can be paused and recovered, CXL operations assume near
> constant device participation in CPU cache and memory coherency
> protocol.
>
> About the only reuse I can think of is cases where a CXL error could be
> sent down the PCI error handler path, i.e. ones that would send a
> 'pci_channel_io_normal' notice to ->error_detected(). Otherwise,
> pci_channel_state_t and pci_ers_result_t seem to be a poor fit for CXL
> error handling.


I was referring to reusing separate instance of 'struct pci_error_handlers' for CXL
UCE-CE errors.

One example where it can be reused in infrastructure is in err.c's
report_error_detected(). If both PCIe and CXL errors use 'struct pci_error_handlers'
then the updated report_error_detected() becomes a bit simpler with less helper
function logic. But, it's not a reason by itself to choose to reuse 'struct
pci_error_handlers' for CXL errors.

Looking closer at aer,c shows there is no advantage in this file for using 'struct
pci_error_handlers' for CXL errors.

If I understand correctly you want a new type introduced, 'struct cxl_error_handlers'.
And will contain 2 function pointers for CE and UCE handling.

Regards,
Terry