Re: [PATCH 04/15] cxl/aer/pci: Add CXL PCIe port correctable error support in AER service driver

From: Jonathan Cameron
Date: Wed Oct 16 2024 - 13:29:49 EST


On Wed, 16 Oct 2024 12:18:06 -0500
Terry Bowman <Terry.Bowman@xxxxxxx> wrote:

> Hi Jonathan,
>
> On 10/16/24 11:22, Jonathan Cameron wrote:
> > On Tue, 8 Oct 2024 17:16:46 -0500
> > Terry Bowman <terry.bowman@xxxxxxx> wrote:
> >
> >> The AER service driver currently does not manage CXL PCIe port
> >> protocol errors reported by CXL root ports, CXL upstream switch ports,
> >> and CXL downstream switch ports. Consequently, RAS protocol errors
> >> from CXL PCIe port devices are not properly logged or handled.
> >>
> >> These errors are reported to the OS via the root port's AER correctable
> >> and uncorrectable internal error fields. While the AER driver supports
> >> handling downstream port protocol errors in restricted CXL host (RCH)
> >> mode also known as CXL1.1, it lacks the same functionality for CXL
> >> PCIe ports operating in virtual hierarchy (VH) mode, introduced in
> >> CXL2.0.
> >>
> >> To address this gap, update the AER driver to handle CXL PCIe port
> >> device protocol correctable errors (CE).
> >>
> >> The uncorrectable error handling (UCE) will be added in a future
> >> patch.
> >>
> >> Make this update alongside the existing downstream port RCH error
> >> handling logic, extending support to CXL PCIe ports in VH.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> > Minor comments inline.
> >
> > J
> >> ---
> >> drivers/pci/pcie/aer.c | 54 +++++++++++++++++++++++++++++++++---------
> >> 1 file changed, 43 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> >> index dc8b17999001..1c996287d4ce 100644
> >> --- a/drivers/pci/pcie/aer.c
> >> +++ b/drivers/pci/pcie/aer.c
> >> @@ -40,6 +40,8 @@
> >> #define AER_MAX_TYPEOF_COR_ERRS 16 /* as per PCI_ERR_COR_STATUS */
> >> #define AER_MAX_TYPEOF_UNCOR_ERRS 27 /* as per PCI_ERR_UNCOR_STATUS*/
> >>
> >> +#define CXL_DVSEC_PORT_EXTENSIONS 3
> >
> > Duplicate of definition in drivers/cxl/cxlpci.h
> >
> > Maybe wrap it up in an is_cxl_port() or similar? Or just
> > move that to a header both places can include.
> >
> >
>
> Ok. I'll move the value '3' into the function call rather than use a #define.
No, that's worse!

Find a way to have just one definition.
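
Roughly what I have in mind, as a userspace sketch rather than a real
patch. The struct pci_dev and both pci_*() helpers below are mock stand-ins
so the snippet builds outside the kernel; in the tree they come from
<linux/pci.h>. Which header ends up holding the shared #define is left
open here, the point is only that aer.c and drivers/cxl both pick it up
from the same place.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in include/uapi/linux/pci_regs.h and include/linux/pci_ids.h. */
#define PCI_EXP_TYPE_ROOT_PORT	0x4
#define PCI_EXP_TYPE_UPSTREAM	0x5
#define PCI_EXP_TYPE_DOWNSTREAM	0x6
#define PCI_VENDOR_ID_CXL	0x1e98

/*
 * The one and only definition. In the kernel this would sit in a single
 * header included by both users (header choice is an assumption here).
 */
#define CXL_DVSEC_PORT_EXTENSIONS	3

/* Mock device: NOT the kernel's struct pci_dev, just enough to run this. */
struct pci_dev {
	int pcie_type;
	uint16_t dvsec_vendor;
	uint16_t dvsec_id;
	uint16_t dvsec_offset;	/* 0 means "no such DVSEC" */
};

/* Mock of the kernel helper with the same name. */
static int pci_pcie_type(const struct pci_dev *dev)
{
	return dev->pcie_type;
}

/* Mock of the kernel helper: returns the capability offset, or 0. */
static uint16_t pci_find_dvsec_capability(struct pci_dev *dev,
					  uint16_t vendor, uint16_t dvsec)
{
	if (dev->dvsec_vendor == vendor && dev->dvsec_id == dvsec)
		return dev->dvsec_offset;
	return 0;
}

/* Same checks as the helper in the patch, using the shared definition. */
static bool is_pcie_cxl_port(struct pci_dev *dev)
{
	switch (pci_pcie_type(dev)) {
	case PCI_EXP_TYPE_ROOT_PORT:
	case PCI_EXP_TYPE_UPSTREAM:
	case PCI_EXP_TYPE_DOWNSTREAM:
		break;
	default:
		return false;
	}

	return pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
					 CXL_DVSEC_PORT_EXTENSIONS);
}
```

With that, neither file carries a bare '3' and there is nothing to keep
in sync by hand.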

>
> >> +
> >> struct aer_err_source {
> >> u32 status; /* PCI_ERR_ROOT_STATUS */
> >> u32 id; /* PCI_ERR_ROOT_ERR_SRC */
> >> @@ -941,6 +943,17 @@ static bool find_source_device(struct pci_dev *parent,
> >> return true;
> >> }
> >>
> >> +static bool is_pcie_cxl_port(struct pci_dev *dev)
> >> +{
> >> + if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
> >> + (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
> >> + (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
> >> + return false;
> >> +
> >> + return (!!pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
> >> + CXL_DVSEC_PORT_EXTENSIONS));
> >
> > No need for the !!; it will return the same without that clamping to 1/0
> > because any non-zero value converts to true.
> >
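
To illustrate the !! point with a standalone sketch: find_offset() below
is a made-up stand-in for a lookup that returns a config space offset (0
when nothing is found), not a kernel function. Converting its result to
bool already yields 0 or 1, so the explicit !! adds nothing. The offset
0x100 is picked on purpose: its low byte is zero, yet it still converts
to true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lookup: non-zero offset when present, 0 when absent. */
static uint16_t find_offset(bool present)
{
	return present ? 0x100 : 0;
}

/* Explicit clamp to 0/1 before the implicit conversion to bool. */
static bool with_clamp(bool present)
{
	return !!find_offset(present);
}

/* Plain return: the conversion to bool does the same job by itself. */
static bool without_clamp(bool present)
{
	return find_offset(present);
}
```

Both variants give the same answer for the found and the not-found case.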
>
> Ok
>
> Regards,
> Terry
> >> +}
> >> +