Re: [PATCH RFC] PCI/AER: Enable internal AER errors by default
From: Ira Weiny
Date: Tue Feb 14 2023 - 19:09:11 EST
Bjorn Helgaas wrote:
> On Fri, Feb 10, 2023 at 02:33:23PM -0800, Ira Weiny wrote:
> > The CXL driver expects internal error reporting to be enabled via
> > pci_enable_pcie_error_reporting(). It is likely other drivers expect the same.
> > Dave submitted a patch to enable the CXL side[1] but the PCI AER registers
> > still mask errors.
> >
> > PCIe v6.0 Uncorrectable Mask Register (7.8.4.3) and Correctable Mask
> > Register (7.8.4.6) default to masking internal errors. The
> > Uncorrectable Error Severity Register (7.8.4.4) defaults internal errors
> > as fatal.
> >
> > Enable internal errors to be reported via the standard
> > pci_enable_pcie_error_reporting() call. Ensure uncorrectable errors are set
> > non-fatal to limit any impact to other drivers.
>
> Do you have any background on why the spec makes these errors masked
> by default? I'm sympathetic to wanting to learn about all the errors
> we can, but I'm a little wary if the spec authors thought it was
> important to mask these by default.
>
I don't have any idea of the history.
To me 'internal errors' casts a pretty wide net and was likely a catch-all
that the authors felt was mostly unneeded.
CXL is different because it further divides the errors.
I've enlisted some help internal to Intel to hopefully find some answers.
But if no one knows, it would be safe to go with my alternate
suggestion and add a new PCIe call to enable this specifically for the
drivers that need it.
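
As a rough illustration of what such a call would have to do at the register
level, here is a userspace sketch, not kernel code. The struct and the helper
name are hypothetical and stand in for real config-space accesses; only the
two bit definitions are taken from include/uapi/linux/pci_regs.h. It clears
the severity bit first so that an internal error is never reported as fatal
in the window after it is unmasked.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as defined in include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

/* Hypothetical stand-in for the three AER registers involved. */
struct aer_regs {
	uint32_t uncor_mask;  /* Uncorrectable Error Mask Register */
	uint32_t uncor_sever; /* Uncorrectable Error Severity Register */
	uint32_t cor_mask;    /* Correctable Error Mask Register */
};

/*
 * Hypothetical helper: report internal errors, but as non-fatal.
 * All other mask and severity bits are left untouched.
 */
void aer_unmask_internal_errors(struct aer_regs *aer)
{
	assert(aer);

	/* Severity bit clear == non-fatal; do this before unmasking. */
	aer->uncor_sever &= ~PCI_ERR_UNC_INTN;

	/* Mask bit clear == error is reported. */
	aer->uncor_mask &= ~PCI_ERR_UNC_INTN;
	aer->cor_mask &= ~PCI_ERR_COR_INTERNAL;
}
```

A driver that needs internal errors would call something like this once at
probe time; drivers that never call it keep the spec defaults.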
Ira