Re: [PATCH v3 2/2] ACPI: APEI: Filter the PCI MCFG address with an arch-agnostic method
From: Bjorn Helgaas
Date: Tue Oct 26 2021 - 16:47:27 EST
On Tue, Oct 26, 2021 at 05:16:47PM +0800, Xuesong Chen wrote:
> On 26/10/2021 07:37, Bjorn Helgaas wrote:
> > My point was that when ECAM is implemented correctly, a CPU does a
> > single MMIO load to do a PCI config read and a single MMIO store to do
> > a PCI config write. In that case there is no need for any locking, so
> > there's no need for APEI to reserve those resources.
>
> Ah, got it. That means PCI ECAM has an implicit mutual exclusion with
> EINJ if the hardware implementation is correct, so we can safely remove
> the MCFG areas from APEI's reserved resources.

Well, not quite. ECAM doesn't *need* mutual exclusion. Single loads
and stores are atomic by definition.
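
To make the "single load or store" point concrete, here is a small
user-space sketch of a generic ECAM accessor: it turns (bus, device,
function, register) into one offset in the ECAM window using the
standard layout (bus in bits 27:20, device 19:15, function 14:12,
register 11:0) and then does exactly one access. The function names
are hypothetical. In the kernel the window is ioremapped and the access
goes through readl()/writel() (see pci_ecam_map_bus() and
pci_generic_config_read()); this sketch uses a plain volatile pointer
so it can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset of a config register inside an
 * ECAM window (bus 27:20, device 19:15, function 14:12, reg 11:0).
 */
static uint32_t ecam_offset(unsigned int bus, unsigned int dev,
			    unsigned int fn, unsigned int where)
{
	assert(bus < 256 && dev < 32 && fn < 8 && where < 4096);
	return (bus << 20) | (dev << 15) | (fn << 12) | where;
}

/* A 32-bit config read is one load from the window: nothing to lock. */
static uint32_t ecam_read32(volatile uint32_t *base, unsigned int bus,
			    unsigned int dev, unsigned int fn,
			    unsigned int where)
{
	assert((where & 3) == 0);	/* dword-aligned access only */
	return base[ecam_offset(bus, dev, fn, where) / 4];
}

/* Likewise, a 32-bit config write is one store. */
static void ecam_write32(volatile uint32_t *base, unsigned int bus,
			 unsigned int dev, unsigned int fn,
			 unsigned int where, uint32_t val)
{
	assert((where & 3) == 0);
	base[ecam_offset(bus, dev, fn, where) / 4] = val;
}
```

Because each call touches the window exactly once, there is no window
in which another CPU (or an EINJ action) could observe or disturb a
half-finished access.
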
> > I think apei_resources_request() should continue to reserve MCFG areas
> > on tegra194 and xgene, but it does not need to reserve them on other
> > ARM64 platforms.
>
> As a summary: we need to reserve the MCFG areas on platforms with a
> quirky ECAM implementation, since there's no lockless way to access
> their configuration space. On other platforms we don't need to reserve
> the MCFG resources, so we can safely remove them.
>
> So we need to add another patch to handle the case of tegra194 and xgene...
> I will try to figure it out.

I looked through these again and found another problem case (thunder).
Here are my notes from my research.

Normal ECAM users require no device-specific support. The platform
supplies an MCFG table, the generic code works, no mutual exclusion is
required, and APEI doesn't need to reserve the MCFG areas.

The problem cases are platforms that supply an MCFG table but require
some device-specific workarounds. We can identify these because they
have quirks in pci-mcfg.c. Here are the existing quirks and the
pci_ecam_ops structs they supply:

    AL_ECAM               al_pcie_ops               # OK
    QCOM_ECAM32           pci_32b_ops               # OK
    HISI_QUAD_DOM         hisi_pcie_ops             # OK
    THUNDER_PEM_QUIRK     thunder_pem_ecam_ops      # problem
    THUNDER_PEM_QUIRK     thunder_pem_ecam_ops      # problem
    THUNDER_ECAM_QUIRK    pci_thunder_ecam_ops      # OK
    tegra                 tegra194_pcie_ops         # problem
    XGENE_V1_ECAM_MCFG    xgene_v1_pcie_ecam_ops    # problem
    XGENE_V2_ECAM_MCFG    xgene_v2_pcie_ecam_ops    # problem
    ALTRA_ECAM_QUIRK      pci_32b_read_ops          # OK

The ones marked "OK" have .map_bus(), .read(), and .write() methods
that need no mutual exclusion because they boil down to just a single
MMIO load or store. These are fine and there shouldn't be a problem
if an EINJ action accesses the ECAM space.

The others do require mutual exclusion:

  - thunder_pem_ecam_ops: thunder_pem_config_read() calls
    thunder_pem_bridge_read(), which does a writeq() to PEM_CFG_RD
    followed by a readq(). The writeq()/readq() sequence must be atomic
    to avoid corruption.

  - tegra194_pcie_ops: tegra194_map_bus() programs the ATU. This and
    the subsequent ECAM read/write must be atomic.

  - xgene_v1_pcie_ecam_ops and xgene_v2_pcie_ecam_ops:
    xgene_pcie_map_bus() sets the RTID. This and the subsequent ECAM
    read/write must be atomic.
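
All three problem cases share one shape: a "select" step (the
PEM_CFG_RD write, ATU programming, or RTID setting) followed by the
real access. If anything else touches the hardware between the two
steps, the access lands in the wrong place. The sketch below models
that with a simulated selector register and a spinlock (a C11
atomic_flag standing in for a kernel spinlock). All names are
hypothetical; this is not the actual driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical model of a config window that needs two steps per
 * access: first select the target dword, then access through it.
 */
struct indirect_cfg {
	atomic_flag lock;	/* serializes each select+access pair */
	uint32_t sel;		/* stands in for an address/select register */
	uint32_t space[64];	/* 256 bytes of backing config space */
};

static void cfg_lock(struct indirect_cfg *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock,
						 memory_order_acquire))
		;	/* spin until the holder finishes its pair */
}

static void cfg_unlock(struct indirect_cfg *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

static uint32_t indirect_cfg_read32(struct indirect_cfg *c, uint32_t where)
{
	uint32_t val;

	assert(where < sizeof(c->space));
	cfg_lock(c);
	c->sel = where & ~3u;		/* step 1: program the selector */
	val = c->space[c->sel / 4];	/* step 2: access through it */
	cfg_unlock(c);
	return val;
}

static void indirect_cfg_write32(struct indirect_cfg *c, uint32_t where,
				 uint32_t val)
{
	assert(where < sizeof(c->space));
	cfg_lock(c);
	c->sel = where & ~3u;		/* step 1: program the selector */
	c->space[c->sel / 4] = val;	/* step 2: access through it */
	cfg_unlock(c);
}
```

Note that the lock only protects callers that go through these
accessors. An EINJ action that writes the same registers directly
would not take it, which is presumably why these are the platforms
where APEI should keep reserving the MCFG areas.
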

I had to look at all these ops individually to find them, so I don't
see an easy way to identify these problem cases at run-time.

I personally would not have an issue with having APEI try to reserve
the MCFG regions for any platform that has an MCFG quirk. That would
prevent the al, qcom, hisi, thunder-ecam, and altra drivers from using
EINJ even though it would probably be safe for them. But we already
know those platforms are not really ACPI-compliant, so ...
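
The two policies discussed here (reserve only for the audited problem
ops, or reserve for any MCFG quirk) can be written down as a small
decision function. This is a hypothetical sketch: the function names
are invented, and matching on ops struct names as strings is only to
keep it self-contained (in-kernel code would presumably compare
pci_ecam_ops pointers). The list must be maintained by hand, since the
problem ops could only be found by reading each one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: ops found by audit to need a select+access pair. */
static const char *const non_atomic_ops[] = {
	"thunder_pem_ecam_ops",
	"tegra194_pcie_ops",
	"xgene_v1_pcie_ecam_ops",
	"xgene_v2_pcie_ecam_ops",
};

/* Precise policy: reserve only if the quirk's ops need mutual exclusion. */
static bool reserve_mcfg_precise(const char *quirk_ops)
{
	size_t i;

	if (quirk_ops == NULL)	/* no quirk: generic single-access ECAM */
		return false;
	for (i = 0; i < sizeof(non_atomic_ops) / sizeof(non_atomic_ops[0]); i++)
		if (strcmp(quirk_ops, non_atomic_ops[i]) == 0)
			return true;
	return false;
}

/* Conservative policy: reserve whenever any MCFG quirk matched. */
static bool reserve_mcfg_any_quirk(const char *quirk_ops)
{
	return quirk_ops != NULL;
}
```

The precise policy would keep EINJ usable on al, qcom, hisi,
thunder-ecam, and altra; the any-quirk policy gives that up in exchange
for not having to keep a list in sync with new quirks.
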
Bjorn