Re: [PATCH v3 2/2] ACPI: APEI: Filter the PCI MCFG address with an arch-agnostic method
From: Xuesong Chen
Date: Wed Oct 27 2021 - 01:29:56 EST
On 27/10/2021 04:47, Bjorn Helgaas wrote:
> On Tue, Oct 26, 2021 at 05:16:47PM +0800, Xuesong Chen wrote:
>> On 26/10/2021 07:37, Bjorn Helgaas wrote:
>
>>> My point was that when ECAM is implemented correctly, a CPU does a
>>> single MMIO load to do a PCI config read and a single MMIO store to do
>>> a PCI config write. In that case there no need for any locking, so
>>> there's no need for APEI to reserve those resources.
>>
>> Ah, got it. That means the PCI ECAM has a implicit mutual exclusion with EINJ
>> if the hardware implemention is correct, so we can remove the MCFG from
>> the APEI's safely.
>
> Well, not quite. ECAM doesn't *need* mutual exclusion. Single loads
> and stores are atomic by definition.
>
OK, because ECAM config access has intrinsic atomic primitive, so no need to
reserve those ECAM config space resources for APEI, that's why commit d91525eb8ee6
("ACPI, EINJ: Enhance error injection tolerance level") fix make sense...
>>> I think apei_resources_request() should continue to reserve MCFG areas
>>> on tegra194 and xgene, but it does not need to reserve them on other
>>> ARM64 platforms.
>>
>> As a summary: we need to reserve the MCFG areas on those platforms with a
>> quirk ECAM implementation since there's no lockless method to access the
>> configuration space, on other platforms we don't need to reserve the MCFG
>> resources (so can remove it safely).
>>
>> So we need to add another patch to handle the case of tegra194 and xgene...
>> I will try to figure it out.
>
> I looked through these again and found another problem case (thunder).
> Here are my notes from my research.
>
> Normal ECAM users require no device-specific support. The platform
> supplies an MCFG table, the generic code works, no mutual exclusion is
> required, and APEI doesn't need to reserve the MCFG areas.
>
> The problem cases are platforms that supply an MCFG table but require
> some device-specific workarounds. We can identify these because they
> have quirks in pci-mcfg.c. Here are the existing quirks and the
> pci_ecam_ops structs they supply:
>
> AL_ECAM al_pcie_ops # OK
> QCOM_ECAM32 pci_32b_ops # OK
> HISI_QUAD_DOM hisi_pcie_ops # OK
> THUNDER_PEM_QUIRK thunder_pem_ecam_ops # problem
> THUNDER_PEM_QUIRK thunder_pem_ecam_ops # problem
> THUNDER_ECAM_QUIRK pci_thunder_ecam_ops # OK
> tegra tegra194_pcie_ops # problem
> XGENE_V1_ECAM_MCFG xgene_v1_pcie_ecam_ops # problem
> XGENE_V2_ECAM_MCFG xgene_v2_pcie_ecam_ops # problem
> ALTRA_ECAM_QUIRK pci_32b_read_ops # OK
>
> The ones marked "OK" have .map_bus(), .read(), and .write() methods
> that need no mutual exclusion because they boil down to just a single
> MMIO load or store. These are fine and there shouldn't be a problem
> if an EINJ action accesses the ECAM space.
>
> The others do require mutual exclusion:
>
> - thunder_pem_ecam_ops: thunder_pem_config_read() calls
> thunder_pem_bridge_read(), which does a writeq() to PEM_CFG_RD
> followed by a readq(). The writeq() and readq() must be atomic to
> avoid corruption.
>
> - tegra194_pcie_ops: tegra194_map_bus() programs the ATU. This and
> the subsequent ECAM read/write must be atomic.
>
> - xgene_v1_pcie_ecam_ops and xgene_v2_pcie_ecam_ops:
> xgene_pcie_map_bus() sets the RTID. This and the subsequent ECAM
> read/write must be atomic.
>
> I had to look at all these ops individually to find them, so I don't
> see an easy way to identify these problem cases at run-time.
>
> I personally would not have an issue with having APEI try to reserve
> the MCFG regions for any platform that has an MCFG quirk. That would
> prevent the al, qcom, hisi, thunder-ecam, and altra drivers from using
> EINJ even though it would probably be safe for them. But we already
> know those platforms are not really ACPI-compliant, so ...
OK, understood. Since those platforms are not really ACPI-compliant, so
we can unify all the quirks together. Let me send an inital solution about
this for your review and see if there's room for further improvement...
Thanks,
Xuesong
>
> Bjorn
>