Re: [PATCH v3 2/2] ACPI: APEI: Filter the PCI MCFG address with an arch-agnostic method

From: Bjorn Helgaas
Date: Mon Oct 25 2021 - 19:37:42 EST


On Fri, Oct 22, 2021 at 05:52:15PM +0800, Xuesong Chen wrote:
> On 22/10/2021 00:57, Bjorn Helgaas wrote:
> > On Thu, Oct 21, 2021 at 11:46:40PM +0800, Xuesong Chen wrote:
> >> On 21/10/2021 02:50, Bjorn Helgaas wrote:
> >>> On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote:
> >>>> On 20/10/2021 03:23, Bjorn Helgaas wrote:
> >>>>> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
> >
> >>>>>> This patch will try to handle this case in a more common way
> >>>>>> instead of the original 'arch' specific solution, which will be
> >>>>>> beneficial to all the APEI-dependent platforms after that.
> >>>>>
> >>>>> This actually doesn't say anything about what the patch does or
> >>>>> how it works. It says "handles this case in a more common way"
> >>>>> but with no details.
> >>>>
> >>>> Good suggestion, I'll give more details about that...
> >>>>
> >>>>> The EINJ table contains "injection instructions" that can read
> >>>>> or write "register regions" described by generic address
> >>>>> structures (see ACPI v6.3, sec 18.6.2 and 18.6.3), and
> >>>>> __einj_error_trigger() requests those register regions with
> >>>>> request_mem_region() or request_region() before executing the
> >>>>> injections instructions.
> >>>>>
> >>>>> IIUC, this patch basically says "if this region is part of the
> >>>>> MCFG area, we don't need to reserve it." That leads to the
> >>>>> questions of why we need to reserve *any* of the areas
> >>>>
> >>>> AFAIK, the MCFG area is reserved since the ECAM module will
> >>>> provide a generic Kernel Programming Interfaces(KPI), e.g,
> >>>> pci_generic_config_read(...), so all the drivers are allowed to
> >>>> access the pci config space only by those KPIs in a consistent
> >>>> and safe way, direct raw access will break the rule. Correct me
> >>>> if I am missing sth.
> >>>>
> >>>>> and why it's safe to simply skip reserving regions that are part
> >>>>> of the MCFG area.
> >>>>
> >>>> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error
> >>>> injection tolerance level") before to address this issue, the
> >>>> entire commit log as below:
> >>>>
> >>>> Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
> >>>> specific errors. EINJ will report errors as below when hitting such
> >>>> cases:
> >>>>
> >>>> APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
> >>>>
> >>>> It is because on x86 platform ACPI based PCI MMCFG logic has
> >>>> reserved all MMCFG spaces so that EINJ can't reserve it again.
> >>>> We already trust the ACPI/APEI code when using the EINJ interface
> >>>> so it is not a big leap to also trust it to access the right
> >>>> MMCFG addresses. Skip address checking to allow the access.
> >>>
> >>> I'm not really convinced by that justification because I don't
> >>> think the issue here is *trust*. If all we care about is trust,
> >>> and we trust the ACPI/APEI code, why do we need to reserve
> >>> anything at all when executing EINJ actions?
> >>>
> >>> I think the resource reservation issue is about coordinating
> >>> multiple users of the address space. A driver reserves the MMIO
> >>> address space of a device it controls so no other driver can
> >>> reserve it at the same time and cause conflicts.
> >>>
> >>> I'm not really convinced by this mutual exclusion argument either,
> >>> because I haven't yet seen a situation where we say "EINJ needs a
> >>> resource that's already in use by somebody else, so we can't use
> >>> EINJ." When conflicts arise, the response is always "we'll just
> >>> stop reserving this conflicting resource but use it anyway."
> >>>
> >>> I think the only real value in apei_resources_request() is a
> >>> little bit of documentation in /proc/iomem. For ERST and EINJ,
> >>> even that only lasts for the tiny period when we're actually
> >>> executing an action.
> >>>
> >>> So convince me there's a reason why we shouldn't just remove
> >>> apei_resources_request() completely :)
> >>
> >> I have to confess that currently I have no strong evidence/reason to
> >> convince you that it's absolute safe to remove
> >> apei_resources_request(), probably in some conditions it *does*
> >> require to follow the mutual exclusion usage model. The ECAM/MCFG
> >> maybe a special case not like other normal device driver, since all
> >> its MCFG space has been reserved during the initialization. Anyway,
> >> it's another topic and good point well worth discussing in the
> >> future.
> >
> > This is missing the point. It's not the MCFG reservation during
> > initialization that would make this safe. What would make it safe is
> > the fact that ECAM does not require mutual exclusion.
> >
> > When the hardware implements ECAM correctly, PCI config accesses do
> > not require locking because a config access requires a single MMIO
> > load or store.
>
> I don't quite understand here, we're talking about
> apei_resources_request() which is a mechanism to void resource
> conflict,"request_mem_region() tells the kernel that your driver is
> going to use this range of I/O addresses, which will prevent other
> drivers to make any overlapping call to the same region through
> request_mem_region()", but according to the context of 'a single
> MMIO load or store', are you talking about something like the mutex
> lock primitive?

My point was that when ECAM is implemented correctly, a CPU does a
single MMIO load to do a PCI config read and a single MMIO store to do
a PCI config write. In that case there no need for any locking, so
there's no need for APEI to reserve those resources.

This is what d91525eb8ee6 ("ACPI, EINJ: Enhance error injection
tolerance level") does. That code change makes sense, but the commit
log does not -- it has nothing to do with trusting the ACPI/APEI code;
it's just that no matter what the EINJ actions do with the MCFG
regions, they cannot interfere with other drivers.

> > Many non-ECAM config accessors *do* require locking because they use
> > several register accesses, e.g., the 0xCF8/0xCFC address/data pairs
> > used by pci_conf1_read(). If EINJ actions used these, we would have
> > to enforce mutual exclusion between EINJ config accesses and those
> > done by other drivers.
>
> I take a look at the pci_conf1_read() function, there's only a pair of
> raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(), if that's the
> mutual exclusion you mentioned, seems it's not related to the
> apei_resources_request() we're talking about...

This was an example of a case where EINJ mutual exclusion *would* be
required. I do not expect EINJ actions to use the 0xCF8/0xCFC
registers because there is no mechanism to coordinate that with the OS
use of the same registers.

> > Some ARM64 platforms do not implement ECAM correctly, e.g.,
> > tegra194_map_bus() programs an outbound ATU and xgene_pcie_map_bus()
> > sets an RTDID register before the MMIO load/store. Platforms like
> > this *do* require mutual exclusion between an EINJ config access and
> > other config accesses.
>
> What's the mutual exclusion for those quirk functions (tegra194 and
> xgene)? *mutual* is not applied for single side. I can see neither
> locking nor request_mem_region() in those bus map functions.

These currently depend on the pci_lock. See PCI_OP_READ() in
drivers/pci/access.c.

EINJ actions cannot acquire the pci_lock, so EINJ actions cannot
safely use ECAM space on those platforms.

> > These platforms are supported via quirks in pci_mcfg.c, so they will
> > have resources in the pci_mcfg_list, and if we just ignore all the
> > MCFG resources in apei_resources_request(), there will be nothing to
> > prevent ordinary driver config accesses from being corrupted by EINJ
> > accesses.
> >
> > I think in general, is probably *is* safe to remove MCFG resources
> > from the APEI reservations, but it would be better if we had some way
> > to prevent EINJ from using MCFG on platforms like tegra194 and xgene.
>
> Just as I mentioned, since there's no mutual exclusion applied for
> the tegra194 and xgene (correct me if I am wrong), putting their MCFG
> resources into the APEI reservation (so the apei_resources_request()
> applied) does nothing

I think apei_resources_request() should continue to reserve MCFG areas
on tegra194 and xgene, but it does not need to reserve them on other
ARM64 platforms.

Bjorn