Re: [PATCH v1 1/1] PCI/AER: Use _OSC negotiation to determine AER ownership
From: Austin.Bolen
Date: Fri May 01 2020 - 10:40:29 EST
On 4/30/2020 6:02 PM, Bjorn Helgaas wrote:
> [EXTERNAL EMAIL]
>
> [Austin, help us understand the FIRMWARE_FIRST bit! :)]
>
> On Thu, Apr 30, 2020 at 05:40:22PM -0500, Bjorn Helgaas wrote:
>> On Sun, Apr 26, 2020 at 11:30:06AM -0700, sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx wrote:
>>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
>>>
>>> Currently PCIe AER driver uses HEST FIRMWARE_FIRST bit to
>>> determine the PCIe AER Capability ownership between OS and
>>> firmware. This support is added based on following spec
>>> reference.
>>>
>>> Per ACPI spec r6.3, table 18-387, 18-388, 18-389, HEST table
>>> flags field BIT-0 and BIT-1 can be used to expose the
>>> ownership of error source between firmware and OSPM.
>>>
>>> Bit [0] - FIRMWARE_FIRST: If set, indicates that system
>>> firmware will handle errors from this source
>>> first.
>>> Bit [1] – GLOBAL: If set, indicates that the settings
>>> contained in this structure apply globally to all
>>> PCI Express Bridges.
>>>
>>> Although above spec reference states that setting
>>> FIRMWARE_FIRST bit means firmware will handle the error source
>>> first, it does not explicitly state anything about AER
>>> ownership. So using HEST to determine AER ownership is
>>> incorrect.
>>>
>>> Also, as per following specification references, _OSC can be
>>> used to negotiate the AER ownership between firmware and OS.
>>> Details are,
>>>
>>> Per ACPI spec r6.3, sec 6.2.11.3, table titled “Interpretation
>>> of _OSC Control Field” and as per PCI firmware specification r3.2,
>>> sec 4.5.1, table 4-5, OS can set bit 3 of _OSC control field
>>> to request control over PCI Express AER. If the OS successfully
>>> receives control of this feature, it must handle error reporting
>>> through the AER Capability as described in the PCI Express Base
>>> Specification.
>>>
>>> Since above spec references clearly states _OSC can be used to
>>> determine AER ownership, don't depend on HEST FIRMWARE_FIRST bit.
>> I pulled out the _OSC part of this to a separate patch. What's left
>> is below, and is essentially equivalent to Alex's patch:
>>
>> https://lore.kernel.org/r/20190326172343.28946-3-mr.nuke.me@xxxxxxxxx/
>>
>> I like what this does, but what I don't like is the fact that we now
>> have this thing called pcie_aer_get_firmware_first() that is not
>> connected with the ACPI FIRMWARE_FIRST bit at all.
> Austin, if we remove this, we'll have no PCIe-related code that looks
> at the HEST FIRMWARE_FIRST bit at all. Presumably it's there for some
> reason, but I'm not sure what the reason is.
>
> Alex's mail [1] has a nice table of _OSC AER/HEST FFS bits that looks
> useful, but the only actionable thing I can see is that in the (1,1)
> case, OSPM should do some initialization with masks/enables.
>
> But I have no clue what that means or how to connect that with the
> spec. What are the masks/enables? Is that something connected with
> ERST?
>
> [1] https://lore.kernel.org/r/20190326172343.28946-1-mr.nuke.me@xxxxxxxxx/
The only values that make sense to me are (1, 0) for full native OS
init/handling of AER and (0, 1) for the firmware first model where
firmware does the init and handles errors first then passes control to
the OS. For these cases the FIRMWARE_FIRST flag in HEST is redundant and
not needed.
We did discuss the (1, 1) case in the ACPI working group and there were
a potential use case (which Alex documented in the link you provided)
but there is nothing specified in the standard about how that model
would actually work AFAICT. And no x86 system has the hardware support
needed for what was proposed that I'm aware of (not sure about other
architectures).
So unless and until someone documents how the firmware and OS are
supposed to behave in the (1, 1) or (0, 0) scenario and expresses a need
for those models, I would not bother adding support for them. Just my 2
cents.
>
>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>> index f4274d301235..85d73d28ec26 100644
>>> --- a/drivers/pci/pcie/aer.c
>>> +++ b/drivers/pci/pcie/aer.c
>>> @@ -217,133 +217,19 @@ void pcie_ecrc_get_policy(char *str)
>>> }
>>> #endif /* CONFIG_PCIE_ECRC */
>>>
>>> -#ifdef CONFIG_ACPI_APEI
>>> -static inline int hest_match_pci(struct acpi_hest_aer_common *p,
>>> - struct pci_dev *pci)
>>> -{
>>> - return ACPI_HEST_SEGMENT(p->bus) == pci_domain_nr(pci->bus) &&
>>> - ACPI_HEST_BUS(p->bus) == pci->bus->number &&
>>> - p->device == PCI_SLOT(pci->devfn) &&
>>> - p->function == PCI_FUNC(pci->devfn);
>>> -}
>>> -
>>> -static inline bool hest_match_type(struct acpi_hest_header *hest_hdr,
>>> - struct pci_dev *dev)
>>> -{
>>> - u16 hest_type = hest_hdr->type;
>>> - u8 pcie_type = pci_pcie_type(dev);
>>> -
>>> - if ((hest_type == ACPI_HEST_TYPE_AER_ROOT_PORT &&
>>> - pcie_type == PCI_EXP_TYPE_ROOT_PORT) ||
>>> - (hest_type == ACPI_HEST_TYPE_AER_ENDPOINT &&
>>> - pcie_type == PCI_EXP_TYPE_ENDPOINT) ||
>>> - (hest_type == ACPI_HEST_TYPE_AER_BRIDGE &&
>>> - (dev->class >> 16) == PCI_BASE_CLASS_BRIDGE))
>>> - return true;
>>> - return false;
>>> -}
>>> -
>>> -struct aer_hest_parse_info {
>>> - struct pci_dev *pci_dev;
>>> - int firmware_first;
>>> -};
>>> -
>>> -static int hest_source_is_pcie_aer(struct acpi_hest_header *hest_hdr)
>>> -{
>>> - if (hest_hdr->type == ACPI_HEST_TYPE_AER_ROOT_PORT ||
>>> - hest_hdr->type == ACPI_HEST_TYPE_AER_ENDPOINT ||
>>> - hest_hdr->type == ACPI_HEST_TYPE_AER_BRIDGE)
>>> - return 1;
>>> - return 0;
>>> -}
>>> -
>>> -static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
>>> -{
>>> - struct aer_hest_parse_info *info = data;
>>> - struct acpi_hest_aer_common *p;
>>> - int ff;
>>> -
>>> - if (!hest_source_is_pcie_aer(hest_hdr))
>>> - return 0;
>>> -
>>> - p = (struct acpi_hest_aer_common *)(hest_hdr + 1);
>>> - ff = !!(p->flags & ACPI_HEST_FIRMWARE_FIRST);
>>> -
>>> - /*
>>> - * If no specific device is supplied, determine whether
>>> - * FIRMWARE_FIRST is set for *any* PCIe device.
>>> - */
>>> - if (!info->pci_dev) {
>>> - info->firmware_first |= ff;
>>> - return 0;
>>> - }
>>> -
>>> - /* Otherwise, check the specific device */
>>> - if (p->flags & ACPI_HEST_GLOBAL) {
>>> - if (hest_match_type(hest_hdr, info->pci_dev))
>>> - info->firmware_first = ff;
>>> - } else
>>> - if (hest_match_pci(p, info->pci_dev))
>>> - info->firmware_first = ff;
>>> -
>>> - return 0;
>>> -}
>>> -
>>> -static void aer_set_firmware_first(struct pci_dev *pci_dev)
>>> -{
>>> - int rc;
>>> - struct aer_hest_parse_info info = {
>>> - .pci_dev = pci_dev,
>>> - .firmware_first = 0,
>>> - };
>>> -
>>> - rc = apei_hest_parse(aer_hest_parse, &info);
>>> -
>>> - if (rc)
>>> - pci_dev->__aer_firmware_first = 0;
>>> - else
>>> - pci_dev->__aer_firmware_first = info.firmware_first;
>>> - pci_dev->__aer_firmware_first_valid = 1;
>>> -}
>>> -
>>> int pcie_aer_get_firmware_first(struct pci_dev *dev)
>>> {
>>> + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
>>> +
>>> if (!pci_is_pcie(dev))
>>> return 0;
>>>
>>> if (pcie_ports_native)
>>> return 0;
>>>
>>> - if (!dev->__aer_firmware_first_valid)
>>> - aer_set_firmware_first(dev);
>>> - return dev->__aer_firmware_first;
>>> + return !host->native_aer;
>>> }
>>> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
>>> index 64b5e081cdb2..c05b49d740f4 100644
>>> --- a/drivers/pci/pcie/portdrv.h
>>> +++ b/drivers/pci/pcie/portdrv.h
>>> @@ -147,16 +147,7 @@ static inline bool pcie_pme_no_msi(void) { return false; }
>>> static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {}
>>> #endif /* !CONFIG_PCIE_PME */
>>>
>>> -#ifdef CONFIG_ACPI_APEI
>>> int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
>>> -#else
>>> -static inline int pcie_aer_get_firmware_first(struct pci_dev *pci_dev)
>>> -{
>>> - if (pci_dev->__aer_firmware_first_valid)
>>> - return pci_dev->__aer_firmware_first;
>>> - return 0;
>>> -}
>>> -#endif
>>>
>>> struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
>>> #endif /* _PORTDRV_H_ */
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index 83ce1cdf5676..43f265830eca 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -420,8 +420,6 @@ struct pci_dev {
>>> * mappings to make sure they cannot access arbitrary memory.
>>> */
>>> unsigned int untrusted:1;
>>> - unsigned int __aer_firmware_first_valid:1;
>>> - unsigned int __aer_firmware_first:1;
>>> unsigned int broken_intx_masking:1; /* INTx masking can't be used */
>>> unsigned int io_window_1k:1; /* Intel bridge 1K I/O windows */
>>> unsigned int irq_managed:1;
>>> --
>>> 2.17.1
>>>