On Wed, Apr 12, 2023 at 05:11:28PM +0800, LeoLiuoc wrote:
On 2023/4/8 7:18, Bjorn Helgaas wrote:
On Tue, Nov 15, 2022 at 11:11:15AM +0800, LeoLiu-oc wrote:
From: leoliu-oc <leoliu-oc@xxxxxxxxxxx>
According to sec 18.3.2.4, 18.3.2.5 and 18.3.2.6 of ACPI r6.5, the
register values from the HEST PCI Express AER Structure should be written to
relevant PCIe Device's AER Capabilities. So the purpose of the patch set
is to extract register values from HEST PCI Express AER structures and
program them into AER Capabilities. Refer to the ACPI Spec r6.5 for a more
detailed description.
I wasn't involved in this part of the ACPI spec, and I don't
understand how this is intended to work.
I see that this series extracts AER mask, severity, and control
information from the ACPI HEST table and uses it to configure PCIe
devices as they are enumerated.
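To make that reading concrete, here is a minimal, self-contained sketch of
the mechanism as I understand it. It is not the code from the series: the
names hest_aer_vals, fake_aer_cap, aer_write32, aer_read32 and
program_aer_from_hest are hypothetical, and the config space is modeled as a
plain byte array. Only the register offsets come from the AER extended
capability layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register offsets within the AER extended capability. */
#define AER_UNCOR_MASK   0x08	/* Uncorrectable Error Mask */
#define AER_UNCOR_SEVER  0x0c	/* Uncorrectable Error Severity */
#define AER_COR_MASK     0x14	/* Correctable Error Mask */
#define AER_CAP_CTRL     0x18	/* Advanced Error Capabilities and Control */

/* Hypothetical subset of the values a HEST PCIe AER structure supplies. */
struct hest_aer_vals {
	uint32_t uncor_mask;
	uint32_t uncor_severity;
	uint32_t cor_mask;
	uint32_t adv_cap_ctrl;
};

/* Hypothetical stand-in for one device's AER capability registers. */
struct fake_aer_cap {
	uint8_t regs[0x40];
};

static void aer_write32(struct fake_aer_cap *cap, unsigned int off,
			uint32_t val)
{
	assert(off + sizeof(val) <= sizeof(cap->regs));
	memcpy(&cap->regs[off], &val, sizeof(val));
}

static uint32_t aer_read32(const struct fake_aer_cap *cap, unsigned int off)
{
	uint32_t val;

	assert(off + sizeof(val) <= sizeof(cap->regs));
	memcpy(&val, &cap->regs[off], sizeof(val));
	return val;
}

/* Copy the firmware-supplied values into the device's AER registers. */
static void program_aer_from_hest(struct fake_aer_cap *cap,
				  const struct hest_aer_vals *h)
{
	aer_write32(cap, AER_UNCOR_MASK, h->uncor_mask);
	aer_write32(cap, AER_UNCOR_SEVER, h->uncor_severity);
	aer_write32(cap, AER_COR_MASK, h->cor_mask);
	aer_write32(cap, AER_CAP_CTRL, h->adv_cap_ctrl);
}
```

Note that this sketch writes the registers unconditionally, which is exactly
what raises the ownership question.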
What I don't understand is how this relates to ownership of the AER
capability as negotiated by the _OSC method. Firmware can configure
the AER capability itself, and if it retains control of the AER
capability, the OS can't write to it (with the exception of clearing
EDR error status), so this wouldn't be necessary.
There is no relationship between the ownership of the AER related
register and the ownership of the AER capability in the OS or
Firmware.
I don't understand this; can you say it another way? "Ownership of
the AER related register" and "ownership of the AER capability" sound
exactly the same to me.
The processing here initializes the AER-related registers; it does not
handle AER events. If firmware configures the AER registers itself, it
cannot handle the runtime hot reset and link retrain cases, in addition
to the hotplug case you mentioned below.
If the OS owns the AER capability, I assume it gets to decide for
itself how to configure AER, no matter what the ACPI HEST says.
What information does the OS use to decide how to configure AER? The
ACPI Spec has the following description: PCI Express (PCIe) root
ports may implement PCIe Advanced Error Reporting (AER) support.
This table (HEST) contains information platform firmware supplies to
OSPM for configuring AER support on a given root port. We understand
HEST as a way for the user to express the expected configuration.
In the current implementation, the OS already configures a PCIe
device based on the _HPP/_HPX methods, both when a PCI device is
inserted into a hot-plug slot and during initial configuration of a
PCI device at system boot. HEST is just another way to express the
user's desired configuration.
Why was the HEST mechanism added if the functionality is equivalent
to the existing _HPP/_HPX? There must be something that HEST supplies
that _HPP/_HPX did not.
I think we need some things in the commit log (and short comments in
the code) to help maintain this in the future:
- What problem does this solve, e.g., is there some bug that happens
because we lack this functionality?
- How is this HEST mechanism related to _HPP/_HPX? What are the
differences?
- How is this related to _OSC AER ownership?
I think we ignore _OSC ownership in the existing _HPP/_HPX code, but
that seems like a potential problem. The PCI Firmware spec (r3.3, sec
4.5.1) is pretty clear:
If control of this feature was requested and denied or was not
requested, firmware returns this bit set to 0, and the operating
system must not modify the Advanced Error Reporting Capability or
the other error enable/status bits listed above.
Maybe this is intended for the case where firmware retains AER
ownership but the OS uses native hotplug (pciehp), and this is a way
for the OS to configure new devices as the firmware expects? But in
that case, we still have the problem that the OS can't write to the
AER capability to do this configuration.
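A rough sketch of the constraint I have in mind, assuming the HEST values
may only be applied when _OSC granted AER control to the OS. Everything here
is a hypothetical model (fake_bridge, fake_dev, apply_hest_uncor_mask); the
native_aer flag is meant to mirror the bit of the same name that the kernel
keeps in struct pci_host_bridge, but this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a host bridge after _OSC negotiation. */
struct fake_bridge {
	bool native_aer;	/* true if _OSC granted AER control to the OS */
};

/* Hypothetical device; uncor_mask models the AER Uncorrectable Error Mask. */
struct fake_dev {
	const struct fake_bridge *bridge;
	uint32_t uncor_mask;
};

/*
 * Apply a HEST-supplied mask only if the OS owns AER.
 * Returns 0 on success, -1 if firmware retained AER control, in which
 * case the register is left untouched.
 */
static int apply_hest_uncor_mask(struct fake_dev *dev, uint32_t hest_mask)
{
	assert(dev && dev->bridge);

	if (!dev->bridge->native_aer)
		return -1;

	dev->uncor_mask = hest_mask;
	return 0;
}
```

With a check like this, the firmware-owned case degenerates to a no-op,
which is why I am asking what situation the series is meant to cover.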
Bjorn