Re: [RESEND PATCH v4] ACPI: Move sdei_init and ghes_init ahead to handle platform errors earlier

From: Shuai Xue
Date: Thu Jan 13 2022 - 08:38:38 EST


Hi, Bjorn, Rafael,

Happy New Year :)

I am wondering that do you have any comment to this implementation? Any feedback
are welcomed.

Thank you.

Best regrads,
Shuai

在 2021/12/24 PM4:55, Shuai Xue 写道:
> Hi, Bjorn,
>
> Thank you for your comments.
>
> 在 2021/12/24 AM8:17, Bjorn Helgaas 写道:
>> [+to Rafael, question about HEST/GHES/SDEI init]
>>
>> On Thu, Dec 23, 2021 at 04:11:11PM +0800, Shuai Xue wrote:
>>> 在 2021/12/22 AM7:17, Bjorn Helgaas 写道:
>>>> On Thu, Dec 16, 2021 at 09:34:56PM +0800, Shuai Xue wrote:
>>>>> On an ACPI system, ACPI is initialised very early from a subsys_initcall(),
>>>>> while SDEI is not ready until a subsys_initcall_sync().
>>>>>
>>>>> The SDEI driver provides functions (e.g. apei_sdei_register_ghes,
>>>>> apei_sdei_unregister_ghes) to register or unregister event callback for
>>>>> dispatcher in firmware. When the GHES driver probing, it registers the
>>>>> corresponding callback according to the notification type specified by
>>>>> GHES. If the GHES notification type is SDEI, the GHES driver will call
>>>>> apei_sdei_register_ghes to register event call.
>>>>>
>>>>> When the firmware emits an event, it migrates the handling of the event
>>>>> into the kernel at the registered entry-point __sdei_asm_handler. And
>>>>> finally, the kernel will call the registered event callback and return
>>>>> status_code to indicate the status of event handling. SDEI_EV_FAILED
>>>>> indicates that the kernel failed to handle the event.
>>>>>
>>>>> Consequently, when an error occurs during kernel booting, the kernel is
>>>>> unable to handle and report errors until the GHES driver is initialized by
>>>>> device_initcall(), in which the event callback is registered. All errors
>>>>> that occurred before GHES initialization are missed and there is no chance
>>>>> to report and find them again.
>>>>>
>>>>> From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus
>>>>> memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
>>>>> the estatus memory pool. On the other hand, ghes_init() relies on
>>>>> sdei_init() to detect the SDEI version and the framework for registering
>>>>> and unregistering events.
>>>>
>>>>> By the way, I don't figure out why acpi_hest_init is called in
>>>>> acpi_pci_root_init, it don't rely on any other thing. May it could
>>>>> be moved further, following acpi_iort_init in acpi_init.
>>
>>>>> sdei_init() relies on ACPI table which is initialized
>>>>> subsys_initcall(): acpi_init(), acpi_bus_init(), acpi_load_tables(),
>>>>> acpi_tb_laod_namespace(). May it should be also moved further,
>>>>> after acpi_load_tables.
>>
>>>>> In this patch, move sdei_init and ghes_init as far ahead as
>>>>> possible, right after acpi_hest_init().
>>>>
>>>> I'm having a hard time figuring out the reason for this patch.
>>>>
>>>> Apparently the relevant parts are sdei_init() and ghes_init().
>>>> Today they are executed in that order:
>>>>
>>>> subsys_initcall_sync(sdei_init);
>>>> device_initcall(ghes_init);
>>>>
>>>> After this patch, they would be executed in the same order, but called
>>>> explicitly instead of as initcalls:
>>>>
>>>> acpi_pci_root_init()
>>>> {
>>>> acpi_hest_init();
>>>> sdei_init();
>>>> ghes_init();
>>>> ...
>>>>
>>>> Explicit calls are certainly better than initcalls, but that doesn't
>>>> seem to be the reason for this patch.
>>>>
>>>> Does this patch fix a bug? If so, what is the bug?
>>>
>>> Yes. When the kernel booting, the console logs many times from firmware
>>> before GHES drivers init:
>>>
>>> Trip in MM PCIe RAS handle(Intr:910)
>>> Clean PE[1.1.1] ERR_STS:0x4000100 -> 0 INT_STS:F0000000
>>> Find RP(98:1.0)
>>> --Walk dev(98:1.0) CE:0 UCE:4000
>>> ...
>>> ERROR: sdei_dispatch_event(32a) ret:-1
>>> --handler(910) end
>>>
>>> This is because the callback function has not been registered yet.
>>> Previously reported errors will be overwritten by new ones. Therefore,
>>> all errors that occurred before GHES initialization are missed
>>> and there is no chance to report and find them again.
>>>
>>>> You say that currently "errors that occur before GHES initialization
>>>> are missed". Isn't that still true after this patch? Does this patch
>>>> merely reduce the time before GHES initialization? If so, I'm
>>>> dubious, because we have to tolerate an arbitrary amount of time
>>>> there.
>>>
>>> After this patch, there are still errors missing. As you mentioned,
>>> we have to tolerate it until the software reporting capability is built.
>>>
>>> Yes, this patch merely reduce the time before GHES initialization.
>>
>> It's not a bug that errors that happen before the callback are lost.
>> At least, it's not a *Linux* bug. It might be a poor design of the
>> firmware error reporting interface.
>>
>> If the only point of this patch is to reduce the time before GHES
>> initialization, the commit log should clearly say that.
>
> Yep, it is a design defect of firmware. I will explicitly document
> the purpose of this patch in next version.
>
>>> The boot dmesg before this patch:
>>>
>>> [ 3.688586] HEST: Table parsing has been initialized.
>>> ...
>>> [ 33.204340] calling sdei_init+0x0/0x120 @ 1
>>> [ 33.208645] sdei: SDEIv1.0 (0x0) detected in firmware.
>>> ...
>>> [ 36.005390] calling ghes_init+0x0/0x11c @ 1
>>> [ 36.190021] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
>>>
>>>
>>> After this patch, the boot dmesg like bellow:
>>>
>>> [ 3.688664] HEST: Table parsing has been initialized.
>>> [ 3.688691] sdei: SDEIv1.0 (0x0) detected in firmware.
>>> [ 3.694557] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
>
> After this patch, we use explicit call to init GHES rather than initcall.
> The GHES message here is just to prove that GHES initialization is
> advanced to 3.694557 seconds.
>
>
>> [Tangent: I think this GHES message is confusing. What "APEI bit"
>> does this refer to? The only bits I remember are the Flags bits in
>> HEST Error Source Descriptor Entries, e.g., ACPI v6.3, sec 18.3.2.
>
> From the log code behavior, the APEI bit refers to OSC_SB_APEI_SUPPORT.
> (BIT 4, ACPI v6.4, sec 6.2.11.2, table Table 6.13 Platform-Wide _OSC
> Capabilities DWORD 2. [1])
>
> /* code in drivers/acpi/apei/ghes.c */
> if (rc == 0 && osc_sb_apei_support_acked)
> pr_info(GHES_PFX "APEI firmware first mode is enabled by APEI bit and WHEA _OSC.\n");
>
> /* code in drivers/acpi/bus.c */
> osc_sb_apei_support_acked =
> capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
>
>
>> "WHEA _OSC" means nothing to me, and I didn't find anything useful
>> with grep, other than that "WHEA" might be an obsolete name for what
>> we now call "APEI".
>
> Yep, WHEA is an obsolete name. I think apei_osc_setup() is added for compatibility
> in commit eccddd32ced0 ("ACPI, APEI, Add APEI bit support in generic _OSC call")'
>
> The ACPI spec Platform-Wide OSPM Capabilities defines
> UUID "0811B06E-4A27-44F9-8D60-3CBBC22E7B48" is used in
> acpi_bus_osc_support(). The Microsoft WHEA defines
> UUID "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c" that is used
> in apei_osc_setup(). [2][3]
>
> From the discussion[4], the GHES message is to distinguish between
> the two specs.
>
> To avoid confusion, can we change the message as follow.
>
> - generic _OSC succeeded, APEI _OSC failed: APEI firmware first mode is
> enabled by APEI bit.
> - generic _OSC failed, APEI _OSC succeeded: APEI firmware first mode is
> enabled by WHEA _OSC.
> - both succeeded: APEI firmware first mode is enabled by APEI bit and
> WHEA _OSC.
> - both failed: Failed to enable APEI firmware first mode!
>
>
>> I don't think there's anything in _OSC that mentions "firmware first."
>>
>> I don't remember anything in the spec about a way to *enable* Firmware
>> First Error Handling needing (I'm looking at ACPI v6.3, sec 18.4).
>>
>> I think the "firmware first" information is useless to the OS -- as
>> far as I can tell, the spec says nothing about anything the OS should
>> do based on the FIRMWARE_FIRST bits.]
>
> GHES works in "Firmware First" mode to report platform hardware error
> and it depends on APEI interface. (drivers/acpi/apei/ghes.c)
>
> * Generic Hardware Error Source provides a way to report platform
> * hardware errors (such as that from chipset). It works in so called
> * "Firmware First" mode, that is, hardware errors are reported to
> * firmware firstly, then reported to Linux by firmware.
>
> Based on above, I think the GHES message just means that this booting
> kernel supports APEI or WHEA, and GHES report Firmware first errors on
> this machine. Is it fine to keep the message?
>
>
>>> As we can see, the initialization of GHES is advanced by 33 seconds.
>>> So, in my opinion, this patch is necessary, right?
>>> (It should be noted that the effect of optimization varies with the platform.)
>>
>>>>> -device_initcall(ghes_init);
>>>>
>>>>> void __init acpi_pci_root_init(void)
>>>>> {
>>>>> acpi_hest_init();
>>>>> + sdei_init();
>>>>> + ghes_init();
>>>>
>>>> What's the connection between PCI, SDEI, and GHES? As far as I can
>>>> tell, SDEI and GHES are not PCI-specific, so it doesn't seem like they
>>>> should be initialized here in acpi_pci_root_init().
>>>
>>> The only reason is that acpi_hest_init() is initialized here.
>>>
>>> From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus
>>> memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
>>> the estatus memory pool. On the other hand, ghes_init() relies on
>>> sdei_init() to detect the SDEI version and the framework for registering
>>> and unregistering events. The dependencies are as follows
>>>
>>> ghes_init() => acpi_hest_init()
>>> ghes_init() => sdei_init()
>>>
>>> I don't figure out why acpi_hest_init() is called in
>>> acpi_pci_root_init(), it don't rely on any other thing.
>>> I am wondering that should we moved all of them further? e.g.
>>> following acpi_iort_init() in acpi_init().
>>
>> I don't know why acpi_hest_init() is called from acpi_pci_root_init().
>> It looks like HEST can support error sources other than PCI (IA-32
>> Machine Checks, NMIs, GHES, etc.) It was added by 415e12b23792
>> ("PCI/ACPI: Request _OSC control once for each root bridge (v3)");
>> maybe Rafael remembers why.
>>
>> Seem like acpi_hest_init(), sdei_init(), and ghes_init() should all go
>> somewhere else, but I don't know where. Maybe somewhere in
>> acpi_init()?
>
> I tried to move them into acpi_init(), and it works well.
>
> @@ -1251,6 +1251,9 @@ static int __init acpi_init(void)
>
> pci_mmcfg_late_init();
> acpi_iort_init();
> + acpi_hest_init();
> + sdei_init();
> + ghes_init();
> acpi_scan_init();
> acpi_ec_init();
>
> What's your opinion?
>
> Best Regards,
>
> Shuai
>
>
> [1] https://uefi.org/specs/ACPI/6.4/06_Device_Configuration/Device_Configuration.html#platform-wide-osc-capabilities-dword-2
> [2] https://www.mail-archive.com/ubuntu-bugs@xxxxxxxxxxxxxxxx/msg4718503.html
> [3] http://www.guyreams.com/PDF/whea.pdf
> [4] https://patchwork.kernel.org/project/linux-acpi/patch/1308640587-24502-5-git-send-email-ying.huang@xxxxxxxxx/#1978922