Re: [PATCH] acpi: Fix hed module initialization order when it is built-in

From: Jonathan Cameron
Date: Mon Dec 23 2024 - 14:33:31 EST


On Mon, 23 Dec 2024 17:31:08 +0800
Xiaofei Tan <tanxiaofei@xxxxxxxxxx> wrote:

> Hi Rafael,
>
> On 2024/12/11 1:59, Rafael J. Wysocki wrote:
> > On Fri, Nov 15, 2024 at 4:56 AM Xiaofei Tan <tanxiaofei@xxxxxxxxxx> wrote:
> >> When the module hed is built-in, the init order is determined by
> >> Makefile order.
> > Are you sure?
>
> yes

We had a similar fix in CXL recently (which is why I suggested this approach
internally when tanxiaofei mentioned the problem).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/cxl?id=6575b268157f37929948a8d1f3bafb3d7c055bc1

The related discussion for the CXL patch was the first time I'd come across this
solution to load order in the built-in case.


>
> >> That order violates expectations, because hed is initialized
> >> after evged. RAS records can't be handled in the window during
> >> which evged has been initialized but hed has not.
> >> If the number of such RAS records exceeds the number of APEI HEST
> >> error sources, all HEST resources could be occupied, which
> >> could affect subsequent RAS error reporting.
> > Well, the problem is real, but does the change really prevent it from
> > happening or does it just increase the likelihood of success?
>
> It is completely solved when the driver is built-in. If HED is built as a
> module, it is not solved.

Can we prevent that configuration with an appropriate Kconfig constraint?
It's annoying to restrict build options, but if that's needed to make it work
then it's better than not working!
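
A minimal, untested sketch of one such constraint, assuming ACPI_HED is
currently declared as a tristate in drivers/acpi/Kconfig. Making it a bool
would rule out the modular build entirely. The help text below is a
placeholder, not the real entry, and this is not part of the posted patch.

```kconfig
# Hypothetical change, not taken from the patch: declare ACPI_HED as
# bool instead of tristate so that CONFIG_ACPI_HED=m cannot be selected.
config ACPI_HED
	bool "Hardware Error Device"
	help
	  Placeholder help text. Support for the ACPI Hardware Error
	  Device, which must be initialized before evged when built-in.
```

With something like this, hed.o can only ever be linked into the kernel
image, so the Makefile ordering in the patch would always apply. The cost
is that HED can no longer be built as a module.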

Jonathan


>
> >
> > In the latter case, and generally speaking too, it would be better to
> > add explicit synchronization between evged and hed.
> >
> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> >> Signed-off-by: Xiaofei Tan <tanxiaofei@xxxxxxxxxx>
> >> ---
> >> drivers/acpi/Makefile | 8 +++++++-
> >> 1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> >> index 61ca4afe83dc..54f60b7922ad 100644
> >> --- a/drivers/acpi/Makefile
> >> +++ b/drivers/acpi/Makefile
> >> @@ -15,6 +15,13 @@ endif
> >>
> >> obj-$(CONFIG_ACPI) += tables.o
> >>
> >> +#
> >> +# The hed.o needs to be in front of evged.o to avoid the problem that
> >> +# RAS errors cannot be handled in the special time window of startup
> >> +# phase that evged has initialized while hed not.
> >> +#
> >> +obj-$(CONFIG_ACPI_HED) += hed.o
> >> +
> >> #
> >> # ACPI Core Subsystem (Interpreter)
> >> #
> >> @@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
> >> obj-$(CONFIG_ACPI_BATTERY) += battery.o
> >> obj-$(CONFIG_ACPI_SBS) += sbshc.o
> >> obj-$(CONFIG_ACPI_SBS) += sbs.o
> >> -obj-$(CONFIG_ACPI_HED) += hed.o
> >> obj-$(CONFIG_ACPI_EC_DEBUGFS) += ec_sys.o
> >> obj-$(CONFIG_ACPI_BGRT) += bgrt.o
> >> obj-$(CONFIG_ACPI_CPPC_LIB) += cppc_acpi.o
> >> --
> >> 2.33.0
> >>