Re: [PATCH] ACPI: APEI: GHES: Improve ghes_notify_nmi() status check

From: Shuai Xue

Date: Wed Nov 05 2025 - 20:46:42 EST

On 2025/11/4 07:05, Tony Luck wrote:

ghes_notify_nmi() is called for every NMI and must check whether the NMI was
generated because an error was signalled by platform firmware.

This check is very expensive as for each registered GHES NMI source it reads
from the acpi generic address attached to this error source to get the physical
address of the acpi_hest_generic_status block. It then checks the "block_status"
to see if an error was logged.

The ACPI/APEI code must create virtual mappings for each of those physical
addresses, and tear them down afterwards. On an Icelake system this takes around
15,000 TSC cycles. Enough to disturb efforts to profile system performance.

Hi, Tony

Interesting.

If I understand correctly, you mean ghes_peek_estatus() and
ghes_clear_estatus().

I conducted performance testing on our system (ARMv8) and found the
following average costs:

- ghes_peek_estatus(): 8,138.3 ns (21,160 cycles)
- ghes_clear_estatus(): 2,038.3 ns (5,300 cycles)

If that were not bad enough, there are some atomic accesses in the code path
that will cause cache line bounces between CPUs. A problem that gets worse as
the core count increases.

Could you elaborate on which specific atomic accesses you're referring to?

But BIOS changes neither the acpi generic address nor the physical address of
the acpi_hest_generic_status block. So this walk can be done once when the NMI is
registered to save the virtual address (unmapping if the NMI is ever unregistered).
The "block_status" can be checked directly in the NMI handler. This can be done
without any atomic accesses.
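
To illustrate the idea described above, here is a minimal userspace sketch, not the kernel code. The names mock_source, mock_register(), mock_unregister() and mock_any_error_pending() are hypothetical, and an ordinary pointer stands in for the mapped __iomem address. It only models "resolve the address once at registration, then do a plain load in the hot path":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an error source; not the kernel's struct ghes. */
struct mock_source {
	volatile uint32_t *status_vaddr; /* cached at registration, NULL if unmapped */
};

/* Stand-in for the one-time address walk and mapping done at registration. */
static void mock_register(struct mock_source *src, volatile uint32_t *block_status)
{
	assert(src != NULL);
	src->status_vaddr = block_status;
}

/* Stand-in for dropping the mapping when the source is unregistered. */
static void mock_unregister(struct mock_source *src)
{
	assert(src != NULL);
	src->status_vaddr = NULL;
}

/* Fast path: one plain load per source, no mapping work and no atomics. */
static bool mock_any_error_pending(const struct mock_source *srcs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (srcs[i].status_vaddr && *srcs[i].status_vaddr)
			return true;
	}
	return false;
}
```

The sketch relies on the same assumption as the patch: the status block address does not change after registration.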

Resulting time to check that there is not an error record is around 900 cycles.

Reported-by: Andi Kleen <andi.kleen@xxxxxxxxx>
Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>

---
N.B. I only talked to an Intel BIOS expert about this. GHES code is shared by
other architectures, so it would be wise to get confirmation on whether this
assumption applies to all, or is Intel (or X86) specific.

The assumption is "BIOS changes neither the acpi generic address nor the
physical address of the acpi_hest_generic_status block."?

I've consulted with our BIOS experts from both ARM and RISC-V platform
teams, and they confirmed that error status blocks are reserved at boot
time and remain unchanged during runtime.

---
include/acpi/ghes.h | 1 +
drivers/acpi/apei/ghes.c | 39 ++++++++++++++++++++++++++++++++++++---
2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index ebd21b05fe6e..58655d313a1f 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -29,6 +29,7 @@ struct ghes {
};
struct device *dev;
struct list_head elist;
+ void __iomem *error_status_vaddr;
};
struct ghes_estatus_node {
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 97ee19f2cae0..62713b612865 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1425,7 +1425,21 @@ static LIST_HEAD(ghes_nmi);
static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
{
static DEFINE_RAW_SPINLOCK(ghes_notify_lock_nmi);
+ bool active_error = false;
int ret = NMI_DONE;
+ struct ghes *ghes;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
+ if (ghes->error_status_vaddr && readl(ghes->error_status_vaddr)) {
+ active_error = true;
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ if (!active_error)
+ return ret;

Should we put active_error into struct ghes? If we already know a source is
active, we do not need to call __ghes_peek_estatus() to check
estatus->block_status again.
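
A rough userspace sketch of what I have in mind, under stated assumptions: mock_ghes, mock_scan() and mock_handle() are hypothetical names, and a plain pointer stands in for error_status_vaddr. Note that recording the flag per source means scanning every source instead of breaking at the first hit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ghes with a per-source flag. */
struct mock_ghes {
	volatile uint32_t *status_vaddr; /* cached status address, may be NULL */
	bool active_error;               /* set by the scan, consumed by the handler */
};

/* Scan pass: remember per source whether block_status was non-zero. */
static size_t mock_scan(struct mock_ghes *srcs, size_t n)
{
	size_t active = 0;

	assert(srcs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		srcs[i].active_error =
			srcs[i].status_vaddr && *srcs[i].status_vaddr != 0;
		if (srcs[i].active_error)
			active++;
	}
	return active;
}

/* Handling pass: visit only flagged sources, without a second status check. */
static size_t mock_handle(struct mock_ghes *srcs, size_t n)
{
	size_t handled = 0;

	for (size_t i = 0; i < n; i++) {
		if (!srcs[i].active_error)
			continue;
		*srcs[i].status_vaddr = 0; /* stand-in for processing and clearing */
		srcs[i].active_error = false;
		handled++;
	}
	return handled;
}
```

Whether the saved status check outweighs losing the early break would need measurement on real hardware.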

Thanks.
Shuai