[PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node()

From: Breno Leitao
Date: Mon Jul 29 2024 - 10:06:22 EST


I've been running some experiments with the failslab fault injector
enabled to detect a different problem, and the machine always crashes
with the following stack:

can not alloc irq_pin_list (-1,0,20)
Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed

Call Trace:
panic
_printk
panic_smp_self_stop
rcu_is_watching
intel_irq_remapping_free

This happens because add_pin_to_irq_node() panics if adding a pin to an
IRQ fails with -ENOMEM (which was injected by the failslab fault
injector). I've been running with this patch in my test cases in order
to be able to pick out real bugs, and I thought it might be a good idea
to have it upstream as well, so that other people trying to find real
bugs don't stumble upon this one. It may also make sense in the real
world, where retrying a few times might be better than just panicking.

Introduce a retry mechanism that attempts to add the pin up to 3 times
before giving up and panicking. This should improve the robustness of
the IO-APIC code in the face of transient errors.

Since __add_pin_to_irq_node() only returns 0 or -ENOMEM, the retry only
covers the -ENOMEM case.
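
For illustration only, the retry pattern can be sketched in userspace as
below. This is not kernel code: fake_add_pin() is a hypothetical
stand-in for __add_pin_to_irq_node() that fails with -ENOMEM a
configurable number of times, and add_pin_with_retry() and MAX_ATTEMPTS
are names made up for this sketch. Unlike the patch, the sketch returns
the last error to the caller instead of panicking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed limit, mirroring the three attempts used in the patch. */
#define MAX_ATTEMPTS 3

/*
 * Hypothetical stand-in for __add_pin_to_irq_node(): returns -ENOMEM
 * while *failures_left is positive (consuming one failure per call),
 * then returns 0.
 */
static int fake_add_pin(int *failures_left)
{
	assert(failures_left != NULL);

	if (*failures_left > 0) {
		(*failures_left)--;
		return -ENOMEM;
	}
	return 0;
}

/*
 * Try the operation up to MAX_ATTEMPTS times. Returns 0 as soon as one
 * attempt succeeds, otherwise the error from the last attempt.
 */
static int add_pin_with_retry(int *failures_left)
{
	int ret = -ENOMEM;
	int i;

	for (i = 0; i < MAX_ATTEMPTS; i++) {
		ret = fake_add_pin(failures_left);
		if (!ret)
			return 0;
	}
	return ret;
}
```

With two injected failures the third attempt succeeds; with three or
more, every attempt fails and the caller sees -ENOMEM, which is the
point where the patch still panics.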

Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
---
arch/x86/kernel/apic/io_apic.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 477b740b2f26..2846a90366f2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -390,8 +390,14 @@ static void __remove_pin_from_irq(struct mp_chip_data *data, int apic, int pin)
 static void add_pin_to_irq_node(struct mp_chip_data *data,
 				int node, int apic, int pin)
 {
-	if (__add_pin_to_irq_node(data, node, apic, pin))
-		panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
+	int ret, i;
+
+	for (i = 0; i < 3; i++) {
+		ret = __add_pin_to_irq_node(data, node, apic, pin);
+		if (!ret)
+			return;
+	}
+	panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
 }
 
 /*
--
2.43.0