[PATCH v2 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU
From: Srivatsa S. Bhat
Date: Tue May 06 2014 - 18:02:42 EST
[...]
>> This will emit the WARN_ON a single time, but will emit the "IPI
>> Payload" list every time the cpu is found to be offline. So on the
>> second and successive occurrences some output will still occur.
>>
>> Unfortunately WARN_ON_ONCE() returns the value of `condition', not
>> `__warned', so we have to hand-code things. Like this?
>>
>
> Yeah, this version looks better. Sorry for missing this earlier.
> I'll incorporate this in my next version of the patchset.
>
Here is the updated patch:
-------------------------------------------------------------------------
From: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
[PATCH v2 1/2] smp: Print more useful debug info upon receiving IPI on an offline CPU
Today the smp-call-function code just prints a warning if we get an IPI on
an offline CPU. This tells us that something went wrong, but from this
information alone it is often very hard to work out exactly who sent the IPI
and why.
In most cases, we get the warning about the IPI to an offline CPU immediately
after the outgoing CPU comes out of the stop-machine phase and re-enables
interrupts. Since all online CPUs participate in stop-machine, the information
about the sender of the IPI is already lost by the time we exit the
stop-machine loop. So even if we dump the stack on each CPU at this point, we
won't find anything useful, since all of them will show the stack trace of the
stopper thread. We therefore need a better way to figure out who sent the IPI
and why.
To achieve this, when we detect an IPI targeted at an offline CPU, walk the
call-single-data linked list and print out each payload (i.e., the name of the
function that the target CPU was supposed to execute). This gives us some
insight into who might have sent the IPI and helps us debug this further.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
---
kernel/smp.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..f864921 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -185,14 +185,24 @@ void generic_smp_call_function_single_interrupt(void)
 {
 	struct llist_node *entry;
 	struct call_single_data *csd, *csd_next;
+	static bool warned;
+
+	entry = llist_del_all(&__get_cpu_var(call_single_queue));
+	entry = llist_reverse_order(entry);
 
 	/*
 	 * Shouldn't receive this interrupt on a cpu that is not yet online.
 	 */
-	WARN_ON_ONCE(!cpu_online(smp_processor_id()));
-
-	entry = llist_del_all(&__get_cpu_var(call_single_queue));
-	entry = llist_reverse_order(entry);
+	if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
+		warned = true;
+		WARN_ON(1);
+		/*
+		 * We don't have to use the _safe() variant here
+		 * because we are not invoking the IPI handlers yet.
+		 */
+		llist_for_each_entry(csd, entry, llist)
+			pr_warn("SMP IPI Payload: %pS \n", csd->func);
+	}
 
 	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
 		csd->func(csd->info);