Re: smp_call_function_single lockups

From: Ingo Molnar
Date: Thu Apr 02 2015 - 05:55:45 EST

* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Apr 1, 2015 at 7:32 AM, Chris J Arges
> <chris.j.arges@xxxxxxxxxxxxx> wrote:
> >
> > I included the full patch in reply to Ingo's email, and when
> > running with that I no longer get the ack_APIC_irq WARNs.
>
> Ok. That means that the printk's themselves just change timing
> enough, or change the compiler instruction scheduling so that it
> hides the apic problem.

So another possibility would be that it's the third change causing
this change in behavior:

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6cedd7914581..833a981c5420 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -335,9 +340,11 @@ int apic_retrigger_irq(struct irq_data *data)

 void apic_ack_edge(struct irq_data *data)
+	ack_APIC_irq();
+	/* Might generate IPIs, so do this after having ACKed the APIC: */
-	ack_APIC_irq();


... since with this we won't send IPIs in a semi-nested fashion with
an unacked APIC, which is a good idea to do in general. It's also a
weird enough hardware pattern that virtualization's APIC emulation
might get it slightly wrong or slightly different.
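
To make the ordering concrete, here is a toy user-space C model of the
two variants. It is not kernel code: struct toy_apic, toy_deliver(),
toy_ack(), toy_send_ipi() and toy_complete_move() are all made-up
stand-ins, and it assumes the work done around the ACK sends exactly
one IPI. It only counts how many IPIs go out while the interrupt is
still unacked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a local APIC; none of these names exist in the kernel. */
struct toy_apic {
	bool in_service;        /* true between interrupt delivery and the ACK */
	int ipis_sent_unacked;  /* IPIs sent while in_service was still true */
	int ipis_sent_total;
};

static void toy_deliver(struct toy_apic *a)
{
	a->in_service = true;
}

static void toy_ack(struct toy_apic *a)
{
	/* In this model, acking with nothing in service is a bug. */
	assert(a->in_service);
	a->in_service = false;
}

static void toy_send_ipi(struct toy_apic *a)
{
	a->ipis_sent_total++;
	if (a->in_service)
		a->ipis_sent_unacked++;
}

/* Assumed stand-in for the work that "might generate IPIs". */
static void toy_complete_move(struct toy_apic *a)
{
	toy_send_ipi(a);
}

/* Ordering before the change: do the work first, ACK last. */
static void toy_ack_edge_old(struct toy_apic *a)
{
	toy_complete_move(a);
	toy_ack(a);
}

/* Ordering after the change: ACK first, then do the work. */
static void toy_ack_edge_new(struct toy_apic *a)
{
	toy_ack(a);
	toy_complete_move(a);
}

/* Run one interrupt through a handler; return the IPIs sent while unacked. */
static int toy_run(void (*handler)(struct toy_apic *))
{
	struct toy_apic a = { false, 0, 0 };

	toy_deliver(&a);
	handler(&a);
	return a.ipis_sent_unacked;
}
```

In this model toy_run(toy_ack_edge_old) counts one IPI sent with the
APIC unacked and toy_run(toy_ack_edge_new) counts none, which is the
semi-nested pattern the reordering avoids.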

> Which very much indicates that these things are interconnected.
>
> For example, Ingo's printk patch does
>
>         cfg->move_in_progress =
>                 cpumask_intersects(cfg->old_domain, cpu_online_mask);
> +       if (cfg->move_in_progress)
> +               pr_info("apic: vector %02x, same-domain move in progress\n", cfg->vector);
>         cpumask_and(cfg->domain, cfg->domain, tmp_mask);
>
> and that means that now the setting of move_in_progress is
> serialized with the cpumask_and() in a way that it wasn't before.

Yeah, that's a possibility too. It all looks very fragile.
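
As a rough illustration of that serialization argument, assuming GCC
or Clang: an opaque call such as pr_info() keeps the compiler from
moving the earlier store or the later update across it, much like an
explicit compiler barrier would. The sketch below uses invented names
(struct toy_cfg, toy_barrier(), toy_update()) and plain bitmasks in
place of cpumasks. It shows the compiler-level ordering constraint
only, says nothing about CPU-level ordering, and is not a proposed fix.

```c
/* Compiler-only barrier (GCC/Clang syntax); emits no instructions. */
#define toy_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the vector configuration; bitmasks replace cpumasks. */
struct toy_cfg {
	int move_in_progress;
	unsigned long domain;
	unsigned long old_domain;
};

static void toy_update(struct toy_cfg *cfg, unsigned long online_mask,
		       unsigned long tmp_mask)
{
	/* Analogue of the cpumask_intersects() test on old_domain. */
	cfg->move_in_progress = (cfg->old_domain & online_mask) != 0;

	/*
	 * The compiler may not reorder the store above with the update
	 * below across this point, which is the effect an inserted
	 * printk call has as a side effect.
	 */
	toy_barrier();

	/* Analogue of cpumask_and(cfg->domain, cfg->domain, tmp_mask). */
	cfg->domain &= tmp_mask;
}
```

The result is the same with or without the barrier in a
single-threaded run; the only difference is which instruction orders
the compiler is allowed to emit, which is why a change like this can
hide a timing problem rather than fix it.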


