Re: smp_call_function_single lockups

From: Linus Torvalds
Date: Wed Apr 01 2015 - 11:36:43 EST


On Wed, Apr 1, 2015 at 7:32 AM, Chris J Arges
<chris.j.arges@xxxxxxxxxxxxx> wrote:
>
> I included the full patch in reply to Ingo's email, and when running with that
> I no longer get the ack_APIC_irq WARNs.

Ok. That means that the printk's themselves just change the timing
enough, or change the compiler's instruction scheduling enough, to
hide the apic problem.

Which very much indicates that these things are interconnected.

For example, Ingo's printk patch does

     cfg->move_in_progress =
         cpumask_intersects(cfg->old_domain, cpu_online_mask);
+    if (cfg->move_in_progress)
+        pr_info("apic: vector %02x, same-domain move in progress\n", cfg->vector);
     cpumask_and(cfg->domain, cfg->domain, tmp_mask);

and that means that now the setting of move_in_progress is serialized
with the cpumask_and() in a way that it wasn't before.
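
A minimal sketch of the ordering effect described above, in user-space C. A call to a function the compiler cannot see through generally has to be treated as if it might read or write memory, so a store before the call cannot be moved past it. The sketch stands in for that effect with a compiler-only barrier. Everything named toy_* is hypothetical, the masks are plain unsigned longs rather than cpumasks, and the barrier macro uses GCC/Clang syntax. Whether this ordering is what actually hid the problem is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields touched in the snippet above. */
struct toy_cfg {
    bool move_in_progress;
    unsigned long domain;
};

/*
 * Compiler-only barrier (GCC/Clang syntax). It emits no CPU fence; it only
 * stops the compiler from moving memory accesses across this point.
 */
#define toy_barrier() __asm__ __volatile__("" ::: "memory")

static void toy_update(struct toy_cfg *cfg, unsigned long old_domain,
                       unsigned long online, unsigned long tmp_mask)
{
    assert(cfg != NULL);

    /* Without the barrier, the compiler may emit these two stores in either order. */
    cfg->move_in_progress = (old_domain & online) != 0;

    /* Stands in for the added pr_info() call: the flag store is emitted first. */
    toy_barrier();

    cfg->domain &= tmp_mask;
}
```

Note that this only constrains the compiler. It says nothing about what another CPU observes, or when.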

And while the code takes the "vector_lock" and disables interrupts,
the interrupts themselves can happily continue on other CPUs, and
they don't take the vector_lock. Neither does send_cleanup_vector(),
which clears that bit, afaik.

I don't know. The locking there is odd.
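
To make the concern concrete, here is a small user-space sketch of why a lock held by one path gives no protection against a path that never takes it. It is single-threaded on purpose: the "other CPU" is modelled by a plain function call made while the lock is held, so the result is deterministic. All toy_* names are hypothetical; this is not the kernel's vector_lock, struct irq_cfg, or send_cleanup_vector(), and it makes no claim about what those actually do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state: a single "move in progress" flag. */
struct toy_irq_state {
    atomic_bool move_in_progress;
};

/* Hypothetical lock standing in for a spinlock held by the updating path. */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;

static bool toy_trylock(void)
{
    /* test_and_set returns the previous value; false means we acquired it. */
    return !atomic_flag_test_and_set(&toy_lock);
}

static void toy_unlock(void)
{
    atomic_flag_clear(&toy_lock);
}

/* Models a path that clears the flag WITHOUT taking toy_lock. */
static void toy_lockless_cleanup(struct toy_irq_state *st)
{
    atomic_store(&st->move_in_progress, false);
}

/*
 * Returns 1 or 0 for the flag value the lock holder reads back,
 * or -1 if the lock could not be taken.
 */
static int toy_holder_view(struct toy_irq_state *st)
{
    int seen;

    assert(st != NULL);
    if (!toy_trylock())
        return -1;

    atomic_store(&st->move_in_progress, true);

    /* Stands in for another CPU running while we still hold the lock. */
    toy_lockless_cleanup(st);

    seen = atomic_load(&st->move_in_progress) ? 1 : 0;
    toy_unlock();
    return seen;
}
```

In this toy model the holder sets the flag, still holds the lock, and nevertheless reads the flag back as cleared, because the clearing path never contended for the lock.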

> My next homework assignments are:
> - Testing with irqbalance disabled

Definitely.

> - Testing w/ the appropriate dump_stack() in Ingo's patch
> - L0 testing

Thanks,

Linus