Re: smp_call_function_single lockups

From: Linus Torvalds
Date: Thu Apr 02 2015 - 16:57:38 EST

On Thu, Apr 2, 2015 at 12:07 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> So one possibility would be that an 'IPI was sent but lost'.

Yes, the "sent but lost" thing would certainly explain the lockups.

At the same time, that sounds like a huge hardware bug, and that's
somewhat surprising/unlikely.

That said.

> We could try the following trick: poll for completion for a couple of
> seconds (since an IPI is not held up by anything but irqs-off
> sections, it should arrive within microseconds typically - seconds of
> polling should be more than enough), and if the IPI does not arrive,
> print a warning message and re-send the IPI.

Sounds like a reasonable approach. At worst it doesn't fix anything,
and we never see any messages, and that tells us something too.
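For illustration, here is a minimal single-threaded C sketch of the poll-then-resend idea quoted above. It is not the kernel's smp_call_function_single() code. struct fake_csd, fake_send_ipi(), fake_target_cpu_poll(), call_single_with_resend() and count_call() are hypothetical names. The "remote CPU" is simulated by a function that the polling loop calls, and a lost IPI is modeled by dropping a configurable number of sends.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical completion block: the sender sets 'locked' before sending,
 * and the target clears it after running func(info). */
struct fake_csd {
    atomic_bool locked;
    void (*func)(void *info);
    void *info;
};

/* Simulated target-CPU state (single-threaded model, not real hardware). */
static struct fake_csd *pending_ipi; /* single slot: a duplicate send is not queued twice */
static int ipis_to_lose;             /* number of upcoming sends to drop */
static int ipis_sent;                /* total sends, including re-sends */

/* Hypothetical IPI send. It records the request unless we simulate a loss. */
static void fake_send_ipi(struct fake_csd *csd)
{
    ipis_sent++;
    if (ipis_to_lose > 0) {
        ipis_to_lose--; /* the IPI vanishes: the target never sees it */
        return;
    }
    pending_ipi = csd;
}

/* Models the target servicing the interrupt: run the function, then clear
 * the flag with release ordering so the sender observes the side effects. */
static void fake_target_cpu_poll(void)
{
    struct fake_csd *csd = pending_ipi;
    if (!csd)
        return;
    pending_ipi = NULL;
    csd->func(csd->info);
    atomic_store_explicit(&csd->locked, false, memory_order_release);
}

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Poll for completion for up to timeout_ns. On timeout, warn and re-send.
 * Returns the number of re-sends performed, or -1 once max_resends is used up. */
static int call_single_with_resend(struct fake_csd *csd, long long timeout_ns,
                                   int max_resends)
{
    assert(timeout_ns > 0);
    atomic_store_explicit(&csd->locked, true, memory_order_relaxed);
    fake_send_ipi(csd);

    for (int resends = 0; ; resends++) {
        long long deadline = now_ns() + timeout_ns;
        while (now_ns() < deadline) {
            fake_target_cpu_poll(); /* in a real kernel this runs on another CPU */
            if (!atomic_load_explicit(&csd->locked, memory_order_acquire))
                return resends;
        }
        if (resends == max_resends)
            return -1;
        fprintf(stderr, "warning: IPI not acknowledged after %lld ns, re-sending\n",
                timeout_ns);
        fake_send_ipi(csd);
    }
}

/* Example payload: count how many times the cross-CPU call actually ran. */
static void count_call(void *info)
{
    (*(int *)info)++;
}
```

In this model re-sending is harmless because the request sits in a single pending slot, so a duplicate interrupt cannot run the function twice. The timeout is a parameter so that a demo can use milliseconds rather than the couple of seconds suggested above.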

