Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)
From: Ingo Molnar
Date: Mon Apr 20 2015 - 01:40:07 EST
* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > That's all fine and good, but why is an IPI sent to a non-existent
> > CPU? It's not like we don't know which CPU is up and down.
>
> I agree that the qemu-arm behavior smells like a bug, plain and
> simple. Nobody sane should send random IPI's to CPU's that they
> don't even know are up or not.
Yeah, and a warning would have caught that bug a bit earlier, at the
cost of restricting the API:
> That said, I could imagine that we would have some valid case where
> we just do a cross-cpu call to (for example) do lock wakeup, and the
> code would use some optimistic algorithm that prods the CPU after
> the lock has been released, and there could be some random race
> where the lock data structure has already been released (ie imagine
> the kind of optimistic unlocked racy access to "sem->owner" that we
> discussed as part of the rwsem_spin_on_owner() thread recently).
>
> So I _could_ imagine that somebody would want to do optimistic "prod
> other cpu" calls that in all normal cases are for existing cpus, but
> could be racy in theory.
Yes, and I don't disagree with such optimizations in principle (it
allows fewer references to be taken in the fast path), but is it
really safe?
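The optimization in question is roughly this - a sketch, not code
from any particular call site:

        /* Safe variant: pin the CPU hotplug state across the cross-call: */
        get_online_cpus();
        if (cpu_online(cpu))
                smp_call_function_single_async(cpu, csd);
        put_online_cpus();

        /* Optimistic variant: no hotplug reference, racy against CPU-down: */
        smp_call_function_single_async(cpu, csd);

The optimistic variant saves taking the hotplug reference in the fast
path - that's the win.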
If a CPU is going down and we potentially race against that, and send
off an IPI, the IPI might be 'in flight' for an indeterminate amount
of time, especially on wildly non-deterministic hardware like virtual
platforms.
So if the CPU goes down during that time, it will typically ignore
all IPIs that arrive after the teardown point. End result: the
sender of the IPI may hang indefinitely (with the synchronous API),
waiting for the CSD lock to be released...
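I.e. the sender would spin forever in csd_lock_wait() on a flag that
only the target CPU ever clears - quoting kernel/smp.c from memory:

        static void csd_lock_wait(struct call_single_data *csd)
        {
                while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
                        cpu_relax();
        }

If the target CPU is gone, nothing ever clears CSD_FLAG_LOCK and the
loop never terminates.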
For the non-wait API we could also hang, but in an even more
interesting way: we won't hang trying to send to the downed CPU, we'd
hang on the _next_ cross-call, possibly one sent to another CPU,
because the sender CPU's CSD_FLAG_LOCK is never released by the
offlined CPU that was supposed to unlock it.
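That's because csd_lock() on the sender's per-CPU csd first waits for
the previous user to unlock it - again roughly from memory:

        static void csd_lock(struct call_single_data *csd)
        {
                csd_lock_wait(csd);     /* spins until the *previous* IPI target unlocks */
                csd->flags |= CSD_FLAG_LOCK;
                smp_wmb();
        }

So the hang surfaces one cross-call later, in an innocent-looking
place.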
Also note the existing warning we already have in
flush_smp_call_function_queue():
        /* There shouldn't be any pending callbacks on an offline CPU. */
        if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
                     !warned && !llist_empty(head))) {
                warned = true;
                WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
Couldn't this trigger in the opportunistic, imprecise IPI case, if
IRQs are ever enabled when or after the CPU is marked offline?
My suggested warning would simply catch this kind of unsafe-looking
race a bit sooner, instead of silently ignoring the cross-call
attempt.
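Concretely, I mean something like this in generic_exec_single(),
turning the silent failure into a loud one (an untested sketch
against my reading of the current code):

        if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
                WARN_ON_ONCE(1);        /* early warning: cross-call to an offline CPU */
                csd_unlock(csd);
                return -ENXIO;
        }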
Now your suggestion could be made to work by:
 - polling for CPU status in the CSD-lock code as well. (We don't do
   this currently - see the naive sketch after this list.)
- making sure that if a late IPI arrives to an already-down CPU it
does not attempt to execute CSD functions. (I.e. silence the
above, already existing warning.)
- auditing the CPU-down path to make sure it does not get surprised
by late external IPIs. (I think this should already be so, given
the existing warning.)
- IPIs can be pending indefinitely, so make sure a pending IPI won't
confuse the machinery after a CPU has been onlined again.
- pending IPIs may consume hardware resources when not received
properly. For example I think x86 will keep trying to resend it.
Not sure there's any timeout mechanism on the hardware level - the
sending APIC might even get stuck? Audit all hardware for this
kind of possibility.
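Just to illustrate how involved even the first item gets, a
deliberately naive, hypothetical sketch - note that it is itself
racy against the CPU coming back online:

        /* Hypothetical hotplug-aware wait, 'cpu' is the IPI target: */
        static void csd_lock_wait(struct call_single_data *csd, int cpu)
        {
                while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK) {
                        if (!cpu_online(cpu)) {
                                /* Target died - reclaim the CSD it will never unlock: */
                                csd_unlock(csd);
                                break;
                        }
                        cpu_relax();
                }
        }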
So unless I'm missing something, to me it looks like the current
code is only safe to use on 'offline' CPUs if the 'offline' CPU is
never ever brought online in the first place.
> It doesn't sound like the qemu-arm case is that kind of situation,
> though. That one just sounds like a stupid "let's send an ipi to a
> cpu whether it exists or not".
>
> But I committed it without any new warning, because I could in theory
> see it as being a valid use.
So my point is that we already have a 'late' warning, and I think it
actively hid the qemu-arm bug, because the 'offline' CPU was so
offline (due to being non-existent) that it never had a chance to
print a warning.
With an 'early' warning we could flush out such bugs (or code
uncleanlinesses: clearly the ARM code was at least functional before)
a bit sooner.
But I don't have strong feelings about it: SMP cross-call users are a
lot less frequent than, say, locking facility users, so the strength
of the debugging checks isn't nearly as important, and it's fine to
me either way!
Thanks,
Ingo