Re: WARNING: kernel/smp.c:292 smp_call_function_single [Was: mmotm 2009-11-24-16-47 uploaded]

From: Avi Kivity
Date: Mon Nov 30 2009 - 04:43:08 EST


On 11/30/2009 10:58 AM, Tejun Heo wrote:
> Hello,
>
> On 11/28/2009 09:12 PM, Avi Kivity wrote:
>>> Hmm, commit 498657a moved the fire_sched_in_preempt_notifiers() call
>>> into the irqs disabled section recently.
>>>
>>>     sched, kvm: Fix race condition involving sched_in_preempt_notifers
>>>
>>>     In finish_task_switch(), fire_sched_in_preempt_notifiers() is
>>>     called after finish_lock_switch().
>>>
>>>     However, depending on architecture, preemption can be enabled after
>>>     finish_lock_switch() which breaks the semantics of preempt
>>>     notifiers.
>>>
>>>     So move it before finish_arch_switch(). This also makes the in-
>>>     notifiers symmetric to out- notifiers in terms of locking - now
>>>     both are called under rq lock.
>>>
>>> It's not a surprise that this breaks the existing code which does the
>>> smp function call.
>> Yes, kvm expects preempt notifiers to be run with irqs enabled. Copying
>> patch author.
> Hmmm... then, it's broken both ways. The previous code may get
> preempted after scheduling but before the notifier is run (which
> breaks the semantics of the callback horribly), the current code
> doesn't satisfy kvm's requirement. Another thing is that in the
> previous implementation the context is different between the 'in' and
> 'out' callbacks, which is subtle and nasty. Can kvm be converted to
> not do smp calls directly?

No. kvm uses preempt notifiers to manage extended processor registers (much like the fpu). If we're scheduled into cpu A but state is currently live on cpu B, we need to go ahead and pull it in.
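
To make the requirement concrete, here is a rough userspace model of the lazy scheme described above. It is only a sketch, not kvm code: struct vcpu, cpu_reg[], remote_flush() and vcpu_sched_in() are names made up for illustration, and remote_flush() merely stands in for the cross-cpu call (the IPI) that the sched-in notifier would issue.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4
#define NO_CPU  (-1)

/* Hypothetical stand-in for a vcpu with lazily managed register state. */
struct vcpu {
    int id;
    int live_cpu;        /* cpu whose registers hold our state, or NO_CPU */
    unsigned long saved; /* in-memory copy of the state */
};

static unsigned long cpu_reg[NR_CPUS];  /* models one extended register per cpu */
static struct vcpu *cpu_owner[NR_CPUS]; /* which vcpu's state is loaded there */
static int ipi_count;                   /* number of simulated cross-cpu calls */

/* Write the state loaded on `cpu` back to its owner's memory copy. */
static void flush_cpu(int cpu)
{
    struct vcpu *v = cpu_owner[cpu];

    if (v) {
        v->saved = cpu_reg[cpu];
        v->live_cpu = NO_CPU;
        cpu_owner[cpu] = NULL;
    }
}

/* Models the IPI: ask another cpu to flush the state it is holding. */
static void remote_flush(int cpu)
{
    ipi_count++;
    flush_cpu(cpu);
}

/* Models the sched-in notifier running on `cpu`. */
static void vcpu_sched_in(struct vcpu *v, int cpu)
{
    if (v->live_cpu == cpu)
        return;                      /* state never left this cpu */
    if (v->live_cpu != NO_CPU)
        remote_flush(v->live_cpu);   /* live on cpu B: pull it in */
    flush_cpu(cpu);                  /* evict whatever is loaded here */
    cpu_reg[cpu] = v->saved;
    cpu_owner[cpu] = v;
    v->live_cpu = cpu;
}

/* Models the guest modifying its register while it runs. */
static void vcpu_write(struct vcpu *v, unsigned long val)
{
    assert(v->live_cpu != NO_CPU);
    cpu_reg[v->live_cpu] = val;
}
```

In this model nothing is saved at sched-out, so a vcpu that last ran on cpu 0 and is next scheduled on cpu 1 triggers exactly one remote_flush(0) from vcpu_sched_in(). That remote call is the step that cannot be issued with interrupts disabled.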

Technically, we can delay the IPI until after the sched in notifier: we can set some illegal state in cpu A and handle the resulting exception by sending the IPI and fixing up the state. But that would be slower, and would not solve the problem, since some accesses happen with interrupts disabled.

Since this is essentially the same problem as the fpu, maybe we can solve it the same way. How does the fpu migrate its state across processors? One way would be to save the state when the task is selected for migration.
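
A rough userspace sketch of that last idea, for comparison. Everything here is hypothetical: struct lazy_state, state_migrate_out() and state_sched_in() are invented names, and the sketch assumes a hook exists that runs on the cpu where the state is live at the moment the task is picked for migration. Whether the scheduler can provide that is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4
#define NO_CPU  (-1)

/* Hypothetical lazily loaded per-task register state. */
struct lazy_state {
    int live_cpu;        /* cpu holding the state in registers, or NO_CPU */
    unsigned long saved; /* in-memory copy */
};

static unsigned long hw_reg[NR_CPUS]; /* models one register per cpu */
static int remote_calls;              /* simulated cross-cpu calls */

/* Assumed hook: runs on src_cpu when the task is selected for migration. */
static void state_migrate_out(struct lazy_state *s, int src_cpu)
{
    if (s->live_cpu == src_cpu) {
        s->saved = hw_reg[src_cpu];   /* save locally, no IPI needed */
        s->live_cpu = NO_CPU;
    }
}

/* Sched-in path: only falls back to a remote call if the state is still live elsewhere. */
static void state_sched_in(struct lazy_state *s, int cpu)
{
    if (s->live_cpu == cpu)
        return;
    if (s->live_cpu != NO_CPU) {
        remote_calls++;               /* stands in for the IPI */
        s->saved = hw_reg[s->live_cpu];
    }
    hw_reg[cpu] = s->saved;
    s->live_cpu = cpu;
}
```

With the migrate-out hook in place, the sched-in path in this model never takes the remote branch, so it would be safe to run with interrupts disabled.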

> For the time being, maybe it's best to back out the fix given that the
> only architecture which may be affected by the original bug is ia64
> which is the only one with both kvm and the unlocked context switch.

Agreed.

--
error compiling committee.c: too many arguments to function
