Re: smp_call_function_single lockups

From: Chris J Arges
Date: Mon Apr 06 2015 - 13:24:17 EST

On Thu, Apr 02, 2015 at 10:31:50AM -0700, Linus Torvalds wrote:
> On Wed, Apr 1, 2015 at 2:59 PM, Chris J Arges
> <chris.j.arges@xxxxxxxxxxxxx> wrote:
> >
> > Is it worthwhile to do a 'bisect' to see where on average it takes
> > longer to reproduce? Perhaps it will point to a relevant change, or it
> > may be completely useless.
> It's likely to be an exercise in futility. "git bisect" is really bad
> at "gray area" things, and when it's a question of "it takes hours or
> days to reproduce", it's almost certainly not worth it. Not unless
> there is some really clear cut-off that we can believably say "this
> causes it to get much slower". And in this case, I don't think it's
> that clear-cut. Judging by DaveJ's attempts at bisecting things, the
> timing just changes. And the differences might be due to entirely
> unrelated changes like cacheline alignment etc.
> So unless we find a real clear signature of the bug (I was hoping that
> the ISR bit would be that sign), I don't think trying to bisect it
> based on how quickly you can reproduce things is worthwhile.
> Linus

Linus, Ingo,

I did some testing and found that at the following patch level, the issue was
much, much more likely to reproduce, in under 15 minutes.

commit b6b8a1451fc40412c57d10c94b62e22acab28f94
Author: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
Date: Fri Mar 7 20:03:12 2014 +0100

KVM: nVMX: Rework interception of IRQs and NMIs

Move the check for leaving L2 on pending and intercepted IRQs or NMIs
from the *_allowed handler into a dedicated callback. Invoke this
callback at the relevant points before KVM checks if IRQs/NMIs can be
injected. The callback has the task to switch from L2 to L1 if needed
and inject the proper vmexit events.

The rework fixes L2 wakeups from HLT and provides the foundation for
preemption timer emulation.

However, once the following patch was applied, the average time to reproduction
went up greatly (the stress reproducer ran for hours without issue):

commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3
Author: Bandan Das <bsd@xxxxxxxxxx>
Date: Tue Jul 8 00:30:23 2014 -0400

KVM: x86: Check for nested events if there is an injectable interrupt

With commit b6b8a1451fc40412c57d1 that introduced
vmx_check_nested_events, checks for injectable interrupts happen
at different points in time for L1 and L2 that could potentially
cause a race. The regression occurs because KVM_REQ_EVENT is always
set when nested_run_pending is set even if there's no pending interrupt.
Consequently, there could be a small window when check_nested_events
returns without exiting to L1, but an interrupt comes through soon
after and it incorrectly, gets injected to L2 by inject_pending_event
Fix this by adding a call to check for nested events too when a check
for injectable interrupt returns true

However, we still reproduced the issue with v3.19 (which contains both of these
patches); it did eventually soft lockup with a similar backtrace.

So far, this agrees with the current understanding that we may not be ACK'ing
certain interrupts (IPIs from the L1 guest), which leaves csd_lock_wait spinning
and causes the soft lockup.
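
For anyone less familiar with that path, here is a rough userspace model of
why a lost IPI would leave the sender stuck. It is only a sketch and not the
kernel code: the csd_model_* names, model_call_single() and the max_spins
budget are made up for illustration, and the real csd_lock_wait has no such
budget, which is why the watchdog ends up reporting a soft lockup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CSD_FLAG_LOCK 0x01u

/* Hypothetical stand-in for the per-call data the sender waits on. */
struct csd_model {
    atomic_uint flags;
};

/* Sender: mark the request as in flight before "sending the IPI". */
static void csd_model_lock(struct csd_model *csd)
{
    unsigned int old = atomic_fetch_or_explicit(&csd->flags, CSD_FLAG_LOCK,
                                                memory_order_acquire);
    /* In this model a request is never locked twice. */
    assert(!(old & CSD_FLAG_LOCK));
}

/* Target: only runs if the IPI is actually delivered and handled. */
static void csd_model_unlock(struct csd_model *csd)
{
    atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK,
                              memory_order_release);
}

/*
 * Sender: spin until the target clears the lock bit.
 * Returns true if the bit was cleared, false if the (model-only)
 * spin budget ran out first.
 */
static bool csd_model_wait(struct csd_model *csd, uint64_t max_spins)
{
    for (uint64_t i = 0; i < max_spins; i++) {
        unsigned int f = atomic_load_explicit(&csd->flags,
                                              memory_order_acquire);
        if (!(f & CSD_FLAG_LOCK))
            return true;
    }
    return false;
}

/* One synchronous cross-call, with or without the IPI being handled. */
static bool model_call_single(bool ipi_delivered, uint64_t max_spins)
{
    struct csd_model csd;

    atomic_init(&csd.flags, 0);
    csd_model_lock(&csd);
    if (ipi_delivered)
        csd_model_unlock(&csd);  /* target handled the IPI */
    return csd_model_wait(&csd, max_spins);
}
```

model_call_single(true, n) returns true right away, while
model_call_single(false, n) burns through all n spins and returns false.
Without the budget, that second case is a sender that never makes progress.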

Hopefully this helps shed more light on this issue.

--chris j arges