Re: smp_call_function_single lockups

From: Ingo Molnar
Date: Thu Apr 02 2015 - 15:07:37 EST

* Chris J Arges <chris.j.arges@xxxxxxxxxxxxx> wrote:

> Whenever we look through the crashdump we see csd_lock_wait waiting
> for CSD_FLAG_LOCK bit to be cleared. Usually the signature leading
> up to that looks like the following (in the openstack tempest on
> openstack and nested VM stress case)
> (qemu-system-x86 task)
> kvm_sched_in
> -> kvm_arch_vcpu_load
> -> vmx_vcpu_load
> -> loaded_vmcs_clear
> -> smp_call_function_single
> (ksmd task)
> pmdp_clear_flush
> -> flush_tlb_mm_range
> -> native_flush_tlb_others
> -> smp_call_function_many

So are these two separate smp_call_function instances, crossing each
other, with neither making any progress, indefinitely - as if the two
IPIs got lost?

The traces Rafael linked to show a simpler scenario with two CPUs
apparently locked up, doing this:


#5 [ffffffff81c03e88] native_safe_halt at ffffffff81059386
#6 [ffffffff81c03e90] default_idle at ffffffff8101eaee
#7 [ffffffff81c03eb0] arch_cpu_idle at ffffffff8101f46f
#8 [ffffffff81c03ec0] cpu_startup_entry at ffffffff810b6563
#9 [ffffffff81c03f30] rest_init at ffffffff817a6067
#10 [ffffffff81c03f40] start_kernel at ffffffff81d4cfce
#11 [ffffffff81c03f80] x86_64_start_reservations at ffffffff81d4c4d7
#12 [ffffffff81c03f90] x86_64_start_kernel at ffffffff81d4c61c

This CPU is idle.


#10 [ffff88081993fa70] smp_call_function_single at ffffffff810f4d69
#11 [ffff88081993fb10] native_flush_tlb_others at ffffffff810671ae
#12 [ffff88081993fb40] flush_tlb_mm_range at ffffffff810672d4
#13 [ffff88081993fb80] pmdp_splitting_flush at ffffffff81065e0d
#14 [ffff88081993fba0] split_huge_page_to_list at ffffffff811ddd39
#15 [ffff88081993fc30] __split_huge_page_pmd at ffffffff811dec65
#16 [ffff88081993fcc0] unmap_single_vma at ffffffff811a4f03
#17 [ffff88081993fdc0] zap_page_range at ffffffff811a5d08
#18 [ffff88081993fe80] sys_madvise at ffffffff811b9775
#19 [ffff88081993ff80] system_call_fastpath at ffffffff817b8bad

This CPU is busy-waiting for the TLB flush IPI to finish.

There's no unexpected pattern here (other than it not finishing)
AFAICS: smp_call_function_single() is just the usual way we invoke
the TLB flushing methods.

So one possibility would be that an 'IPI was sent but lost'.

We could try the following trick: poll for completion for a couple of
seconds (since an IPI is not held up by anything but irqs-off
sections, it should arrive within microseconds typically - seconds of
polling should be more than enough), and if the IPI does not arrive,
print a warning message and re-send the IPI.
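
To make the idea concrete, here is a rough user-space simulation of such a
poll-then-resend wait. It is only a sketch and not kernel code: struct
sim_csd, sim_send_ipi(), csd_lock_wait_resend() and sim_run() are
hypothetical names, the poll budget is an iteration count standing in for
"a couple of seconds", the value given to CSD_FLAG_LOCK is arbitrary, and
the "target CPU" is modelled as completing synchronously whenever an IPI is
not dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define CSD_FLAG_LOCK 0x01u	/* value chosen for this simulation only */

/* Hypothetical stand-in for the per-call data the waiter spins on. */
struct sim_csd {
	atomic_uint flags;
	int drops_left;		/* how many upcoming IPIs the simulation "loses" */
	int sends;		/* total IPIs sent, including resends */
};

/*
 * Simulated IPI send. A dropped IPI has no effect at all; a delivered one
 * makes the "target CPU" run its handler and release the lock bit.
 */
static void sim_send_ipi(struct sim_csd *csd)
{
	csd->sends++;
	if (csd->drops_left > 0) {
		csd->drops_left--;
		return;
	}
	atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK,
				  memory_order_release);
}

/*
 * Poll for the lock bit to clear. After poll_budget unsuccessful polls,
 * print a warning and resend the IPI. Returns the number of resends that
 * were needed, or -1 if the bit is still set after max_resends resends.
 */
static int csd_lock_wait_resend(struct sim_csd *csd,
				unsigned long poll_budget, int max_resends)
{
	int resends = 0;

	assert(csd != NULL);

	for (;;) {
		for (unsigned long i = 0; i < poll_budget; i++) {
			unsigned int f = atomic_load_explicit(&csd->flags,
							memory_order_acquire);
			if (!(f & CSD_FLAG_LOCK))
				return resends;
		}
		if (resends == max_resends)
			return -1;	/* sending itself looks wedged */

		resends++;
		fprintf(stderr, "csd: no completion after %lu polls, "
			"resending IPI (#%d)\n", poll_budget, resends);
		sim_send_ipi(csd);
	}
}

/* Run one scenario: take the lock, send the initial IPI, then wait. */
static int sim_run(int drops, int max_resends)
{
	struct sim_csd csd = { .drops_left = drops, .sends = 0 };

	atomic_init(&csd.flags, CSD_FLAG_LOCK);
	sim_send_ipi(&csd);
	return csd_lock_wait_resend(&csd, 1000, max_resends);
}
```

With this model, sim_run(1, 3) corresponds to the "IPI was lost once" case
and recovers after a single resend, while sim_run(5, 3) corresponds to the
case where resending does not help and the wait gives up.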

If the IPI was lost due to some race and there's no other failure mode
that we don't understand, then this would work around the bug and
would make the tests pass indefinitely - with occasional hiccups and a
handful of messages produced along the way whenever it would have
locked up with a previous kernel.

If testing indeed confirms that kind of behavior we could drill down
more closely to figure out why the IPI did not get to its destination.

Or if the behavior is different, we'd have some new behavior to look
at. (for example the IPI sending mechanism might be wedged
indefinitely for some reason, so that even a resend won't work.)


