Re: smp_call_function_single lockups

From: Rafael David Tinoco
Date: Thu Feb 19 2015 - 10:42:46 EST


Linus, Peter, Thomas

Just some quick feedback: we were able to reproduce the lockup with this
proposed patch applied (3.19 + patch). Unfortunately we had problems with
the core file, so for now I only have the stack trace, but I think we will
be able to reproduce it again and provide more details (sorry for the
delay... after a reboot it took some days for us to reproduce this
again).

It looks like RIP is still in smp_call_function_single.

Same environment as before: nested KVM (2 vCPUs) on top of a ProLiant
DL380G8, with acpi_idle and no x2apic opt-out.

[47708.068013] CPU: 0 PID: 29869 Comm: qemu-system-x86 Tainted: G E 3.19.0-c7671cf-lp1413540v2 #31
[47708.068013] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[47708.068013] task: ffff88081b9beca0 ti: ffff88081a7a0000 task.ti: ffff88081a7a0000
[47708.068013] RIP: 0010:[<ffffffff810f537a>] [<ffffffff810f537a>] smp_call_function_single+0xca/0x120
[47708.068013] RSP: 0018:ffff88081a7a3b38 EFLAGS: 00000202
[47708.068013] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000002
[47708.068013] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000296
[47708.068013] RBP: ffff88081a7a3b78 R08: ffffffff81815168 R09: ffff880818192000
[47708.068013] R10: 000000000000bdf6 R11: 000000000001bf90 R12: 00080000810b66f8
[47708.068013] R13: 00000000000000fb R14: 0000000000000296 R15: 0000000000000000
[47708.068013] FS: 00007fa143fff700(0000) GS:ffff88083fc00000(0000) knlGS:0000000000000000
[47708.068013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[47708.068013] CR2: 00007f5d76f5d050 CR3: 00000008190cc000 CR4: 00000000000426f0
[47708.068013] Stack:
[47708.068013] ffff88083fd151b8 0000000000000001 0000000000000000 ffffffffc0589320
[47708.068013] ffff88081a547a80 0000000000000003 ffff88081a543f80 0000000000000000
[47708.068013] ffff88081a7a3b88 ffffffffc0586097 ffff88081a7a3bc8 ffffffffc058aefe
[47708.068013] Call Trace:
[47708.068013] [<ffffffffc0589320>] ? copy_shadow_to_vmcs12+0x110/0x110 [kvm_intel]
[47708.068013] [<ffffffffc0586097>] loaded_vmcs_clear+0x27/0x30 [kvm_intel]
[47708.068013] [<ffffffffc058aefe>] vmx_vcpu_load+0x17e/0x1a0 [kvm_intel]
[47708.068013] [<ffffffff810a918d>] ? set_next_entity+0x9d/0xb0
[47708.068013] [<ffffffffc04660e3>] kvm_arch_vcpu_load+0x33/0x1f0 [kvm]
[47708.068013] [<ffffffffc0452529>] kvm_sched_in+0x39/0x40 [kvm]
[47708.068013] [<ffffffff8109e8e8>] finish_task_switch+0x98/0x1a0
[47708.068013] [<ffffffff817aa81b>] __schedule+0x33b/0x900
[47708.068013] [<ffffffff817aae17>] schedule+0x37/0x90
[47708.068013] [<ffffffffc0451e7d>] kvm_vcpu_block+0x6d/0xb0 [kvm]
[47708.068013] [<ffffffff810b6ec0>] ? prepare_to_wait_event+0x110/0x110
[47708.068013] [<ffffffffc0469d3c>] kvm_arch_vcpu_ioctl_run+0x10c/0x1290 [kvm]
[47708.068013] [<ffffffffc04551ce>] kvm_vcpu_ioctl+0x2ce/0x670 [kvm]
[47708.068013] [<ffffffff811ef441>] ? new_sync_write+0x81/0xb0
[47708.068013] [<ffffffff812034e8>] do_vfs_ioctl+0x2f8/0x510
[47708.068013] [<ffffffff811f2215>] ? __sb_end_write+0x35/0x70
[47708.068013] [<ffffffffc045cf84>] ? kvm_on_user_return+0x74/0x80 [kvm]
[47708.068013] [<ffffffff81203781>] SyS_ioctl+0x81/0xa0
[47708.068013] [<ffffffff817aefad>] system_call_fastpath+0x16/0x1b
[47708.068013] Code: 30 5b 41 5c 5d c3 0f 1f 00 48 8d 75 d0 48 89 d1 89 df 4c 89 e2 e8 57 fe ff ff 0f b7 55 e8 83 e2 01 74 da 66 0f 1f 44 00 00 f3 90 <0f> b7 55 e8 83 e2 01 75 f5 eb c7 0f 1f 00 8b 05 ca e6 dd 00 85
[47708.068013] Kernel panic - not syncing: softlockup: hung tasks
[47708.068013] CPU: 0 PID: 29869 Comm: qemu-system-x86 Tainted: G EL 3.19.0-c7671cf-lp1413540v2 #31
[47708.068013] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[47708.068013] ffff88081b9beca0 ffff88083fc03de8 ffffffff817a6bf6 0000000000000000
[47708.068013] ffffffff81ab30d4 ffff88083fc03e68 ffffffff817a1aec 0000000000000e92
[47708.068013] 0000000000000008 ffff88083fc03e78 ffff88083fc03e18 ffff88083fc03e68
[47708.068013] Call Trace:
[47708.068013] <IRQ> [<ffffffff817a6bf6>] dump_stack+0x45/0x57
[47708.068013] [<ffffffff817a1aec>] panic+0xc1/0x1f5
[47708.068013] [<ffffffff8112ba0b>] watchdog_timer_fn+0x1db/0x1f0
[47708.068013] [<ffffffff810e0e37>] __run_hrtimer+0x77/0x1d0
[47708.068013] [<ffffffff8112b830>] ? watchdog+0x30/0x30
[47708.068013] [<ffffffff810e1203>] hrtimer_interrupt+0xf3/0x220
[47708.068013] [<ffffffffc0589320>] ? copy_shadow_to_vmcs12+0x110/0x110 [kvm_intel]
[47708.068013] [<ffffffff8104b0a9>] local_apic_timer_interrupt+0x39/0x60
[47708.068013] [<ffffffff817b1fb5>] smp_apic_timer_interrupt+0x45/0x60
[47708.068013] [<ffffffff817b002d>] apic_timer_interrupt+0x6d/0x80
[47708.068013] <EOI> [<ffffffff810f537a>] ? smp_call_function_single+0xca/0x120
[47708.068013] [<ffffffff810f5369>] ? smp_call_function_single+0xb9/0x120
[47708.068013] [<ffffffffc0589320>] ? copy_shadow_to_vmcs12+0x110/0x110 [kvm_intel]
[47708.068013] [<ffffffffc0586097>] loaded_vmcs_clear+0x27/0x30 [kvm_intel]
[47708.068013] [<ffffffffc058aefe>] vmx_vcpu_load+0x17e/0x1a0 [kvm_intel]
[47708.068013] [<ffffffff810a918d>] ? set_next_entity+0x9d/0xb0
[47708.068013] [<ffffffffc04660e3>] kvm_arch_vcpu_load+0x33/0x1f0 [kvm]
[47708.068013] [<ffffffffc0452529>] kvm_sched_in+0x39/0x40 [kvm]
[47708.068013] [<ffffffff8109e8e8>] finish_task_switch+0x98/0x1a0
[47708.068013] [<ffffffff817aa81b>] __schedule+0x33b/0x900
[47708.068013] [<ffffffff817aae17>] schedule+0x37/0x90
[47708.068013] [<ffffffffc0451e7d>] kvm_vcpu_block+0x6d/0xb0 [kvm]
[47708.068013] [<ffffffff810b6ec0>] ? prepare_to_wait_event+0x110/0x110
[47708.068013] [<ffffffffc0469d3c>] kvm_arch_vcpu_ioctl_run+0x10c/0x1290 [kvm]
[47708.068013] [<ffffffffc04551ce>] kvm_vcpu_ioctl+0x2ce/0x670 [kvm]
[47708.068013] [<ffffffff811ef441>] ? new_sync_write+0x81/0xb0
[47708.068013] [<ffffffff812034e8>] do_vfs_ioctl+0x2f8/0x510
[47708.068013] [<ffffffff811f2215>] ? __sb_end_write+0x35/0x70
[47708.068013] [<ffffffffc045cf84>] ? kvm_on_user_return+0x74/0x80 [kvm]
[47708.068013] [<ffffffff81203781>] SyS_ioctl+0x81/0xa0
[47708.068013] [<ffffffff817aefad>] system_call_fastpath+0x16/0x1b
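
For reference, the Code: bytes around the faulting instruction in the first trace decode to pause (f3 90), movzwl -0x18(%rbp),%edx (0f b7 55 e8), and $0x1,%edx (83 e2 01), then jne back to the pause (75 f5). That is a spin on bit 0 of a 16-bit field in the on-stack frame, which appears to be the inlined csd_lock_wait() loop (while (csd->flags & CSD_FLAG_LOCK) cpu_relax();) waiting for the target CPU to run the callback and release the csd. The following is only a hypothetical userspace model of that handshake, not kernel code: struct csd_model, remote_cpu(), csd_wait_bounded() and demo() are illustrative names, a pthread stands in for the IPI target, and the spin cap is an assumption added so that the "IPI never handled" case returns instead of hanging the way the real loop does.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the csd handshake; only the bit value
 * of CSD_FLAG_LOCK mirrors the 3.19 kernel, the rest is illustrative. */
#define CSD_FLAG_LOCK 0x01u

struct csd_model {
    _Atomic uint16_t flags;      /* 16-bit, like the movzwl load in Code: */
    void (*func)(void *info);
    void *info;
};

/* Plays the target CPU's IPI handler: run the callback, then clear the
 * lock bit with release ordering so the waiter sees the callback's writes. */
static void *remote_cpu(void *arg)
{
    struct csd_model *csd = arg;

    csd->func(csd->info);
    atomic_fetch_and_explicit(&csd->flags, (uint16_t)~CSD_FLAG_LOCK,
                              memory_order_release);
    return NULL;
}

/* Sender side, modelled on the loop in the Code: bytes: reload the flags
 * and test bit 0 until it clears. The kernel loop has no upper bound; the
 * max_spins cap is an assumption so a lost IPI returns false instead of
 * spinning until the softlockup watchdog fires. */
static bool csd_wait_bounded(struct csd_model *csd, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (!(atomic_load_explicit(&csd->flags, memory_order_acquire) &
              CSD_FLAG_LOCK))
            return true;
        /* the kernel would execute cpu_relax() (PAUSE, f3 90) here */
    }
    return false;
}

static void set_answer(void *info)
{
    *(int *)info = 42;
}

/* Returns 1 if the "remote CPU" ran the callback and released the csd,
 * 0 if the bounded wait gave up, -1 if the helper thread failed to start. */
static int demo(bool deliver_ipi, int *result)
{
    struct csd_model csd;
    pthread_t target;
    bool done;

    atomic_init(&csd.flags, CSD_FLAG_LOCK);   /* lock before "sending" */
    csd.func = set_answer;
    csd.info = result;

    if (!deliver_ipi)   /* nobody will ever clear bit 0 */
        return csd_wait_bounded(&csd, 1000000) ? 1 : 0;

    if (pthread_create(&target, NULL, remote_cpu, &csd) != 0)
        return -1;
    done = csd_wait_bounded(&csd, LONG_MAX);
    pthread_join(target, NULL);
    return done ? 1 : 0;
}
```

Calling demo(true, &r) returns 1 with r set to 42; demo(false, &r) returns 0 and leaves r untouched, which is the userspace analogue of a CPU that never services the IPI while the sender keeps spinning on the lock bit.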

Thanks,
Rafael Tinoco

On Wed, Feb 18, 2015 at 8:25 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, Feb 11, 2015 at 12:42:10PM -0800, Linus Torvalds wrote:
>> Ok, this is a more involved patch than I'd like, but making the
>> *caller* do all the CSD maintenance actually cleans things up.
>>
>> And this is still completely untested, and may be entirely buggy. What
>> do you guys think?
>
> I think it makes perfect sense.
>
> Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>