Re: kvm: GPF in kvm_lapic_latched_init

From: Jeff Merkey
Date: Fri Jan 15 2016 - 15:54:29 EST


On 1/15/16, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Fri, Jan 15, 2016 at 8:59 PM, Jeff Merkey <linux.mdb@xxxxxxxxx> wrote:
>> On 1/8/16, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>>> Hello,
>>>
>>> The following program triggers GPF in kvm_lapic_latched_init if run in
>>> a parallel loop:
>>> https://gist.githubusercontent.com/dvyukov/524b398f379440b21115/raw/9627095f57a72501fb51bf7565471d31732beeee/gistfile1.txt
>>>
>>> kasan: GPF could be caused by NULL-ptr deref or user memory
>>> accessgeneral protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>>> Modules linked in:
>>> CPU: 3 PID: 14426 Comm: a.out Not tainted 4.4.0-rc8+ #217
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>>> 01/01/2011
>>> task: ffff880061099780 ti: ffff880062e30000 task.ti: ffff880062e30000
>>> RIP: 0010:[<ffffffff81057171>] [<ffffffff81057171>]
>>> kvm_arch_vcpu_ioctl+0xa31/0x2ef0
>>> RSP: 0018:ffff880062e37900 EFLAGS: 00010206
>>> RAX: dffffc0000000000 RBX: 1ffff1000c5c6f25 RCX: 1ffff1000c41b7cb
>>> RDX: 000000000000001e RSI: 000000008040ae9f RDI: 00000000000000f0
>>> RBP: ffff880062e37c10 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>> R13: 0000000000000000 R14: ffff880062e37be8 R15: 0000000000000000
>>> FS: 00007f4aa815f700(0000) GS:ffff88006d700000(0000)
>>> knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> CR2: 00007f4aa795de78 CR3: 00000000613c2000 CR4: 00000000000026e0
>>> Stack:
>>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> 0000000020006fe4 0000000041b58ab3 ffffffff86e2e588 ffffffff81056740
>>> 0000000000000001 ffff880061099f60 0000000000000498 ffff880061099f68
>>> Call Trace:
>>> [<ffffffff8101cb52>] kvm_vcpu_ioctl+0x1e2/0xd00
>>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2526
>>> [< inline >] vfs_ioctl fs/ioctl.c:43
>>> [<ffffffff817b36b1>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
>>> [< inline >] SYSC_ioctl fs/ioctl.c:622
>>> [<ffffffff817b3eff>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
>>> [<ffffffff85e745b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
>>> arch/x86/entry/entry_64.S:185
>>> Code: 85 2d 20 00 00 4d 8b a4 24 60 03 00 00 e8 c8 8b 50 00 49 8d bc
>>> 24 f0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>>> 3c 02 00 0f 85 f3 1f 00 00 4d 8b a4 24 f0 00 00 00 41 83 e4
>>> RIP [< inline >] constant_test_bit
>>> ./arch/x86/include/asm/bitops.h:311
>>> RIP [< inline >] kvm_lapic_latched_init
>>> arch/x86/kvm/lapic.h:164
>>> RIP [< inline >] kvm_vcpu_ioctl_x86_get_vcpu_events
>>> arch/x86/kvm/x86.c:2936
>>> RIP [<ffffffff81057171>] kvm_arch_vcpu_ioctl+0xa31/0x2ef0
>>> arch/x86/kvm/x86.c:3347
>>> RSP <ffff880062e37900>
>>> ---[ end trace 16449377928e034b ]---
>>>
>>>
>>> or:
>>>
>>> kasan: GPF could be caused by NULL-ptr deref or user memory
>>> accessgeneral protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>>> Modules linked in:
>>> CPU: 0 PID: 9555 Comm: syz-executor Not tainted 4.4.0-rc8+ #217
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>>> 01/01/2011
>>> task: ffff88006301de00 ti: ffff880062568000 task.ti: ffff880062568000
>>> RIP: 0010:[<ffffffff810cf5ab>] [<ffffffff810cf5ab>]
>>> wait_lapic_expire+0x6b/0x560
>>> RSP: 0018:ffff88006256fa48 EFLAGS: 00010006
>>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff88006301e5c8
>>> RDX: 0000000000000011 RSI: 0000000000000000 RDI: ffff880033590360
>>> RBP: ffff88006256fa88 R08: 0000000000000001 R09: 0000000000000002
>>> R10: 0000000000000001 R11: 0000000000000001 R12: ffff880033590000
>>> R13: ffff880033590030 R14: 0000000000000088 R15: ffff88003359002c
>>> FS: 00007f4809354700(0000) GS:ffff88003ec00000(0000)
>>> knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00007f4808b53000 CR3: 0000000033f3f000 CR4: 00000000000026f0
>>> Stack:
>>> ffff88006256fa70 0000000000000082 0000000000000003 ffff88006301de00
>>> ffff880033590030 ffff880033590030 ffff880033590000 ffff88003359002c
>>> ffff88006256fc10 ffffffff8106a1dc ffffffff8106a75b 0000000000013210
>>> Call Trace:
>>> [< inline >] vcpu_enter_guest arch/x86/kvm/x86.c:6523
>>> [< inline >] vcpu_run arch/x86/kvm/x86.c:6660
>>> [<ffffffff8106a1dc>] kvm_arch_vcpu_ioctl_run+0x25ec/0x5820
>>> arch/x86/kvm/x86.c:6818
>>> [<ffffffff8101cf61>] kvm_vcpu_ioctl+0x5f1/0xd00
>>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2375
>>> [< inline >] vfs_ioctl fs/ioctl.c:43
>>> [<ffffffff817b36b1>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
>>> [< inline >] SYSC_ioctl fs/ioctl.c:622
>>> [<ffffffff817b3eff>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
>>> [<ffffffff85e745b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
>>> arch/x86/entry/entry_64.S:185
>>> Code: 60 03 00 00 0f 1f 44 00 00 e8 92 07 49 00 4c 8d b3 88 00 00 00
>>> e8 86 07 49 00 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
>>> 3c 02 00 0f 85 d8 04 00 00 4c 8b ab 88 00 00 00 4d 85 ed 75
>>> RIP [<ffffffff810cf5ab>] wait_lapic_expire+0x6b/0x560
>>> arch/x86/kvm/lapic.c:1245
>>> RSP <ffff88006256fa48>
>>> ---[ end trace 560c2b85e36670bc ]---
>>>
>>> or:
>>>
>>> kasan: GPF could be caused by NULL-ptr deref or user memory
>>> accessgeneral protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>>> Modules linked in:
>>> CPU: 3 PID: 11264 Comm: syz-executor Not tainted 4.4.0-rc8+ #217
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>>> 01/01/2011
>>> task: ffff880064d55e00 ti: ffff880064dc0000 task.ti: ffff880064dc0000
>>> RIP: 0010:[<ffffffff810d138d>] [<ffffffff810d138d>]
>>> apic_has_pending_timer+0x7d/0x210
>>> RSP: 0018:ffff880064dc7a60 EFLAGS: 00010206
>>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000004
>>> RDX: 0000000000000017 RSI: 0000000000000000 RDI: 00000000000000b8
>>> RBP: ffff880064dc7a70 R08: 0000000000000002 R09: 0000000000000001
>>> R10: ffff880064d55e00 R11: ffff880063528220 R12: ffff880063250030
>>> R13: ffff880063250030 R14: ffff880063250000 R15: 0000000000000000
>>> FS: 00007fb05f305700(0000) GS:ffff88006d700000(0000)
>>> knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00000000006d7760 CR3: 0000000065ae9000 CR4: 00000000000026e0
>>> Stack:
>>> ffff880063250000 ffff880063250030 ffff880064dc7a88 ffffffff810c7af5
>>> ffffffff86fee5c0 ffff880064dc7c10 ffffffff810685d4 ffffffff8106a75b
>>> 0000000000013210 ffff880065a35000 1ffff1000c9b8f59 ffff880064dc0008
>>> Call Trace:
>>> [<ffffffff810c7af5>] kvm_cpu_has_pending_timer+0x15/0x20
>>> arch/x86/kvm/irq.c:36
>>> [< inline >] vcpu_run arch/x86/kvm/x86.c:6669
>>> [<ffffffff810685d4>] kvm_arch_vcpu_ioctl_run+0x9e4/0x5820
>>> arch/x86/kvm/x86.c:6818
>>> [<ffffffff8101cf61>] kvm_vcpu_ioctl+0x5f1/0xd00
>>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2375
>>> [< inline >] vfs_ioctl fs/ioctl.c:43
>>> [<ffffffff817b36b1>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
>>> [< inline >] SYSC_ioctl fs/ioctl.c:622
>>> [<ffffffff817b3eff>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
>>> [<ffffffff85e745b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
>>> arch/x86/entry/entry_64.S:185
>>> Code: ba e9 48 00 0f 1f 44 00 00 e8 b0 e9 48 00 e8 ab e9 48 00 48 8d
>>> bb b8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>>> 3c 02 00 0f 85 46 01 00 00 4c 8b a3 b8 00 00 00 48 b8 00 00
>>> RIP [< inline >] arch_static_branch
>>> ./arch/x86/include/asm/jump_label.h:21
>>> RIP [< inline >] static_key_false
>>> include/linux/jump_label.h:133
>>> RIP [< inline >] kvm_apic_hw_enabled arch/x86/kvm/lapic.h:117
>>> RIP [< inline >] apic_enabled arch/x86/kvm/lapic.c:121
>>> RIP [<ffffffff810d138d>] apic_has_pending_timer+0x7d/0x210
>>> arch/x86/kvm/lapic.c:1731
>>> RSP <ffff880064dc7a60>
>>> ---[ end trace fe9c10b88e48c946 ]---
>>>
>>>
>>> All crashes suggest that apic is NULL.
>>>
>>> On commit b06f3a168cdcd80026276898fd1fee443ef25743 (Jan 6).
>>>
>>
>> Dmitry,
>>
>> You need to check your test harness and add checks for which CPL the
>> kernel is running at for these GPF faults and add that to your report.
>> I realize that there are a lot of kernel subsystems which are coded
>> very loosely when it comes to checking this stuff. I have looked through some of
>> these hangs you reported and I think one of them is related to a
>> swapgs instruction getting nested, and two others related to code
>> touching hardware.
>>
>> Can you figure out how to send the info as to what privilege level you
>> are at when these faults occur? This one looks like swapgs got nested
>> and gs was pointing off to oblivion.
>
>
> The program opens /dev/kvm as root because the device node has mode 700,
> but then issues the ioctls as user nobody.
> Does it make sense to add the UID to kernel BUG/WARNING output (or at
> least a capable(CAP_SYS_ADMIN) flag)? It's a pretty generic concern
> for all crashes.
>
Well, !!(cs & 3) is a good clue. According to CS (at least going by what
this dump says), we are at ring 0 on this one, with gs not set.

Jeff