Re: kvm: WARNING in mmu_spte_clear_track_bits

From: Dmitry Vyukov
Date: Fri Jan 13 2017 - 06:15:47 EST


On Tue, Dec 13, 2016 at 8:50 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> Hello,
>
> The following program:
> https://gist.githubusercontent.com/dvyukov/23d8bd622fd526d7701ac2057bbbc9c2/raw/aacd20451e6f460232f5e1da262b653fb3155613/gistfile1.txt
>
> leads to WARNING in mmu_spte_clear_track_bits and later to splash of
> BUG: Bad page state in process a.out pfn:619b5
>
> On commit e7aa8c2eb11ba69b1b69099c3c7bd6be3087b0ba (Dec 12).
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 6907 at mmu_spte_clear_track_bits+0x326/0x3a0
> arch/x86/kvm/mmu.c:614
> Modules linked in:
> CPU: 0 PID: 6907 Comm: a.out Not tainted 4.9.0+ #85
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [< none >] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> [< none >] __warn+0x1a4/0x1e0 kernel/panic.c:550
> [< none >] warn_slowpath_null+0x31/0x40 kernel/panic.c:585
> [< none >] mmu_spte_clear_track_bits+0x326/0x3a0
> arch/x86/kvm/mmu.c:614
> [< none >] drop_spte+0x29/0x220 arch/x86/kvm/mmu.c:1182
> [< none >] mmu_page_zap_pte+0x209/0x300 arch/x86/kvm/mmu.c:2306
> [< inline >] kvm_mmu_page_unlink_children arch/x86/kvm/mmu.c:2328
> [< none >] kvm_mmu_prepare_zap_page+0x1cd/0x1240
> arch/x86/kvm/mmu.c:2372
> [< inline >] kvm_zap_obsolete_pages arch/x86/kvm/mmu.c:4915
> [< none >] kvm_mmu_invalidate_zap_all_pages+0x4af/0x6f0
> arch/x86/kvm/mmu.c:4956
> [< none >] kvm_arch_flush_shadow_all+0x1a/0x20
> arch/x86/kvm/x86.c:8177
> [< none >] kvm_mmu_notifier_release+0x76/0xb0
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:467
> [< none >] __mmu_notifier_release+0x1fe/0x6c0 mm/mmu_notifier.c:74
> [< inline >] mmu_notifier_release ./include/linux/mmu_notifier.h:235
> [< none >] exit_mmap+0x3d1/0x4a0 mm/mmap.c:2918
> [< inline >] __mmput kernel/fork.c:868
> [< none >] mmput+0x1fd/0x690 kernel/fork.c:890
> [< inline >] exit_mm kernel/exit.c:521
> [< none >] do_exit+0x9e7/0x2930 kernel/exit.c:826
> [< none >] do_group_exit+0x14e/0x420 kernel/exit.c:943
> [< inline >] SYSC_exit_group kernel/exit.c:954
> [< none >] SyS_exit_group+0x22/0x30 kernel/exit.c:952
> [< none >] entry_SYSCALL_64_fastpath+0x23/0xc6
> arch/x86/entry/entry_64.S:203
> RIP: 0033:0x43f4d9
> RSP: 002b:00007ffc7e83f548 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 00000000006d6660 RCX: 000000000043f4d9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000001 R08: 000000000000003c R09: 00000000000000e7
> R10: ffffffffffffffd0 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007fe58e3869c0 R15: 00007fe58e386700
> ---[ end trace 37ef4e3d7e4c81a9 ]---
>
>
> BUG: Bad page state in process a.out pfn:61fb5
> page:ffffea000187ed40 count:0 mapcount:0 mapping: (null) index:0x0
> flags: 0x5fffc0000000014(referenced|dirty)
> raw: 05fffc0000000014 0000000000000000 0000000000000000 00000000ffffffff
> raw: 0000000000000000 ffffea000187ed60 0000000000000000 0000000000000000
> page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
> bad because of flags: 0x14(referenced|dirty)
> Modules linked in:
> CPU: 2 PID: 7169 Comm: a.out Tainted: G W 4.9.0+ #85
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [< none >] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> [< none >] bad_page+0x29c/0x320 mm/page_alloc.c:550
> [< none >] check_new_page_bad+0x203/0x2f0 mm/page_alloc.c:1682
> [< inline >] check_new_page mm/page_alloc.c:1694
> [< inline >] check_new_pages mm/page_alloc.c:1731
> [< none >] buffered_rmqueue+0x1770/0x2900 mm/page_alloc.c:2668
> [< none >] get_page_from_freelist+0x213/0x1180
> mm/page_alloc.c:2985
> [< none >] __alloc_pages_nodemask+0x3b2/0xc90 mm/page_alloc.c:3801
> [< inline >] __alloc_pages ./include/linux/gfp.h:433
> [< inline >] __alloc_pages_node ./include/linux/gfp.h:446
> [< none >] alloc_pages_vma+0x723/0xa30 mm/mempolicy.c:2012
> [< none >] do_huge_pmd_anonymous_page+0x35f/0x1b10
> mm/huge_memory.c:704
> [< inline >] create_huge_pmd mm/memory.c:3476
> [< inline >] __handle_mm_fault mm/memory.c:3626
> [< none >] handle_mm_fault+0x1975/0x2b90 mm/memory.c:3687
> [< none >] __do_page_fault+0x4fb/0xb60 arch/x86/mm/fault.c:1396
> [< none >] trace_do_page_fault+0x159/0x810
> arch/x86/mm/fault.c:1489
> [< none >] do_async_page_fault+0x77/0xd0 arch/x86/kernel/kvm.c:264
> [< none >] async_page_fault+0x28/0x30
> arch/x86/entry/entry_64.S:1011
> RIP: 0033:0x401f5f
> RSP: 002b:00007fe592b8ece0 EFLAGS: 00010246
> RAX: 0000000020017fe0 RBX: 0000000000000000 RCX: 0000000000403894
> RDX: b93bc4d4f06f7d0e RSI: 0000000000000000 RDI: 00007fe592b8f608
> RBP: 00007fe592b8ed10 R08: 00007fe592b8f700 R09: 00007fe592b8f700
> R10: 00007fe592b8f9d0 R11: 0000000000000202 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007fe592b8f9c0 R15: 00007fe592b8f700
>
> BUG: Bad page state in process a.out pfn:619b5
> page:ffffea0001866d40 count:0 mapcount:0 mapping: (null) index:0x0
> flags: 0x5fffc0000000014(referenced|dirty)
> raw: 05fffc0000000014 0000000000000000 0000000000000000 00000000ffffffff
> raw: 0000000000000000 ffffea0001866d60 0000000000000000 0000000000000000
> page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
> bad because of flags: 0x14(referenced|dirty)
> Modules linked in:
> CPU: 2 PID: 7169 Comm: a.out Tainted: G B W 4.9.0+ #85
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [< none >] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> [< none >] bad_page+0x29c/0x320 mm/page_alloc.c:550
> [< none >] check_new_page_bad+0x203/0x2f0 mm/page_alloc.c:1682
> [< inline >] check_new_page mm/page_alloc.c:1694
> [< inline >] check_new_pages mm/page_alloc.c:1731
> [< none >] buffered_rmqueue+0x1770/0x2900 mm/page_alloc.c:2668
> [< none >] get_page_from_freelist+0x213/0x1180
> mm/page_alloc.c:2985
> [< none >] __alloc_pages_nodemask+0x3b2/0xc90 mm/page_alloc.c:3801
> [< inline >] __alloc_pages ./include/linux/gfp.h:433
> [< inline >] __alloc_pages_node ./include/linux/gfp.h:446
> [< none >] alloc_pages_vma+0x723/0xa30 mm/mempolicy.c:2012
> [< none >] do_huge_pmd_anonymous_page+0x35f/0x1b10
> mm/huge_memory.c:704
> [< inline >] create_huge_pmd mm/memory.c:3476
> [< inline >] __handle_mm_fault mm/memory.c:3626
> [< none >] handle_mm_fault+0x1975/0x2b90 mm/memory.c:3687
> [< none >] __do_page_fault+0x4fb/0xb60 arch/x86/mm/fault.c:1396
> [< none >] trace_do_page_fault+0x159/0x810
> arch/x86/mm/fault.c:1489
> [< none >] do_async_page_fault+0x77/0xd0 arch/x86/kernel/kvm.c:264
> [< none >] async_page_fault+0x28/0x30
> arch/x86/entry/entry_64.S:1011
> RIP: 0033:0x401f5f
> RSP: 002b:00007fe592b8ece0 EFLAGS: 00010246
> RAX: 0000000020017fe0 RBX: 0000000000000000 RCX: 0000000000403894
> RDX: b93bc4d4f06f7d0e RSI: 0000000000000000 RDI: 00007fe592b8f608
> RBP: 00007fe592b8ed10 R08: 00007fe592b8f700 R09: 00007fe592b8f700
> R10: 00007fe592b8f9d0 R11: 0000000000000202 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007fe592b8f9c0 R15: 00007fe592b8f700


I've commented out the WARNING for now, but I am seeing lots of
use-after-free's and rcu stalls involving mmu_spte_clear_track_bits:


BUG: KASAN: use-after-free in mmu_spte_clear_track_bits+0x186/0x190
arch/x86/kvm/mmu.c:597 at addr ffff880068ae2008
Read of size 8 by task syz-executor2/16715
page:ffffea00016e6170 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x500000000000000()
raw: 0500000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: ffffea00017ec5a0 ffffea0001783d48 ffff88006aec5d98
page dumped because: kasan: bad access detected
CPU: 2 PID: 16715 Comm: syz-executor2 Not tainted 4.10.0-rc3+ #163
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x292/0x3a2 lib/dump_stack.c:51
kasan_report_error mm/kasan/report.c:213 [inline]
kasan_report+0x42d/0x460 mm/kasan/report.c:307
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333
mmu_spte_clear_track_bits+0x186/0x190 arch/x86/kvm/mmu.c:597
drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1182
kvm_zap_rmapp+0x119/0x260 arch/x86/kvm/mmu.c:1401
kvm_unmap_rmapp+0x1d/0x30 arch/x86/kvm/mmu.c:1412
kvm_handle_hva_range+0x54a/0x7d0 arch/x86/kvm/mmu.c:1565
kvm_unmap_hva_range+0x2e/0x40 arch/x86/kvm/mmu.c:1591
kvm_mmu_notifier_invalidate_range_start+0xae/0x140
arch/x86/kvm/../../../virt/kvm/kvm_main.c:360
__mmu_notifier_invalidate_range_start+0x1f8/0x300 mm/mmu_notifier.c:199
mmu_notifier_invalidate_range_start include/linux/mmu_notifier.h:282 [inline]
unmap_vmas+0x14b/0x1b0 mm/memory.c:1368
unmap_region+0x2f8/0x560 mm/mmap.c:2460
do_munmap+0x7b8/0xfa0 mm/mmap.c:2657
mmap_region+0x68f/0x18e0 mm/mmap.c:1612
do_mmap+0x6a2/0xd40 mm/mmap.c:1450
do_mmap_pgoff include/linux/mm.h:2031 [inline]
vm_mmap_pgoff+0x1a9/0x200 mm/util.c:305
SYSC_mmap_pgoff mm/mmap.c:1500 [inline]
SyS_mmap_pgoff+0x22c/0x5d0 mm/mmap.c:1458
SYSC_mmap arch/x86/kernel/sys_x86_64.c:95 [inline]
SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:86
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445329
RSP: 002b:00007fb33933cb58 EFLAGS: 00000282 ORIG_RAX: 0000000000000009
RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 0000000000445329
RDX: 0000000000000003 RSI: 0000000000af1000 RDI: 0000000020000000
RBP: 00000000006dfe90 R08: ffffffffffffffff R09: 0000000000000000
R10: 0000000000000032 R11: 0000000000000282 R12: 0000000000700000
R13: 0000000000000006 R14: ffffffffffffffff R15: 0000000020001000
Memory state around the buggy address:
ffff880068ae1f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880068ae1f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880068ae2000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff880068ae2080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff880068ae2100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


INFO: rcu_sched self-detected stall on CPU
3-...: (62755 ticks this GP) idle=83f/140000000000001/0
softirq=314032/314032 fqs=14158
(t=65000 jiffies g=108568 c=108567 q=491)
Task dump for CPU 3:
syz-executor2 R running task 11768 19364 2894 0x0000000c
Call Trace:
<IRQ>
sched_show_task+0x3e4/0x540 kernel/sched/core.c:5217
dump_cpu_task+0x71/0x90 kernel/sched/core.c:8822
rcu_dump_cpu_stacks+0x318/0x35e kernel/rcu/tree.c:1290
print_cpu_stall+0x3ad/0x6f0 kernel/rcu/tree.c:1434
check_cpu_stall.isra.64+0x702/0xe80 kernel/rcu/tree.c:1502
__rcu_pending kernel/rcu/tree.c:3469 [inline]
rcu_pending kernel/rcu/tree.c:3533 [inline]
rcu_check_callbacks+0x27f/0xda0 kernel/rcu/tree.c:2867
update_process_times+0x30/0x60 kernel/time/timer.c:1612
tick_sched_handle.isra.18+0xb3/0xe0 kernel/time/tick-sched.c:151
tick_sched_timer+0x72/0x120 kernel/time/tick-sched.c:1158
__run_hrtimer kernel/time/hrtimer.c:1238 [inline]
__hrtimer_run_queues+0x385/0xf80 kernel/time/hrtimer.c:1302
hrtimer_interrupt+0x1c2/0x5e0 kernel/time/hrtimer.c:1336
local_apic_timer_interrupt+0x6f/0xe0 arch/x86/kernel/apic/apic.c:936
smp_apic_timer_interrupt+0x71/0xa0 arch/x86/kernel/apic/apic.c:960
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:484
RIP: 0010:__write_once_size include/linux/compiler.h:272 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x52/0x60 kernel/kcov.c:96
RSP: 0018:ffff88006a8db870 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: 0000000000010000 RBX: 0000000000000000 RCX: ffffc9000186f000
RDX: 0000000000010000 RSI: ffffffff810fb9ba RDI: 0000000000000000
RBP: ffff88006a8db870 R08: 0000000000000001 R09: 0000000000000000
R10: a48dc350e42c518e R11: dffffc0000000000 R12: 1ffff1000d51b710
R13: ffff88006885a008 R14: ffff88006a8db8e0 R15: 0000000000000001
</IRQ>
is_shadow_present_pte arch/x86/kvm/mmu.c:310 [inline]
mmu_spte_clear_track_bits+0xba/0x190 arch/x86/kvm/mmu.c:604
drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1182
kvm_zap_rmapp+0x119/0x260 arch/x86/kvm/mmu.c:1401
kvm_unmap_rmapp+0x1d/0x30 arch/x86/kvm/mmu.c:1412
kvm_handle_hva_range+0x54a/0x7d0 arch/x86/kvm/mmu.c:1565
kvm_handle_hva arch/x86/kvm/mmu.c:1581 [inline]
kvm_unmap_hva+0x2a/0x30 arch/x86/kvm/mmu.c:1586
kvm_mmu_notifier_invalidate_page+0xa6/0x140
arch/x86/kvm/../../../virt/kvm/kvm_main.c:316
__mmu_notifier_invalidate_page+0x1ea/0x2f0 mm/mmu_notifier.c:185
mmu_notifier_invalidate_page include/linux/mmu_notifier.h:275 [inline]
try_to_unmap_one+0xd94/0x17c0 mm/rmap.c:1585
rmap_walk_file+0x446/0xb20 mm/rmap.c:1837
rmap_walk+0x1d8/0x300 mm/rmap.c:1857
try_to_unmap+0x57a/0x760 mm/rmap.c:1657
__unmap_and_move+0x464/0xf80 mm/migrate.c:1024
unmap_and_move mm/migrate.c:1113 [inline]
migrate_pages+0x958/0x2350 mm/migrate.c:1338
compact_zone+0xf49/0x24c0 mm/compaction.c:1556
compact_zone_order+0x3f7/0x700 mm/compaction.c:1663
try_to_compact_pages+0x347/0xc20 mm/compaction.c:1712
__alloc_pages_direct_compact+0x77/0x260 mm/page_alloc.c:3152
__alloc_pages_slowpath+0x9a7/0x1910 mm/page_alloc.c:3671
__alloc_pages_nodemask+0x8f4/0xbb0 mm/page_alloc.c:3810
alloc_pages_current+0x1c7/0x6b0 mm/mempolicy.c:2066
alloc_pages include/linux/gfp.h:462 [inline]
get_huge_zero_page mm/huge_memory.c:69 [inline]
mm_get_huge_zero_page+0x293/0x3e0 mm/huge_memory.c:103
do_huge_pmd_anonymous_page+0xac6/0x16b0 mm/huge_memory.c:677
create_huge_pmd mm/memory.c:3476 [inline]
__handle_mm_fault mm/memory.c:3628 [inline]
handle_mm_fault+0x1bc7/0x2630 mm/memory.c:3689
faultin_page mm/gup.c:391 [inline]
__get_user_pages+0x554/0x15f0 mm/gup.c:590
__get_user_pages_locked mm/gup.c:763 [inline]
get_user_pages+0xa3/0xd0 mm/gup.c:985
get_user_page_nowait arch/x86/kvm/../../../virt/kvm/kvm_main.c:1359 [inline]
hva_to_pfn_slow arch/x86/kvm/../../../virt/kvm/kvm_main.c:1420 [inline]
hva_to_pfn arch/x86/kvm/../../../virt/kvm/kvm_main.c:1537 [inline]
__gfn_to_pfn_memslot+0x3c6/0xff0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1593
try_async_pf+0x14b/0xd50 arch/x86/kvm/mmu.c:3512
tdp_page_fault+0x52c/0xc00 arch/x86/kvm/mmu.c:3577
kvm_mmu_page_fault+0xe7/0x270 arch/x86/kvm/mmu.c:4534
handle_ept_violation+0x18e/0x610 arch/x86/kvm/vmx.c:6412
vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625
vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline]
vcpu_run arch/x86/kvm/x86.c:6947 [inline]
kvm_arch_vcpu_ioctl_run+0xf3d/0x4660 arch/x86/kvm/x86.c:7105
kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2569
vfs_ioctl fs/ioctl.c:43 [inline]
do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
SYSC_ioctl fs/ioctl.c:698 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445329
RSP: 002b:00007f6b3ce26b58 EFLAGS: 00000286 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000015 RCX: 0000000000445329
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
RBP: 00000000006deb50 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000700000
R13: 00007f6b3d5a9768 R14: 00007f6b3d5b4d80 R15: 0000000020007000