Re: [PATCH 2/3] kvm: x86/mmu: Ensure TDP MMU roots are freed after yield
From: Maciej S. Szmigiero
Date: Wed Jan 06 2021 - 04:27:10 EST
Thanks for looking at it Ben.
On 06.01.2021 00:38, Ben Gardon wrote:
(..)
>
> +Sean Christopherson, for whom I used a stale email address.
> .
> I tested this series by running kvm-unit-tests on an Intel Skylake
> machine. It did not introduce any new failures. I also ran the
> set_memory_region_test
It's "memslot_move_test" that is crashing the kernel - a memslot
move test based on "set_memory_region_test".
>, but was unable to reproduce Maciej's problem.
> Maciej, if you'd be willing to confirm this series solves the problem
> you observed, or provide more details on the setup in which you
> observed it, I'd appreciate it.
>
I've applied your patches and now are getting a slightly
different backtrace for the same test:
[ 534.768212] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP PTI
[ 534.887969] CPU: 97 PID: 4651 Comm: memslot_move_te Not tainted 5.11.0-rc2+ #81
[ 534.975465] Hardware name: Oracle Corporation ORACLE SERVER X7-2c/SERVER MODULE ASSY, , BIOS 46070300 12/20/2019
[ 535.097288] RIP: 0010:kvm_tdp_mmu_zap_gfn_range+0x70/0xb0 [kvm]
[ 535.168199] Code: b8 01 00 00 00 4c 89 f1 41 89 45 50 4c 89 ee 48 89 df e8 a3 f3 ff ff 41 09 c4 41 83 6d 50 01 74 13 4d 8b 6d 00 4d 39 fd 74 1e <41> 8b 45 50 85 c0 75 c6 0f 0b 4c 89 ee 48 89 df e8 0b fc ff ff 4d
[ 535.393005] RSP: 0018:ffffbded19083b90 EFLAGS: 00010297
[ 535.455533] RAX: 0000000000000001 RBX: ffffbded1a27d000 RCX: 000000008030000e
[ 535.540945] RDX: 000000008030000f RSI: ffffffffc0ad5453 RDI: ffff9cd72a00d300
[ 535.626358] RBP: ffffbded19083bc0 R08: 0000000000000001 R09: ffffffffc0ad5400
[ 535.711769] R10: ffff9d370acf31b8 R11: 0000000000000001 R12: 0000000000000001
[ 535.797181] R13: dead000000000100 R14: 0000000400000000 R15: ffffbded1a292418
[ 535.882590] FS: 00007ff50312e740(0000) GS:ffff9d947fb40000(0000) knlGS:0000000000000000
[ 535.979443] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 536.048211] CR2: 0000000001e02fe0 CR3: 00000060a78e8003 CR4: 00000000007726e0
[ 536.133628] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 536.219043] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 536.304452] PKRU: 55555554
[ 536.336813] Call Trace:
[ 536.366057] kvm_tdp_mmu_zap_all+0x26/0x40 [kvm]
[ 536.421357] kvm_mmu_zap_all_fast+0x167/0x180 [kvm]
[ 536.479767] kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10 [kvm]
[ 536.554817] kvm_page_track_flush_slot+0x5a/0x90 [kvm]
[ 536.616344] kvm_arch_flush_shadow_memslot+0xe/0x10 [kvm]
[ 536.680986] kvm_set_memslot+0x18f/0x690 [kvm]
[ 536.734186] __kvm_set_memory_region+0x41f/0x580 [kvm]
[ 536.795705] kvm_set_memory_region+0x2b/0x40 [kvm]
[ 536.853062] kvm_vm_ioctl+0x216/0x1060 [kvm]
[ 536.904182] ? irqtime_account_irq+0x40/0xc0
[ 536.955270] ? irq_exit_rcu+0x55/0xf0
[ 536.999079] ? sysvec_apic_timer_interrupt+0x45/0x90
[ 537.058485] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 537.122058] ? __audit_syscall_entry+0xdd/0x130
[ 537.176267] __x64_sys_ioctl+0x92/0xd0
[ 537.221114] do_syscall_64+0x37/0x50
[ 537.263878] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 537.324324] RIP: 0033:0x7ff502a27307
[ 537.367882] Code: 44 00 00 48 8b 05 69 1b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 1b 2d 00 f7 d8 64 89 01 48
[ 537.594221] RSP: 002b:00007fffde6b2d38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 537.685616] RAX: ffffffffffffffda RBX: 0000000001de8000 RCX: 00007ff502a27307
[ 537.771797] RDX: 0000000001e02fe0 RSI: 000000004020ae46 RDI: 0000000000000004
[ 537.857967] RBP: 00000000000001fc R08: 00007fffde74b090 R09: 000000000005af86
[ 537.944110] R10: 000000000005af86 R11: 0000000000000246 R12: 0000000050000000
[ 538.030236] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000001fb
[ 538.116345] Modules linked in: kvm_intel kvm xt_comment xt_owner ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 drm_kms_helper iTCO_wdt bnxt_re cec iTCO_vendor_support drm ib_uverbs syscopyarea sysfillrect ib_core sg irqbypass sysimgblt pcspkr ioatdma i2c_i801 fb_sys_fops joydev lpc_ich intel_pch_thermal i2c_smbus i2c_algo_bit dca ip_tables vfat fat xfs sd_mod t10_pi be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi
[ 538.116423] libcxgb qla4xxx iscsi_boot_sysfs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper bnxt_en wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: kvm]
[ 539.450863] ---[ end trace 7c17f445a2093145 ]---
[ 539.623473] RIP: 0010:kvm_tdp_mmu_zap_gfn_range+0x70/0xb0 [kvm]
[ 539.695136] Code: b8 01 00 00 00 4c 89 f1 41 89 45 50 4c 89 ee 48 89 df e8 a3 f3 ff ff 41 09 c4 41 83 6d 50 01 74 13 4d 8b 6d 00 4d 39 fd 74 1e <41> 8b 45 50 85 c0 75 c6 0f 0b 4c 89 ee 48 89 df e8 0b fc ff ff 4d
[ 539.921479] RSP: 0018:ffffbded19083b90 EFLAGS: 00010297
[ 539.984788] RAX: 0000000000000001 RBX: ffffbded1a27d000 RCX: 000000008030000e
[ 540.070982] RDX: 000000008030000f RSI: ffffffffc0ad5453 RDI: ffff9cd72a00d300
[ 540.157173] RBP: ffffbded19083bc0 R08: 0000000000000001 R09: ffffffffc0ad5400
[ 540.243372] R10: ffff9d370acf31b8 R11: 0000000000000001 R12: 0000000000000001
[ 540.329567] R13: dead000000000100 R14: 0000000400000000 R15: ffffbded1a292418
[ 540.415772] FS: 00007ff50312e740(0000) GS:ffff9d947fb40000(0000) knlGS:0000000000000000
[ 540.513427] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 540.583005] CR2: 0000000001e02fe0 CR3: 00000060a78e8003 CR4: 00000000007726e0
[ 540.669228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 540.755448] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 540.841659] PKRU: 55555554
[ 540.874826] Kernel panic - not syncing: Fatal exception
[ 540.938269] Kernel Offset: 0xe200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 542.097054] ---[ end Kernel panic - not syncing: Fatal exception ]---
The code that is crashing is:
# arch/x86/kvm/mmu/mmu_internal.h:100: BUG_ON(!sp->root_count);
movl 80(%r13), %eax # MEM[(int *)__mptr_14 + 80B], _17 <- here
testl %eax, %eax # _17
jne .L421 #,
So it looks like it now crashes in the same BUG_ON() but when trying to
deference the "dead" sp pointer instead.
It's bad that you can't reproduce the issue, however, as this would
probably make the root causing process much more effective.
Are you testing on bare metal like me or while running nested?
My test machine has Xeon Platinum 8167M CPU, so it's a Skylake, too.
It has 754G RAM + 8G swap, running just the test program.
I've uploaded the kernel that I've used for testing here:
https://github.com/maciejsszmigiero/linux/tree/tdp_mmu_bug
It is basically a 5.11.0-rc2 kernel with
"KVM: x86/mmu: Bug fixes and cleanups in get_mmio_spte()" series and
your fixes applied on top of it.
In addition to that, I've updated
https://gist.github.com/maciejsszmigiero/890218151c242d99f63ea0825334c6c0
with the kernel .config file that was used.
The compiler that I've used to compile the test kernel was:
"gcc version 8.3.1 20190311 (Red Hat 8.3.1-3.2.0.1)"
Thanks,
Maciej