Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
From: Borislav Petkov
Date: Wed Mar 24 2021 - 17:22:44 EST
Ok,
some more experimenting Babu and I did lead us to:
---
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f5ca15622dc9..259aa4889cad 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
*/
if (kaiser_enabled)
invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+ else
+ asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
}
applied on the guest kernel which fixes the issue. And let me add Hugh
who did that PCID stuff at the time. So lemme summarize for Hugh and to
ask him nicely to sanity-check me. :-)
Basically, you have an AMD host which supports PCID and INVPCID and you
boot on it a 4.9 guest. It explodes like the panic below.
What fixes it is this:
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f5ca15622dc9..259aa4889cad 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
*/
if (kaiser_enabled)
invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+ else
+ asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
}
---
and the reason why it does, IMHO, is because on AMD, kaiser_enabled is
false because AMD is not affected by Meltdown, which means, there's no
user/kernel pagetables split.
And that also means, you have global TLB entries which means that if you
look at that __native_flush_tlb_single() function, it needs to flush
global TLB entries on CPUs with X86_FEATURE_INVPCID_SINGLE by doing an
INVLPG in the kaiser_enabled=0 case. Errgo, the above hunk.
But I might be completely off here thus this note...
Thoughts?
Thx.
[ 1.235726] ------------[ cut here ]------------
[ 1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
[ 1.240926] invalid opcode: 0000 [#1] SMP
[ 1.243301] Modules linked in:
[ 1.244585] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
[ 1.247657] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1.251249] task: ffff909363e94040 task.stack: ffffa41bc0194000
[ 1.253519] RIP: 0010:[<ffffffff8fa2e40c>] [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
[ 1.256593] RSP: 0018:ffffa41bc0197d90 EFLAGS: 00010096
[ 1.258657] RAX: 000000000000000f RBX: 0000000001020800 RCX: 00000000feda3203
[ 1.261388] RDX: 00000000178bfbff RSI: 0000000000000000 RDI: ffffffffff57a000
[ 1.264168] RBP: ffffffff8fbd3eca R08: 0000000000000000 R09: 0000000000000003
[ 1.266983] R10: 0000000000000003 R11: 0000000000000112 R12: 0000000000000001
[ 1.269702] R13: ffffa41bc0197dcf R14: 0000000000000286 R15: ffffed1c40407500
[ 1.272572] FS: 0000000000000000(0000) GS:ffff909366300000(0000) knlGS:0000000000000000
[ 1.275791] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.278032] CR2: 0000000000000000 CR3: 0000000010c08000 CR4: 00000000003606f0
[ 1.280815] Stack:
[ 1.281630] ffffffff8fbd3eca 0000000000000005 ffffa41bc0197e03 ffffffff8fbd3ecb
[ 1.284660] 0000000000000000 0000000000000000 ffffffff8fa2e835 ccffffff8fad4326
[ 1.287729] 1ccd0231874d55d3 ffffffff8fbd3eca ffffa41bc0197e03 ffffffff90203844
[ 1.290852] Call Trace:
[ 1.291782] [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
[ 1.294900] [<ffffffff8fbd3ecb>] ? swap_entry_free+0x12b/0x300
[ 1.297267] [<ffffffff8fa2e835>] ? text_poke_bp+0x55/0xe0
[ 1.299473] [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
[ 1.301896] [<ffffffff8fa2b64c>] ? arch_jump_label_transform+0x9c/0x120
[ 1.304557] [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
[ 1.306790] [<ffffffff8fb81d92>] ? __jump_label_update+0x72/0x80
[ 1.309255] [<ffffffff8fb8206f>] ? static_key_slow_inc+0x8f/0xa0
[ 1.311680] [<ffffffff8fbd7a57>] ? frontswap_register_ops+0x107/0x1d0
[ 1.314281] [<ffffffff9077078c>] ? init_zswap+0x282/0x3f6
[ 1.316547] [<ffffffff9077050a>] ? init_frontswap+0x8c/0x8c
[ 1.318784] [<ffffffff8fa0223e>] ? do_one_initcall+0x4e/0x180
[ 1.321067] [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
[ 1.323366] [<ffffffff9073f08d>] ? kernel_init_freeable+0x16b/0x1ec
[ 1.325873] [<ffffffff90011d50>] ? rest_init+0x80/0x80
[ 1.327989] [<ffffffff90011d5a>] ? kernel_init+0xa/0x100
[ 1.330092] [<ffffffff9001f424>] ? ret_from_fork+0x44/0x70
[ 1.332311] Code: 00 0f a2 4d 85 e4 74 4a 0f b6 45 00 41 38 45 00 75 19 31 c0 83 c0 01 48 63 d0 49 39 d4 76 33 41 0f b6 4c 15 00 38 4c 15 00 74 e9 <0f> 0b 48 89 ef e8 da d6 19 00 48 8d bd 00 10 00 00 48 89 c3 e8
[ 1.342818] RIP [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
[ 1.345859] RSP <ffffa41bc0197d90>
[ 1.347285] ---[ end trace 0a1c5ab5eb16de89 ]---
[ 1.349169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.349169]
[ 1.352885] Kernel Offset: 0xea00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1.357039] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.357039]
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette