Re: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled

From: Babu Moger
Date: Thu Mar 25 2021 - 11:14:17 EST




On 3/25/21 5:29 AM, Borislav Petkov wrote:
> Ok,
>
> I tried to be as specific as possible in the commit message so that we
> don't forget. Please lemme know if I've missed something.
>
> Babu, Jim, I'd appreciate it if you ran this to confirm.
>
> Thx.
>
> ---
> From: Borislav Petkov <bp@xxxxxxx>
> Date: Thu, 25 Mar 2021 11:02:31 +0100
>
> Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel
> are exploding during alternatives patching:
>
> kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
> invalid opcode: 0000 [#1] SMP
> Modules linked in:
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> swap_entry_free
> swap_entry_free
> text_poke_bp
> swap_entry_free
> arch_jump_label_transform
> set_debug_rodata
> __jump_label_update
> static_key_slow_inc
> frontswap_register_ops
> init_zswap
> init_frontswap
> do_one_initcall
> set_debug_rodata
> kernel_init_freeable
> rest_init
> kernel_init
> ret_from_fork
>
> triggering the BUG_ON in text_poke() which verifies whether patched
> instruction bytes have actually landed at the destination.
>
> Further debugging showed that the TLB flush before that check is
> insufficient because there could be global mappings left in the TLB,
> leading to a stale mapping getting used.
>
> I say "global mappings" because the hardware configuration is a new one:
> machine is an AMD, which means, KAISER/PTI doesn't need to be enabled
> there, which also means there's no user/kernel pagetables split and
> therefore the TLB can have global mappings.
>
> And the configuration is new one for a second reason: because that AMD
> machine supports PCID and INVPCID, which leads the CPU detection code to
> set the synthetic X86_FEATURE_INVPCID_SINGLE flag.
>
> Now, __native_flush_tlb_single() does invalidate global mappings when
> X86_FEATURE_INVPCID_SINGLE is *not* set and returns.
>
> When X86_FEATURE_INVPCID_SINGLE is set, however, it invalidates the
> requested address from both PCIDs in the KAISER-enabled case. But if
> KAISER is not enabled and the machine has global mappings in the TLB,
> then those global mappings do not get invalidated, which would lead to
> the above mismatch from using a stale TLB entry.
>
> So make sure to flush those global mappings in the KAISER disabled case.
>
> Co-debugged by Babu Moger <babu.moger@xxxxxxx>.
>
> Reported-by: Jim Mattson <jmattson@xxxxxxxxxx>
> Signed-off-by: Borislav Petkov <bp@xxxxxxx>
> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.kernel.org%2Fr%2FCALMp9eRDSW66%252BXvbHVF4ohL7XhThoPoT0BrB0TcS0cgk%3DdkcBg%40mail.gmail.com&amp;data=04%7C01%7Cbabu.moger%40amd.com%7Cf4e0aacf81744dc8be4408d8ef78f2cf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637522650066097649%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1c4MQ9I9KrLxWLqghGCI%2BC%2Bvs0c9vYaNC5d%2FiYL0oMA%3D&amp;reserved=0
> ---
> arch/x86/include/asm/tlbflush.h | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index f5ca15622dc9..2bfa4deb8cae 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -245,12 +245,15 @@ static inline void __native_flush_tlb_single(unsigned long addr)
> * ASID. But, userspace flushes are probably much more
> * important performance-wise.
> *
> - * Make sure to do only a single invpcid when KAISER is
> - * disabled and we have only a single ASID.
> + * In the KAISER disabled case, do an INVLPG to make sure
> + * the mapping is flushed in case it is a global one.
> */
> - if (kaiser_enabled)
> + if (kaiser_enabled) {
> invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> - invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> + invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> + } else {
> + asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> + }
> }
>
> static inline void __flush_tlb_all(void)
>

Thanks Boris. As you updated the patch little bit since yesterday, I
retested them again both on host and guest kernel. They are looking good.

Tested-by: Babu Moger <babu.moger@xxxxxxx>