Re: [PATCH] x86: mm: Do not use set_{pud,pmd}_safe when splitting the large page

From: Peter Zijlstra
Date: Tue Apr 09 2019 - 04:40:42 EST


On Mon, Apr 08, 2019 at 07:11:21PM +0000, Singh, Brijesh wrote:
> Commit 0a9fe8ca844d ("x86/mm: Validate kernel_physical_mapping_init()
> PTE population") triggers the following warning in an SEV guest.
>
> WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgalloc.h:87 phys_pmd_init+0x30d/0x386
> Call Trace:
> kernel_physical_mapping_init+0xce/0x259
> early_set_memory_enc_dec+0x10f/0x160
> kvm_smp_prepare_boot_cpu+0x71/0x9d
> start_kernel+0x1c9/0x50b
> secondary_startup_64+0xa4/0xb0
>
> The SEV guest calls kernel_physical_mapping_init() to clear the encryption
> mask from an existing mapping. While clearing the encryption mask,
> kernel_physical_mapping_init() splits large pages into smaller ones.
> To split a page, kernel_physical_mapping_init() allocates a new page
> and updates the existing entry. set_{pud,pmd}_safe() triggers a warning
> when it updates an entry that is already present. We should use
> set_{pud,pmd}() when replacing an existing entry with a new one.
>
> Updating an entry also requires a TLB flush. Currently the caller
> (early_set_memory_enc_dec()) takes care of issuing the TLB flushes.
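
To make the failure mode above concrete, here is a minimal user-space sketch of
the check as the changelog describes it. It is a toy model, not the kernel
code: entry_t, ENTRY_PRESENT, set_entry(), set_entry_safe() and the warnings
counter are all made-up names, and the "present and different" condition is my
reading of what the _safe helpers test for.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model only: bit 0 stands in for the "present" bit of an entry. */
#define ENTRY_PRESENT 0x1ULL

typedef uint64_t entry_t;	/* hypothetical stand-in for pmd_t/pud_t */

static int warnings;		/* how many times the check fired */

/* Plain setter: overwrites whatever is in the slot, no questions asked. */
static void set_entry(entry_t *slot, entry_t val)
{
	assert(slot);
	*slot = val;
}

/*
 * "Safe" setter: complains when a present entry is replaced by a
 * different value, then performs the store anyway.
 */
static void set_entry_safe(entry_t *slot, entry_t val)
{
	assert(slot);
	if ((*slot & ENTRY_PRESENT) && *slot != val) {
		warnings++;
		fprintf(stderr, "WARNING: present entry overwritten\n");
	}
	set_entry(slot, val);
}
```

In this model, populating an empty slot or rewriting the same value is silent,
while replacing a present entry (the large-page split case) trips the check.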

I'm not entirely sure I like this; it means all users of
kernel_physical_mapping_init() now need to be aware of this and careful.

That said, the alternative is adding an argument to the function,
propagating it through the call chain, and dynamically switching between
the _safe and non-_safe variants, which doesn't sound ideal either.
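
For illustration, the argument-passing alternative could look roughly like the
user-space sketch below. Everything in it is hypothetical (the init flag,
set_entry_init(), fill_table(), mapping_init() and mapping_change() are
made-up names on a toy table); it only shows the shape of threading a flag
down the call chain, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_PRESENT	0x1ULL	/* toy "present" bit */
#define NR_ENTRIES	4	/* toy table size */

typedef uint64_t entry_t;	/* hypothetical stand-in for a table entry */

static int warnings;		/* how many times the safe check fired */

static void set_entry(entry_t *slot, entry_t val)
{
	assert(slot);
	*slot = val;
}

static void set_entry_safe(entry_t *slot, entry_t val)
{
	assert(slot);
	if ((*slot & ENTRY_PRESENT) && *slot != val)
		warnings++;	/* models the WARNING from the report */
	*slot = val;
}

/* Leaf helper: the flag selects the checked or unchecked setter. */
static void set_entry_init(entry_t *slot, entry_t val, bool init)
{
	if (init)
		set_entry_safe(slot, val);
	else
		set_entry(slot, val);
}

/* Mid-level walker: does nothing with the flag except pass it down. */
static void fill_table(entry_t *table, entry_t base, bool init)
{
	for (int i = 0; i < NR_ENTRIES; i++)
		set_entry_init(&table[i],
			       (base + ((entry_t)i << 12)) | ENTRY_PRESENT,
			       init);
}

/* Two entry points so that callers never see the flag itself. */
static void mapping_init(entry_t *table, entry_t base)
{
	fill_table(table, base, true);	/* initial population: checked */
}

static void mapping_change(entry_t *table, entry_t base)
{
	fill_table(table, base, false);	/* deliberate rewrite: unchecked */
}
```

The cost is visible even in the toy: every level between the entry point and
the leaf grows a parameter it does not otherwise care about.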

Anybody else got clever ideas?

> Signed-off-by: Brijesh Singh <brijesh.singh@xxxxxxx>
> Fixes: 0a9fe8ca844d (x86/mm: Validate kernel_physical_mapping_init() ...)
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Tom Lendacky <Thomas.Lendacky@xxxxxxx>
> ---
> arch/x86/mm/init_64.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index bccff68e3267..0a26b64a99b9 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -536,7 +536,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
> paddr_last = phys_pte_init(pte, paddr, paddr_end, new_prot);
>
> spin_lock(&init_mm.page_table_lock);
> - pmd_populate_kernel_safe(&init_mm, pmd, pte);
> + pmd_populate_kernel(&init_mm, pmd, pte);
> spin_unlock(&init_mm.page_table_lock);
> }
> update_page_count(PG_LEVEL_2M, pages);
> @@ -623,7 +623,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
> page_size_mask, prot);
>
> spin_lock(&init_mm.page_table_lock);
> - pud_populate_safe(&init_mm, pud, pmd);
> + pud_populate(&init_mm, pud, pmd);
> spin_unlock(&init_mm.page_table_lock);
> }
>
> --
> 2.17.1
>