Re: [PATCH Part2 v6 09/49] x86/fault: Add support to handle the RMP fault for user address

From: Borislav Petkov
Date: Tue Aug 09 2022 - 12:56:04 EST


On Mon, Jun 20, 2022 at 11:03:43PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <brijesh.singh@xxxxxxx>
>
> When SEV-SNP is enabled globally, a write from the host goes through the

globally?

Can SNP even be enabled any other way?

I see the APM talks about it being enabled globally; I guess this means
the RMP represents *all* system memory?

> @@ -1209,6 +1210,60 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> }
> NOKPROBE_SYMBOL(do_kern_addr_fault);
>
> +static inline size_t pages_per_hpage(int level)
> +{
> + return page_level_size(level) / PAGE_SIZE;
> +}
> +
> +/*
> + * Return 1 if the caller need to retry, 0 if it the address need to be split
> + * in order to resolve the fault.
> + */

Magic numbers.

Pls do this instead:

enum rmp_pf_ret {
	RMP_PF_SPLIT	= 0,
	RMP_PF_RETRY	= 1,
};

and use those.
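
That way the !pte check below, for example, becomes self-documenting
(untested, just to illustrate):

	if (!pte || !pte_present(*pte))
		return RMP_PF_RETRY;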

> +static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
> + unsigned long address)
> +{
> + int rmp_level, level;
> + pte_t *pte;
> + u64 pfn;
> +
> + pte = lookup_address_in_mm(current->mm, address, &level);
> +
> + /*
> + * It can happen if there was a race between an unmap event and
> + * the RMP fault delivery.
> + */

You need to elaborate more here: an RMP fault can happen and then the
page can get unmapped? What is the exact scenario?

> + if (!pte || !pte_present(*pte))
> + return 1;
> +
> + pfn = pte_pfn(*pte);
> +
> + /* If its large page then calculte the fault pfn */
> + if (level > PG_LEVEL_4K) {
> + unsigned long mask;
> +
> + mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
> + pfn |= (address >> PAGE_SHIFT) & mask;

Oh boy, this is unnecessarily complicated. Isn't this

pfn |= pud_index(address);

or
pfn |= pmd_index(address);

depending on the level?

I think it is, but it needs more explaining.

In any case, those are exactly two static masks and they don't need to
be recomputed on each #PF.
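
IOW, something like this (completely untested, and assuming what is
wanted is the pfn of the faulting 4K page within the hugepage mapping):

	switch (level) {
	case PG_LEVEL_2M:
		/* low 9 bits: 4K page index within the 2M page */
		pfn |= (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
		break;
	case PG_LEVEL_1G:
		/* low 18 bits: 4K page index within the 1G page */
		pfn |= (address >> PAGE_SHIFT) &
		       (PTRS_PER_PTE * PTRS_PER_PMD - 1);
		break;
	default:
		break;
	}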

> diff --git a/mm/memory.c b/mm/memory.c
> index 7274f2b52bca..c2187ffcbb8e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4945,6 +4945,15 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> return 0;
> }
>
> +static int handle_split_page_fault(struct vm_fault *vmf)
> +{
> + if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> + return VM_FAULT_SIGBUS;

Yah, this looks weird: generic code now implies that splitting a page
after a #PF makes sense only when SEV is present and never otherwise.

Why?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette