Re: [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit
From: Brijesh Singh
Date: Mon Apr 12 2021 - 08:55:14 EST
On 4/12/21 6:49 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:24AM -0500, Brijesh Singh wrote:
>> @@ -161,3 +162,108 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
>> /* Ask hypervisor to make the memory shared in the RMP table. */
>> early_snp_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
>> }
>> +
>> +static int snp_page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data)
> That function name definitely needs changing. The
> vmgexit_page_state_change() one too. They're currenty confusing as hell
> and I can't know what each one does without looking at its function
> body.
>
>> +{
>> + struct snp_page_state_header *hdr;
>> + int ret = 0;
>> +
>> + hdr = &data->header;
>> +
>> + /*
>> + * The hypervisor can return before processing all the entries, the loop below retries
>> + * until all the entries are processed.
>> + */
>> + while (hdr->cur_entry <= hdr->end_entry) {
> This doesn't make any sense: snp_set_page_state() builds a "set" of
> pages to change their state in a loop and this one iterates *again* over
> *something* which I'm not even clear on what.
>
> Is something setting cur_entry to end_entry eventually?
>
> In any case, why not issue those page state changes one-by-one in
> snp_set_page_state() or is it possible that HV can do a couple of
> them in one go so you have to poke it here until it sets cur_entry ==
> end_entry?
The cur_entry is updated by the hypervisor. While building the psc
buffer the guest sets the cur_entry=0 and the end_entry point to the
last valid entry. The cur_entry is incremented by the hypervisor after
it successfully processes one 4K page. As per the spec, the hypervisor
could get interrupted in middle of the page state change and cur_entry
allows the guest to resume the page state change from the point where it
was interrupted.
>
>> + ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
Since we can get interrupted while executing the PSC so just to be safe
I re-initialized the scratch scratch area with our buffer instead of
relying on old values.
> Why do you have to call that here for every loop iteration...
>
>> + ret = vmgexit_page_state_change(ghcb, data);
As per the spec the caller must check that the cur_entry > end_entry to
determine whether all the entries are processed. If not then retry the
state change. The hypervisor will skip the previously processed entries.
The snp_page_state_vmgexit() is implemented to return only after all the
entries are changed.
> .... and in that function too?!
>
>> + /* Page State Change VMGEXIT can pass error code through exit_info_2. */
>> + if (ret || ghcb->save.sw_exit_info_2)
>> + break;
>> + }
>> +
>> + return ret;
> You don't need that ret variable - just return value directly.
Noted.
>
>> +}
>> +
>> +static void snp_set_page_state(unsigned long paddr, unsigned int npages, int op)
>> +{
>> + unsigned long paddr_end, paddr_next;
>> + struct snp_page_state_change *data;
>> + struct snp_page_state_header *hdr;
>> + struct snp_page_state_entry *e;
>> + struct ghcb_state state;
>> + struct ghcb *ghcb;
>> + int ret, idx;
>> +
>> + paddr = paddr & PAGE_MASK;
>> + paddr_end = paddr + (npages << PAGE_SHIFT);
>> +
>> + ghcb = sev_es_get_ghcb(&state);
> That function can return NULL.
Ah good point. Will fix in next rev.
>
>> + data = (struct snp_page_state_change *)ghcb->shared_buffer;
>> + hdr = &data->header;
>> + e = &(data->entry[0]);
> So
>
> e = data->entry;
>
> ?
Sure I can do that. It reads better that way.
>> + memset(data, 0, sizeof (*data));
>> +
>> + for (idx = 0; paddr < paddr_end; paddr = paddr_next) {
> As before, a while loop pls.
Noted.
>
>> + int level = PG_LEVEL_4K;
> Why does this needs to happen on each loop iteration? It looks to me you
> wanna do below:
>
> e->pagesize = X86_RMP_PG_LEVEL(PG_LEVEL_4K);
>
> instead.
Noted. I will remove the local variable.
>> +
>> + /* If we cannot fit more request then issue VMGEXIT before going further. */
> any more requests
>
> No "we" pls.
Noted.
>
>> + if (hdr->end_entry == (SNP_PAGE_STATE_CHANGE_MAX_ENTRY - 1)) {
>> + ret = snp_page_state_vmgexit(ghcb, data);
>> + if (ret)
>> + goto e_fail;
> WARN
Based on your feedback on previous patches I am going to replace it with
WARN() followed by special termination code to indicate that guest fail
to change the page state.
>
>> +
>> + idx = 0;
>> + memset(data, 0, sizeof (*data));
>> + e = &(data->entry[0]);
>> + }
> The order of the operations in this function looks weird: what you
> should do is:
>
> - clear stuff, memset etc.
> - build shared buffer
> - issue vmgexit
>
> so that you don't have the memset and e reinit twice and the flow can
> be more clear and you don't have two snp_page_state_vmgexit() function
> calls when there's a trailing set of entries which haven't reached
> SNP_PAGE_STATE_CHANGE_MAX_ENTRY.
>
> Maybe a double-loop or so.
Yes with a double loop I can rearrange it a bit better. I will look into
it for the v2. thanks
> ...
>
>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>> index 16f878c26667..19ee18ddbc37 100644
>> --- a/arch/x86/mm/pat/set_memory.c
>> +++ b/arch/x86/mm/pat/set_memory.c
>> @@ -27,6 +27,8 @@
>> #include <asm/proto.h>
>> #include <asm/memtype.h>
>> #include <asm/set_memory.h>
>> +#include <asm/mem_encrypt.h>
>> +#include <asm/sev-snp.h>
>>
>> #include "../mm_internal.h"
>>
>> @@ -2001,8 +2003,25 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>> */
>> cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
>>
>> + /*
>> + * To maintain the security gurantees of SEV-SNP guest invalidate the memory before
>> + * clearing the encryption attribute.
>> + */
> Align that comment on 80 cols, just like the rest of the comments do in
> this file. Below too.
>
Noted.
>> + if (sev_snp_active() && !enc) {
> Push that sev_snp_active() inside the function. Below too.
Noted.
>
>> + ret = snp_set_memory_shared(addr, numpages);
>> + if (ret)
>> + return ret;
>> + }
>> +
>> ret = __change_page_attr_set_clr(&cpa, 1);
>>
>> + /*
>> + * Now that memory is mapped encrypted in the page table, validate the memory range before
>> + * we return from here.
>> + */
>> + if (!ret && sev_snp_active() && enc)
>> + ret = snp_set_memory_private(addr, numpages);
>> +
>> /*
>> * After changing the encryption attribute, we need to flush TLBs again
>> * in case any speculative TLB caching occurred (but no need to flush