Re: [RFC 03/16] KVM: selftests: handle encryption bits in page tables
From: Michael Roth
Date: Mon Oct 25 2021 - 00:29:23 EST
On Thu, Oct 21, 2021 at 05:26:26PM +0200, Paolo Bonzini wrote:
> On 06/10/21 01:44, Michael Roth wrote:
> > SEV guests rely on an encryption bit which resides within the range that
> > current code treats as address bits. Guest code will expect these bits
> > to be set appropriately in their page tables, whereas helpers like
> > addr_gpa2hva() will expect these bits to be masked away prior to
> > translation. Add proper handling for these cases.
>
> This is not what you're doing below in addr_gpa2hva, though---or did I
> misunderstand?
The confusion is warranted: addr_gpa2hva() *doesn't* expect the C-bit to
be masked in advance, so the commit message wording is misleading.
I think I was referring to the fact that internally it doesn't need/want the
C-bit, so it just masks it away as a convenience to callers,
as opposed to the other functions modified in the patch that actually
make use of it.
It's convenient because page table walkers/mappers use addr_gpa2hva()
when translating PTEs to host addresses, and this way the C-bit gets
masked away silently. We could easily convert those callers from:
addr_gpa2hva(paddr)
to this:
addr_gpa2hva(addr_raw2gpa(paddr))
but then all new code needs to consider whether it might be dealing with
C-bits before deciding to pass an address to addr_gpa2hva() (or not
really think about it, and add addr_raw2gpa() "just in case"). So since
it's always harmless to mask it away silently in addr_gpa2hva(), the
logic/code seems to benefit a good deal if we indicate clearly that
addr_gpa2hva() can accept a 'raw' GPA, and will ignore the C-bit completely.
But not a big deal either way if you prefer to keep that explicit. And
commit message still needs to be clarified.
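
To make the masking concrete, here is a minimal sketch of what stripping the
C-bit from a 'raw' GPA could look like. The struct, its fields, and the
sketch_raw2gpa() name are hypothetical stand-ins for illustration, not the
code from the patch, and the C-bit position is taken as a plain parameter
here rather than being discovered from the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vm_paddr_t;

/* Hypothetical stand-in for the per-VM encryption state. */
struct raw2gpa_sketch {
	bool enabled;     /* is memory encryption in use for this VM? */
	uint8_t enc_bit;  /* assumed position of the C-bit in a GPA/PTE */
};

/* Strip the C-bit from a 'raw' GPA; a no-op for non-encrypted VMs. */
static vm_paddr_t sketch_raw2gpa(const struct raw2gpa_sketch *mc,
				 vm_paddr_t gpa_raw)
{
	if (!mc->enabled)
		return gpa_raw;

	return gpa_raw & ~(1ULL << mc->enc_bit);
}
```

With masking like this done inside addr_gpa2hva(), a caller can hand over
either form of the address and get the same host pointer back.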
>
> I may be wrong due to not actually having written the code, but I'd prefer
> if most of these APIs worked only if the C bit has already been stripped.
> In general it's quite unlikely for host code to deal with C=1 pages, so it's
> worth pointing out explicitly the cases where it does.
I've tried to indicate functions that expect the C-bit by adding the 'raw_'
prefix to the gpa/paddr parameters, but as you pointed out with
addr_gpa2hva() it's already a bit inconsistent in that regard, and there's
a couple cases like virt_map() where I should use the 'raw_' prefix as well
that I've missed here.
So that should be addressed, and maybe some additional comments/assertions
might be warranted to guard against cases where the C-bit is passed in
unexpectedly.
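
As a rough sketch of the kind of guard meant here, a helper that only accepts
non-raw GPAs could check for the C-bit up front. Both function names are
hypothetical, and plain assert() stands in for whatever assertion macro the
selftests library would actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vm_paddr_t;

/* Hypothetical: true if the C-bit (at position enc_bit) is set in addr. */
static bool sketch_has_c_bit(vm_paddr_t addr, uint8_t enc_bit)
{
	return (addr >> enc_bit) & 1;
}

/* Hypothetical guard for helpers that expect the C-bit to be stripped. */
static vm_paddr_t sketch_expect_gpa(vm_paddr_t gpa, uint8_t enc_bit)
{
	assert(!sketch_has_c_bit(gpa, enc_bit));
	return gpa;
}
```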
But I should probably re-assess why the C-bit is being passed around in
the first place:
- vm_phy_page[s]_alloc() is the main 'source' for 'raw' GPAs with the
C-bit set. it determines this based on vm_memcrypt encryption policy,
and updates the encryption bitmask as well.
- vm_phy_page[s]_alloc() is callable both in kvm_util lib as well as
individual tests.
- in theory, encoding the C-bit in the returned vm_paddr_t means that
vm_phy_page[s]_alloc() callers can pass that directly into
virt_map/virt_pg_map() and this will "just work" for both
encrypted/non-encrypted guests.
- by masking it away in addr_gpa2hva(), existing tests/code flow mostly
"just works" as well.
But taking a closer look, in cases where vm_phy_page[s]_alloc() is called
directly by tests, like set_memory_region_test, emulator_error_test, and
smm_test, that raw GPA is compared to hardcoded non-raw GPAs, so they'd
still end up needing fixups to work with the proposed transparent-SEV-mode
stuff. And future code would need to be written to account for this, so
it doesn't really "just work" after all..
So it's worth considering the alternative approach of *not* encoding the
C-bit into GPAs returned by vm_phy_page[s]_alloc(). That would likely
involve introducing something like addr_gpa2raw(), which adds in the
C-bit according to the encryption bitmap as-needed. If we do that:
- virt_map()/virt_pg_map() still need to accept 'raw' GPAs, since they
need to deal with cases where pages are being mapped that weren't
allocated by vm_phy_page[s]_alloc(), and so aren't recorded in the
bitmap. In those cases it is up to test code to provide the C-bit
when needed (e.g. things like separate linear mappings for pa()-like
stuff in guest code).
- for cases where vm_phy_page[s]_alloc() determines whether the page
is encrypted, addr_gpa2raw() needs to be used to add back the C-bit
prior to passing it to virt_map()/virt_pg_map(), both in the library and
the test code. vm_vaddr_* allocations would handle all this under the
covers as they do now.
So test code would need to consider cases where addr_gpa2raw() needs to be
used to set the C-bit (which is basically only when they want to mix usage
of the vm_phy_page[s]_alloc with their own mapping of the guest page tables,
which doesn't seem to be done in any existing tests anyway).
The library code would need these addr_gpa2raw() hooks in places where
it calls virt_*map() internally. Probably just a handful of places
though.
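
A minimal sketch of the addr_gpa2raw() idea, assuming a per-page encryption
bitmap that the allocator fills in. The struct layout, the single 64-bit
bitmap, the 4K page size, and all sketch_* names are assumptions made for
illustration only; the real helper would consult whatever bitmap the library
keeps per memory region.

```c
#include <stdint.h>

typedef uint64_t vm_paddr_t;

#define SKETCH_PAGE_SHIFT 12  /* assumed 4K guest pages */
#define SKETCH_MAX_PAGES  64  /* one uint64_t of bitmap, for brevity */

/* Hypothetical stand-in for the per-VM encryption state. */
struct gpa2raw_sketch {
	uint8_t enc_bit;      /* assumed position of the C-bit */
	uint64_t enc_bitmap;  /* bit N set => guest page N is encrypted */
};

/* What the page allocator would do for pages it decides to encrypt. */
static void sketch_mark_encrypted(struct gpa2raw_sketch *mc, vm_paddr_t gpa)
{
	uint64_t pg = gpa >> SKETCH_PAGE_SHIFT;

	if (pg < SKETCH_MAX_PAGES)
		mc->enc_bitmap |= 1ULL << pg;
}

/* Add the C-bit back only for pages recorded as encrypted. */
static vm_paddr_t sketch_gpa2raw(const struct gpa2raw_sketch *mc,
				 vm_paddr_t gpa)
{
	uint64_t pg = gpa >> SKETCH_PAGE_SHIFT;

	if (pg < SKETCH_MAX_PAGES && (mc->enc_bitmap & (1ULL << pg)))
		return gpa | (1ULL << mc->enc_bit);

	return gpa;
}
```

Library code would then wrap the GPA this way right before calling
virt_map()/virt_pg_map(), while tests keep comparing plain GPAs.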
Assuming there are no issues with this alternative approach that I may be
missing, I'll look at doing it this way for the next spin.
Even in this alternative approach though, having addr_gpa2hva() silently
mask away C-bit still seems useful for the reasons above, but again, no
strong feelings one way or the other on that.
>
> Paolo
>
> > @@ -1460,9 +1480,10 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> > * address providing the memory to the vm physical address is returned.
> > * A TEST_ASSERT failure occurs if no region containing gpa exists.
> > */
> > -void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa)
> > +void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa_raw)
> > {
> > struct userspace_mem_region *region;
>