Re: [PATCH Part2 v6 09/49] x86/fault: Add support to handle the RMP fault for user address

From: Michael Roth
Date: Tue Sep 06 2022 - 11:52:08 EST


On Tue, Sep 06, 2022 at 09:17:15AM -0500, Kalra, Ashish wrote:
> [AMD Official Use Only - General]
>
> >> On Tue, Aug 09, 2022 at 06:55:43PM +0200, Borislav Petkov wrote:
> >> > On Mon, Jun 20, 2022 at 11:03:43PM +0000, Ashish Kalra wrote:
> >> > > + pfn = pte_pfn(*pte);
> >> > > +
> >> > > + /* If its large page then calculte the fault pfn */
> >> > > + if (level > PG_LEVEL_4K) {
> >> > > + unsigned long mask;
> >> > > +
> >> > > + mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
> >> > > + pfn |= (address >> PAGE_SHIFT) & mask;
> >> >
> >> > Oh boy, this is unnecessarily complicated. Isn't this
> >> >
> >> > pfn |= pud_index(address);
> >> >
> >> > or
> >> > pfn |= pmd_index(address);
> >>
> >> I played with this a bit and ended up with
> >>
> >> pfn = pte_pfn(*pte) | PFN_DOWN(address & page_level_mask(level
> >> - 1));
> >>
> >> Unless I got something terribly wrong, this should do the same (see
> >> the attached patch) as the existing calculations.
>
> >Actually, I don't think they're the same. I think Jarkko's version is correct. Specifically:
> >- For level = PG_LEVEL_2M they're the same.
> >- For level = PG_LEVEL_1G:
> >The current code calculates a garbage mask:
> >mask = pages_per_hpage(level) - pages_per_hpage(level - 1); translates to:
> >>> hex(262144 - 512)
> >'0x3fe00'
>
> No actually this is not a garbage mask, as I explained in earlier responses we need to capture the address bits
> to get to the correct 4K index into the RMP table.
> Therefore, for level = PG_LEVEL_1G:
> mask = pages_per_hpage(level) - pages_per_hpage(level - 1) => 0x3fe00 (which is the correct mask).

That's the correct mask to grab the 2M-aligned address bits, e.g:

pfn_mask = 3fe00h = 11 1111 1110 0000 0000b

So the last 9 bits are ignored, e.g. anything PFNs that are multiples
of 512 (2^9), and the upper bits comes from the 1GB PTE entry.

But there is an open question of whether we actually want to index using
2M-aligned or specific 4K-aligned PFN indicated by the faulting address.

>
> >But I believe Jarkko's version calculates the correct mask (below), incorporating all 18 offset bits into the 1G page.
> >>> hex(262144 -1)
> >'0x3ffff'
>
> We can get this simply by doing (page_per_hpage(level)-1), but as I mentioned above this is not what we need.

If we actually want the 4K page, I think we would want to use the 0x3ffff
mask as Marc suggested to get to the specific 4K RMP entry, which I don't
think the current code is trying to do. But maybe that *should* be what we
should be doing.

Based on your earlier explanation, if we index into the RMP table using
2M-aligned address, we might find that the entry does not have the
page-size bit set (maybe it was PSMASH'd for some reason). If that's the
cause we'd then have to calculate the index for the specific RMP entry
for the specific 4K address that caused the fault, and then check that
instead.

If however we simply index directly in the 4K RMP entry from the start,
snp_lookup_rmpentry() should still tell us whether the page is private
or not, because RMPUPDATE/PSMASH are both documented to also update the
assigned bits for each 4K RMP entry even if you're using a 2M RMP entry
and setting the page-size bit to cover the whole 2M range.

Additionally, snp_lookup_rmpentry() already has logic to also go check
the 2M-aligned RMP entry to provide an indication of what level it is
mapped at in the RMP table, so we can still use that to determine if the
host mapping needs to be split or not.

One thing that could use some confirmation is what happens if you do an
RMPUPDATE for a 2MB RMP entry, and then go back and try to RMPUPDATE a
sub-page and change the assigned bit so it's not consistent with 2MB RMP
entry. I would assume that would fail the instruction, but we should
confirm that before relying on this logic.

-Mike

>
> Thanks,
> Ashish