Re: [PATCH] kvm/x86/mmu: use the correct inherited permissions to get shadow page

From: Lai Jiangshan
Date: Fri Nov 27 2020 - 21:06:25 EST


On Sat, Nov 28, 2020 at 12:48 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 26/11/20 01:05, Sean Christopherson wrote:
> > On Fri, Nov 20, 2020, Lai Jiangshan wrote:
> >> From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> >>
> >> Commit 41074d07c78b ("KVM: MMU: Fix inherited permissions for emulated
> >> guest pte updates") said role.access is common access permissions for
> >> all ptes in this shadow page, which is the inherited permissions from
> >> the parent ptes.
> >>
> >> But the commit did not enforce this definition when kvm_mmu_get_page()
> >> is called in FNAME(fetch). Rather, it uses a random (last level pte's
> >> combined) access permissions.
> >
> > I wouldn't say it's random, the issue is specifically that all shadow pages end
> > up using the combined set of permissions of the entire walk, as opposed to the
> > only combined permissions of its parents.
> >
> >> And the permissions won't be checked again in next FNAME(fetch) since the
> >> spte is present. It might fail to meet guest's expectation when guest sets up
> >> spaghetti pagetables.
> >
> > Can you provide details on the exact failure scenario? It would be very helpful
> > for documentation and understanding. I can see how using the full combined
> > permissions will cause weirdness for upper level SPs in kvm_mmu_get_page(), but
> > I'm struggling to connect the dots to understand how that will cause incorrect
> > behavior for the guest. AFAICT, outside of the SP cache, KVM only consumes
> > role.access for the final/last SP.
> >
>
> Agreed, a unit test would be even better, but just a description in the
> commit message would be enough.
>
> Paolo
>

Here is what I have in mind, though I haven't tested it:

pgd[]pud[]  pmd[]        pte[]               virtual address pointers
  (same hpa as pmd2\) /->pte1(u--)->page1 <- ptr1 (u--)
         /->pmd1(uw-)--->pte2(uw-)->page2 <- ptr2 (uw-)
pgd->pud-|           (shared pte[] as above)
         \->pmd2(u--)--->pte1(u--)->page1 <- ptr3 (u--)
  (same hpa as pmd1/) \->pte2(uw-)->page2 <- ptr4 (u--)


pmd1 and pmd2 point to the same pte table, so:
ptr1 and ptr3 point to the same page.
ptr2 and ptr4 point to the same page.

The guest reads from ptr1 first, so the hypervisor gets the
shadow pte page table with role.access=u--, among other things.
(Note the shadowed pmd1's access is uwx)

Then the guest writes to ptr2, and the hypervisor sets up
the shadow page for ptr2.
(Note that the hypervisor silently accepts the role.access=u--
shadow pte page table in FNAME(fetch).)

After that, the guest reads from ptr3, and the hypervisor
reuses the same shadow pte page table as above.

Finally, the guest writes to ptr4 with neither a vmexit nor a page
fault, although the guest expects the write to fault, which would
require a vmexit.
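
The four steps above can be sketched as a toy model in C. This is not
KVM code: struct mmu, get_page(), guest_access() and the two-entry
tables are made up for illustration, and it only assumes that the
shadow page cache is keyed by (gfn, role.access) and that a present
shadow pmd entry is reused without rechecking the role.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy access bits; only U and W matter for this scenario. */
enum { ACC_W = 1, ACC_U = 2 };

#define PTE_TABLE_GFN 0x100u   /* hypothetical gfn of the shared pte table */

struct shadow_page {
    bool used;
    unsigned gfn;
    unsigned role_access;      /* part of the cache key */
    bool present[2];           /* sptes for pte1 and pte2 */
    unsigned spte_access[2];
};

struct mmu {
    bool fixed;                    /* true: key on the parents' permissions */
    struct shadow_page cache[4];
    struct shadow_page *shadow_pmd[2]; /* shadow entries for pmd1 and pmd2 */
};

/* Guest tables from the diagram: pmd1(uw-), pmd2(u--), pte1(u--), pte2(uw-). */
static const unsigned guest_pmd_access[2] = { ACC_U | ACC_W, ACC_U };
static const unsigned guest_pte_access[2] = { ACC_U, ACC_U | ACC_W };

/* Look up a shadow page by (gfn, role.access); create one on a miss. */
static struct shadow_page *get_page(struct mmu *m, unsigned gfn, unsigned access)
{
    struct shadow_page *free_sp = NULL;

    for (int i = 0; i < 4; i++) {
        struct shadow_page *sp = &m->cache[i];

        if (sp->used && sp->gfn == gfn && sp->role_access == access)
            return sp;
        if (!sp->used && !free_sp)
            free_sp = sp;
    }
    assert(free_sp);
    free_sp->used = true;
    free_sp->gfn = gfn;
    free_sp->role_access = access;
    return free_sp;
}

/* Returns true if the access completes, false if the guest gets a fault. */
static bool guest_access(struct mmu *m, int pmd, int pte, bool write)
{
    struct shadow_page *sp = m->shadow_pmd[pmd];

    /* "Hardware" walk of the shadow tables: a hit needs no vmexit. */
    if (sp && sp->present[pte] && (!write || (sp->spte_access[pte] & ACC_W)))
        return true;

    /* vmexit: toy stand-in for the guest walk plus the fetch step. */
    unsigned parent_access = guest_pmd_access[pmd];
    unsigned walk_access = parent_access & guest_pte_access[pte];

    if (write && !(walk_access & ACC_W))
        return false;              /* guest page fault */

    if (!sp) {
        /* Buggy variant keys the page on the whole walk's permissions. */
        unsigned role = m->fixed ? parent_access : walk_access;

        sp = get_page(m, PTE_TABLE_GFN, role);
        m->shadow_pmd[pmd] = sp;
    }
    /* A present shadow pmd entry is reused without rechecking the role. */
    sp->present[pte] = true;
    sp->spte_access[pte] = walk_access;
    return true;
}

/* Replays the four accesses and reports whether the ptr4 write goes through. */
static bool ptr4_write_succeeds(bool fixed)
{
    struct mmu m = { .fixed = fixed };

    guest_access(&m, 0, 0, false);        /* read ptr1  (pmd1, pte1) */
    guest_access(&m, 0, 1, true);         /* write ptr2 (pmd1, pte2) */
    guest_access(&m, 1, 0, false);        /* read ptr3  (pmd2, pte1) */
    return guest_access(&m, 1, 1, true);  /* write ptr4 (pmd2, pte2) */
}
```

In this model ptr4_write_succeeds(false) returns true, because pmd2's
shadow entry ends up pointing at the page that already holds a writable
spte for pte2. With the role keyed on the parents' permissions
(fixed = true), pmd2 gets its own shadow page and the write faults.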

In theory, guest userspace can trick the guest kernel if the guest
kernel sets up its page tables like this.

Such spaghetti pagetables are unlikely to be seen in the guest.

But consider a guest that uses KPTI and does not use SMEP. With KPTI,
all pgd entries in the lower/userspace part of the kernel pagetable
are marked NX, which means SMEP is not needed.
(see arch/x86/mm/pti.c)
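
As a reminder of why an NX bit in the pgd entry is normally enough,
here is a minimal sketch of how x86 combines permissions along a walk:
U and W are ANDed across levels, NX is ORed. struct entry_perm and
effective_perm() are hypothetical names, and the sketch ignores CR0.WP,
SMEP/SMAP and protection keys.

```c
#include <stdbool.h>
#include <stddef.h>

/* Permission bits of one paging-structure entry (simplified). */
struct entry_perm {
    bool user;
    bool write;
    bool nx;
};

/* Combine the entries of one walk, top level first. */
static struct entry_perm effective_perm(const struct entry_perm *walk,
                                        size_t levels)
{
    struct entry_perm eff = { .user = true, .write = true, .nx = false };

    for (size_t i = 0; i < levels; i++) {
        eff.user = eff.user && walk[i].user;    /* every level must allow U */
        eff.write = eff.write && walk[i].write; /* every level must allow W */
        eff.nx = eff.nx || walk[i].nx;          /* any level can forbid exec */
    }
    return eff;
}
```

So with NX set only in the pgd entry, every mapping below it is
non-executable regardless of what the lower levels say.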

Assume the guest does disable SMEP, and the guest has a flaw that
lets a guest user trick the guest kernel into executing from the
lower part of the address space.

Normally, the NX bits set on the kernel pagetable's lower pgd
entries help in this case. But in a guest running on shadow pages
in the hypervisor, the guest user can make those NX bits useless.

Again, I haven't tested this either. I will try it later and
update the patch, including adding some more checks in mmu.c.

Thanks,
Lai