Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags

From: Jason Gunthorpe
Date: Tue Mar 18 2025 - 19:12:13 EST


On Tue, Mar 18, 2025 at 12:30:43PM -0700, Oliver Upton wrote:
> On Tue, Mar 18, 2025 at 09:55:27AM -0300, Jason Gunthorpe wrote:
> > On Tue, Mar 18, 2025 at 09:39:30AM +0000, Marc Zyngier wrote:
> >
> > > The memslot must also be created with a new flag ((2c) in the taxonomy
> > > above) that carries the "Please map VM_PFNMAP VMAs as cacheable". This
> > > flag is only allowed if (1) is valid.
> > >
> > > This results in the following behaviours:
> > >
> > > - If the VMM creates the memslot with the cacheable attribute without
> > > (1) being advertised, we fail.
> > >
> > > - If the VMM creates the memslot without the cacheable attribute, we
> > > map as NC, as it is today.
> >
> > Is that OK though?
> >
> > Now we have the MM page tables mapping this memory as cacheable but KVM
> > and the guest are accessing it as non-cached.
> >
> > I thought ARM tried hard to avoid creating such mismatches? This is
> > why the pgprot flags were used to drive this, not an opt-in flag. To
> > prevent userspace from forcing a mismatch.
>
> It's far more problematic the other way around, e.g. the host knows that
> something needs a Device-* attribute and the VM has done something
> cacheable. The endpoint for that PA could, for example, fall over when
> lines pulled in by the guest are written back, which of course can't
> always be traced back to the offending VM.
>
> OTOH, if the host knows that a PA is cacheable and the guest does
> something non-cacheable, you 'just' have to deal with the usual
> mismatched attributes problem as laid out in the ARM ARM.

I think the issue is that KVM doesn't do that usual mitigation (i.e.
cache flushing) for memory that doesn't have struct page backing.

So nothing in the hypervisor does any cache flushing, and I believe
you end up with a situation where the VMM could have zeroed this
cacheable memory using cacheable stores to sanitize it across VMs,
yet KVM then maps that memory into the VM as uncached, and the VM can
read stale, non-zeroed data left behind by a prior VM. Yes? This is a
security problem.

As I understand things, KVM must either do the cache flushing or must
not allow mismatched attributes, as a matter of security.

This is why FWB comes into the picture: KVM cannot do the cache
flushing for PFNMAP VMAs. So we keep both the MM mapping and the KVM
stage-2 mapping cacheable and use S2 FWB to prevent the guest from
ever making it non-cacheable. Then the cache flushes are not required
and everything is safe from a security point of view.
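
To spell out the attribute-combining rule this relies on (an
illustration only, with made-up names, not kernel code, and it
glosses over the various Device types):

#include <stdbool.h>

/*
 * Simplified model of how the final memory type is derived for a
 * stage-2 Normal Write-Back mapping. Without FWB the less-cacheable
 * attribute wins, so the guest's stage 1 can still give you NC and
 * KVM would need CMOs. With the FWB "force write-back" stage-2
 * encoding, stage 1 is ignored and the result is always Normal-WB.
 */
enum mem_type { MT_DEVICE, MT_NORMAL_NC, MT_NORMAL_WB };

static enum mem_type combined_type(enum mem_type s1, enum mem_type s2,
				   bool s2_fwb_forces_wb)
{
	if (s2_fwb_forces_wb)
		return MT_NORMAL_WB;	/* guest cannot degrade it */
	return s1 < s2 ? s1 : s2;	/* weaker attribute wins */
}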

So I think the logic we want here in the fault handler is:

  Get the mm's PTE
  If it is cacheable:
    Check if it has a struct page:
      Yes - KVM flushes it and can use a non-FWB path
      No  - KVM either fails to install it, or installs it using FWB
            to force cacheability. KVM never allows degrading cacheable
            to non-cacheable when it can't do the flushing.
  If it is not cacheable:
    Install it with Normal-NC as was previously discussed and merged
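
In rough pseudo-C (a sketch only, not actual mmu.c code; the map_*()
and vma_prot_is_cacheable() helpers are made-up names, while
pfn_valid(), cpus_have_final_cap() and ARM64_HAS_STAGE2_FWB are the
real kernel primitives):

static int sketch_fault_path(struct vm_area_struct *vma, kvm_pfn_t pfn)
{
	if (!vma_prot_is_cacheable(vma))
		/* existing behaviour: map non-cacheable PFNMAP as Normal-NC */
		return map_normal_nc(pfn);

	if (pfn_valid(pfn))
		/* struct-page backed: KVM can do the CMOs itself */
		return map_normal_wb_with_cmo(pfn);

	if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
		/* no struct page: rely on FWB so the guest can't degrade it */
		return map_normal_wb_forced_cacheable(pfn);

	/* can't flush and can't force cacheability: refuse the mapping */
	return -EFAULT;
}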

> Userspace should be stating intentions on the memslot with the sort of
> mapping that it wants to create, and a memslot flag to say "I allow
> cacheable mappings" seems to fit the bill.

I'm not sure about this; I don't see that userspace has any
choice. As above, KVM has to follow whatever is in the PTEs;
userspace can't ask for something different here. At best you could
make non-struct-page cacheable memory always fail unless the flag is
given - but why?

For a fast failure it seems sufficient to check, at memslot creation,
whether the VMA is VM_PFNMAP with a cacheable pgprot, and if so fail
the memslot when !FEAT_FWB. There is no real recovery from this; the
VMM is doing something that cannot be supported.
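
Something like this (again only a sketch, presumably in
kvm_arch_prepare_memory_region() where arm64 KVM already inspects the
VMAs backing a new memslot; vma_pgprot_is_cacheable() is a made-up
helper meaning "the VMA's pgprot is Normal Write-Back"):

	if ((vma->vm_flags & VM_PFNMAP) &&
	    vma_pgprot_is_cacheable(vma) &&
	    !cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
		/* cacheable PFNMAP cannot be made safe without FWB */
		return -EINVAL;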

> - Stage-2 faults serviced w/ a non-cacheable mapping if flag is not
> set

As above, I think this creates a bug :\

Jason