Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory

From: Jason Gunthorpe
Date: Wed Dec 06 2023 - 12:21:02 EST


On Wed, Dec 06, 2023 at 04:31:48PM +0000, Catalin Marinas wrote:

> > This would be fine, as would a VMA flag. Please pick one :)
> >
> > I think a VMA flag is simpler than messing with pgprot.
>
> I guess one could write a patch and see how it goes ;).

A lot of patches have been sent on this already :(

> > > If we want the VMM to drive this entirely, we could add a new mmap()
> > > flag like MAP_WRITECOMBINE or PROT_WRITECOMBINE. They do feel a bit
> >
> > As in the other thread, we cannot unconditionally map NORMAL_NC into
> > the VMM.
>
> I'm not suggesting this, but rather that the VMM map portions of the
> BAR with either Device or Normal-NC, concatenate them (MAP_FIXED) and
> pass this range as a memory slot (or multiple slots if a slot doesn't
> allow multiple vmas).

The VMM can't know what to do. We already talked about this. The VMM
cannot be involved in the decision to make pages NORMAL_NC or
not. That idea ignores how actual devices work.

Either the VM decides directly as this patch proposes or the VM does
some new generic trap/hypercall to ask the VMM to change it on its
behalf. The VMM cannot do it independently.

AFAIK nobody wants to see a trap/hypercall solution.

That is why we have been exclusively focused on this approach.

> > > The latter has some benefits for DPDK but it's a lot more involved
> > > with
> >
> > DPDK WC support will be solved with some VFIO-only change if anyone
> > ever cares to make it, if that is what you mean.
>
> Yeah. One argument I've heard in private and public discussions is
> that the KVM device pass-through shouldn't be different from the DPDK
> case.

I strongly disagree with this.

The KVM case should be solved without the VMM being aware of what
mappings the VM is doing.

DPDK is in control and can directly ask VFIO to make the correct
pgprot with an ioctl.

You can hear Alex also articulate this position in that video.

> There was some statement in there that for x86, the guests are
> allowed to do WC without other KVM restrictions (not sure whether
> that's the case, not familiar with it).

x86 has a similar issue (Sean was talking about this and how he wants
to fix it) where the VMM can restrict things, and on x86 there are
also configurations where WC does and doesn't work in VMs. It depends
on who made the hypervisor. :(

Nobody has pushed hard enough to see it resolved upstream, but I
understand some of the cloud operators have their own solutions.

> > We talked about this already: the guest must decide. The VMM doesn't
> > have the information to predict which pages the guest will want to
> > use WC on.
>
> Are the Device/Normal offsets within a BAR fixed and documented in
> e.g. the spec, or is this something configurable via some MMIO that
> the guest does?

No, it is fully dynamic on demand with firmware RPCs.

Jason