Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory

From: Jason Gunthorpe
Date: Tue Dec 05 2023 - 12:10:36 EST


On Tue, Dec 05, 2023 at 04:24:22PM +0000, Catalin Marinas wrote:
> On Tue, Dec 05, 2023 at 10:44:17AM -0400, Jason Gunthorpe wrote:
> > On Tue, Dec 05, 2023 at 03:37:13PM +0100, Lorenzo Pieralisi wrote:
> > > On Tue, Dec 05, 2023 at 09:05:17AM -0400, Jason Gunthorpe wrote:
> > > > On Tue, Dec 05, 2023 at 11:40:47AM +0000, Catalin Marinas wrote:
> > > > > > - Will had unanswered questions in another part of the thread:
> > > > > >
> > > > > > https://lore.kernel.org/all/20231013092954.GB13524@willie-the-truck/
> > > > > >
> > > > > > Can someone please help concluding it?
> > > > >
> > > > > Is this about reclaiming the device? I think we concluded that we can't
> > > > > generalise this beyond PCIe, though not sure there was any formal
> > > > > statement to that thread. The other point Will had was around stating
> > > > > in the commit message why we only relax this to Normal NC. I haven't
> > > > > checked the commit message yet, it needs careful reading ;).
> > > >
> > > > Not quite, we said reclaiming is VFIO's problem and if VFIO can't
> > > > reliably reclaim a device it shouldn't create it in the first place.
> > >
> > > I think that as far as device reclaiming was concerned the question
> > > posed was related to memory attributes of transactions for guest
> > > mappings and the related grouping/ordering with device reset MMIO
> > > transactions - it was not (or wasn't only) about error containment.
> >
> > Yes. It is VFIO that issues the reset, it is VFIO that must provide
> > the ordering under the assumption that NORMAL_NC was used.
>
> And does it? Because VFIO so far only assumes Device-nGnRnE. Do we need
> to address this first before attempting to change KVM? Sorry, just
> questions, trying to clear the roadblocks.

There is no way to know. It is SOC specific what would be needed.

Could people have implemented their platform devices with a multi-path
bus architecture for the reset? Yes, definately. In fact, I've built
things like that. Low speed stuff like reset gets its own low speed
bus.

If that was done will NORMAL_NC vs DEVICE_nGnRnE make a difference?
I'm not sure. It depends a lot on how the SOC was designed and how
transactions flow on the high speed side. Posting writes, like PCIe
does, would destroy any practical ordering difference between the two
memory types. If the writes are not posted then the barriers in the
TLBI sequence should order it.

Fortunately, if some SOC has this issue we know how to solve it - you
must do flushing reads on all the multi-path interconnect segments to
serialize everything around the reset.

Regardless, getting this wrong is not a functional issue, it causes a
subtle time sensitive security race with VFIO close() that would be
hard to actually hit, and would already require privilege to open a
VFIO device to exploit. IOW, we don't care.

Jason