RE: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA

From: Vikram Sethi
Date: Mon May 03 2021 - 09:59:53 EST




> From: Mark Kettenis <mark.kettenis@xxxxxxxxx>
> > From: Marc Zyngier <maz@xxxxxxxxxx>

snip
> > If, by enumerating the properties of Prefetchable, you can show that
> > they are a strict superset of Normal_NC, I'm on board. I haven't seen
> > such an enumeration so far.
> >
snip
> > Right, so we have made a small step in the direction of mapping
> > "prefetchable" onto "Normal_NC", thanks for that. What about all the
> > other properties (unaligned accesses, ordering, gathering)?
>
Regarding gathering/write combining, that is also allowed for prefetchable regions per the PCIe spec.
From section 1.3.2.2 of the 5.0 base spec:
A PCI Express Endpoint requesting memory resources through a BAR must set the BAR's Prefetchable bit unless
the range contains locations with read side-effects or locations in which the Function does not tolerate write
merging.
Further, section 7.5.1.2.1 says: "A Function is permitted
to mark a range as prefetchable if there are no side effects on reads, the Function returns all bytes on reads regardless of
the byte enables, and host bridges can merge processor writes into this range without causing errors".

The "regardless of the byte enables" wording suggests to me that unaligned accesses are OK,
since an unaligned access may set only some of the byte enables. What do you think?

So to me, prefetchable in the PCIe spec allows write combining, reads without
side effects (prefetch/speculative reads, as long as the mapping is uncached),
and unaligned accesses. Regarding ordering, I didn't find a statement one way or
the other in the PCIe prefetchable definition, but I think that goes beyond what
PCIe says or doesn't say anyway: reordering can also happen in the CPU, and since
a driver must be aware of correctness issues in its producer/consumer models, it
will need the right barriers where they are required for correctness anyway (as
it already does for the driver/userspace to work on the host with ioremap_wc).

But perhaps the bigger question is this: WC doesn't exist as a memory type on
Armv8, yet we are trying to fit something onto ioremap_wc, which came from the
x86 world. Shouldn't the arm64 memory type we use for WC match the semantics of
whatever drivers and userspace expect from ioremap_wc as defined on x86, which,
as Mark notes below, includes unaligned access? If we agree to that, we can
codify it in the documentation of ioremap_wc and allow Normal NC on arm64 for
ioremap_wc in both host and guest.
Beyond that, if we'd rather not do it automatically based on the prefetchable
bit, an explicit request from userspace is fine too.

> On x86 WC:
>
> 1. Is not cached (but stores are buffered).
>
> 2. Allows unaligned access just like normal memory.
>
> 3. Allows speculative reads.
>
> 4. Has weaker ordering than normal memory; [lsm]fence instructions are
> needed to guarantee a particular ordering of writes with respect to
> other writes and reads.
>
> 5. Stores are buffered. This buffer isn't snooped so it has to be
> flushed before changes are globally visible. The [sm]fence
> instructions flush the store buffer.
>
> 6. The store buffer may combine multiple writes into a single write.
>
> Now, whether the fact that unaligned access is allowed is really part of the
> semantics of WC mappings is debatable, as x86 always allows unaligned
> access, even for areas mapped with ioremap().
>
> However, this is where userland comes in. The userland graphics stack does
> assume that graphics memory mapped through a prefetchable PCIe BAR
> allows unaligned access if the architecture allows unaligned access for
> cacheable memory. On arm64 this means that such memory needs to be
> "Normal NC". And since kernel drivers tend to map such memory using
> ioremap_wc(), that pretty much implies ioremap_wc() should use "Normal NC"
> as well, doesn't it?
>
> > > > How do we translate this into something consistent? I'd like to
> > > > see an actual description of what we *really* expect from WC on
> > > > prefetchable PCI regions, turn that into a documented definition
> > > > agreed across architectures, and then we can look at implementing
> > > > it with one memory type or another on arm64.
> > > >
> > > > Because once we expose that memory type at S2 for KVM guests, it
> > > > becomes ABI and there is no turning back. So I want to get it
> > > > right once and for all.
> > > >
> > > I agree that we need a precise definition for the Linux ioremap_wc
> > > API wrt what drivers (kernel and userspace) can expect and whether
> > > memset/memcpy is expected to work or not and whether aligned
> > > accesses are a requirement.
> > > To the extent ABI is set, I would think that the ABI is also already
> > > set in the host kernel for arm64 WC = Normal NC, so why should that
> > > not also be the ABI for the same driver in VMs?
> >
> > KVM is an implementation of the ARM architecture, and doesn't really
> > care about what WC is. If we come to the conclusion that Normal_NC is
> > the natural match for Prefetchable attributes, then we're good and we
> > can have Normal_NC being set by userspace, or even VFIO. But I don't
> > want to set it only because "it works when bare-metal Linux uses it".
> > Remember KVM doesn't only run Linux as guests.
> >
> > M.
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >