Re: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA
From: Marc Zyngier
Date: Wed Jun 02 2021 - 05:38:04 EST
Hi Shanker,
On Sat, 08 May 2021 17:33:11 +0100,
Shanker R Donthineni <sdonthineni@xxxxxxxxxx> wrote:
>
> Hi Marc,
>
> On 5/5/21 1:02 PM, Catalin Marinas wrote:
> >>> Will/Catalin, perhaps you could explain your thought process on why you chose
> >>> Normal NC for ioremap_wc on the armv8 linux port instead of Device GRE or other
> >>> Device Gxx.
> >> I think a combination of: compatibility with 32-bit Arm, the need to
> >> support unaligned accesses and the potential for higher performance.
> > IIRC the _wc suffix also matches the pgprot_writecombine() used by some
> > drivers to map a video framebuffer into user space. Accesses to the
> > framebuffer are not guaranteed to be aligned (memset/memcpy don't ensure
> > alignment on arm64 and the user doesn't have a memset_io or memcpy_toio).
> >
> >> Furthermore, ioremap() already gives you a Device memory type, and we're
> >> tight on MAIR space.
> > We have MT_DEVICE_GRE currently reserved though no in-kernel user, we
> > might as well remove it.
> @Marc, could you provide your thoughts/guidance for the next step? The
> proposal of getting hints for prefetchable regions from VFIO/QEMU is not
> recommended. The only option left is to implement ARM64-dependent logic
> in KVM.
>
> Option-1: I think we could take advantage of stage-1/2 combining rules to
> allow the NORMAL_NC memory type for device memory in the VM. Always map
> device memory at stage-2 as NORMAL-NC and trust the VM's stage-1 MT.
>
> ---------------------------------------------------------------
> Stage-2 MT   Stage-1 MT      Resultant MT (combining-rules/FWB)
> ---------------------------------------------------------------
> Normal-NC    Normal-WT       Normal-NC
>    -         Normal-WB          -
>    -         Normal-NC          -
>    -         Device-<attr>   Device-<attr>
> ---------------------------------------------------------------

I think this is unwise.

Will recently debugged a pretty horrible situation when doing exactly
that: when S1 is off and S2 is on, the I-side is allowed to generate
speculative accesses (see ARMv8 ARM G.a D5.2.9 for the details). And
yes, implementations definitely do that. Add side-effect reads to the
mix, and you're in for a treat.
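
To make the failure mode concrete, here is a rough model of the non-FWB
combining rules, written as a sketch rather than anything taken from the
kernel. The enum and both helper names are hypothetical, Device
sub-attributes are not modelled, and the stage-1-off defaults reflect my
reading of the architecture (data accesses treated as Device, instruction
fetches treated as Normal with cacheability following SCTLR_ELx.I), so
check them against the ARM ARM before relying on them.

```c
/*
 * Simplified model of stage-1/stage-2 memory-type combining without FWB.
 * All names are hypothetical; they are not the kernel's MT_* definitions.
 * Enumerators are ordered from least to most restrictive, so combining
 * two types means taking the more restrictive one.
 */
enum mem_type {
	MT_NORMAL_WB,	/* Normal, Write-Back cacheable */
	MT_NORMAL_WT,	/* Normal, Write-Through cacheable */
	MT_NORMAL_NC,	/* Normal, Non-cacheable */
	MT_DEVICE,	/* any Device-<attr>, sub-attributes not modelled */
};

/* Resultant type for a given stage-1 and stage-2 type. */
static enum mem_type combine_s1_s2(enum mem_type s1, enum mem_type s2)
{
	return s1 > s2 ? s1 : s2;
}

/*
 * Assumed stage-1 type when the stage-1 MMU is off: data accesses are
 * treated as Device, instruction fetches as Normal (Write-Through if
 * the I-cache is enabled, Non-cacheable otherwise).
 */
static enum mem_type s1_off_type(int is_ifetch, int icache_enabled)
{
	if (!is_ifetch)
		return MT_DEVICE;
	return icache_enabled ? MT_NORMAL_WT : MT_NORMAL_NC;
}
```

Under this model, combine_s1_s2(s1_off_type(1, 0), MT_NORMAL_NC) comes
out as MT_NORMAL_NC: an instruction fetch with S1 off sees the BAR as
Normal memory, which is what permits speculation. With a Device mapping
at stage-2 the same call yields MT_DEVICE.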
> We've been using this option internally for testing purposes and
> validated it with NVMe/Mellanox/GPU pass-through devices on the
> Marvell ThunderX2 platform.

See above. It *will* break eventually.

> Option-2: Get resource properties associated with MMIO using lookup_resource()
> and map at stage-2 as Normal-NC if IORESOURCE_PREFETCH is set in flags.

That's a pretty roundabout way of doing exactly the same thing you
initially proposed. And it suffers from the exact same problems, which
is that you change the semantics of the mapping without knowing what
the guest's intent is.

	M.

--
Without deviation from the norm, progress is not possible.