Re: [RFC PATCH] vfio/pci: map prefetchble bars as writecombine

From: Srinath Mannam
Date: Wed Aug 01 2018 - 13:58:57 EST


Hi Alex,

In user space UIO driver (DPDK) implementation, sysfs interface
"/sys/devices/pci/.../resource0_wc" is used to map prefetchable PCI
resources as WC.
Platforms which support write-combining maps of PCI resources have
arch_can_pci_mmap_wc() flag enabled. So that it allows to map resources as WC.
In this approach mmap calls "pci_mmap_resource_range" kernel function
with write_combine parameter set.
"drivers/pci/pci-sysfs.c" kernel file has this implementation.

If this approach fits to vfio driver, then code change in vfio driver are

if (arch_can_pci_mmap_wc() &&
(pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH))
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
else
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

Please provide your feedback.

Thank you.

On Mon, Jul 23, 2018 at 2:03 PM, Srinath Mannam
<srinath.mannam@xxxxxxxxxxxx> wrote:
> On Sat, Jul 21, 2018 at 1:57 AM, Alex Williamson
> <alex.williamson@xxxxxxxxxx> wrote:
>> On Thu, 19 Jul 2018 21:49:48 +0530
>> Srinath Mannam <srinath.mannam@xxxxxxxxxxxx> wrote:
>>
>>> HI Alex,
>>>
>>> On Thu, Jul 19, 2018 at 8:42 PM, Alex Williamson
>>> <alex.williamson@xxxxxxxxxx> wrote:
>>> > On Thu, 19 Jul 2018 20:17:11 +0530
>>> > Srinath Mannam <srinath.mannam@xxxxxxxxxxxx> wrote:
>>> >
>>> >> HI Alex,
>>> >>
>>> >> On Thu, Jul 19, 2018 at 2:55 AM, Alex Williamson
>>> >> <alex.williamson@xxxxxxxxxx> wrote:
>>> >> > On Thu, 19 Jul 2018 00:05:18 +0530
>>> >> > Srinath Mannam <srinath.mannam@xxxxxxxxxxxx> wrote:
>>> >> >
>>> >> >> Hi Alex,
>>> >> >>
>>> >> >> On Tue, Jul 17, 2018 at 8:52 PM, Alex Williamson
>>> >> >> <alex.williamson@xxxxxxxxxx> wrote:
>>> >> >> > On Fri, 13 Jul 2018 10:26:17 +0530
>>> >> >> > Srinath Mannam <srinath.mannam@xxxxxxxxxxxx> wrote:
>>> >> >> >
>>> >> >> >> By default all BARs map with VMA access permissions
>>> >> >> >> as pgprot_noncached.
>>> >> >> >>
>>> >> >> >> In ARM64 pgprot_noncached is MT_DEVICE_nGnRnE which
>>> >> >> >> is strongly ordered and allows aligned access.
>>> >> >> >> This type of mapping works for NON-PREFETCHABLE bars
>>> >> >> >> containing EP controller registers.
>>> >> >> >> But it restricts PREFETCHABLE bars from doing
>>> >> >> >> unaligned access.
>>> >> >> >>
>>> >> >> >> In CMB NVMe drives PREFETCHABLE bars are required to
>>> >> >> >> map as MT_NORMAL_NC to do unaligned access.
>>> >> >> >>
>>> >> >> >> Signed-off-by: Srinath Mannam <srinath.mannam@xxxxxxxxxxxx>
>>> >> >> >> Reviewed-by: Ray Jui <ray.jui@xxxxxxxxxxxx>
>>> >> >> >> Reviewed-by: Vikram Prakash <vikram.prakash@xxxxxxxxxxxx>
>>> >> >> >> ---
>>> >> >> >
>>> >> >> > This has been discussed before:
>>> >> >> >
>>> >> >> > https://www.spinics.net/lists/kvm/msg156548.html
>>> >> >> Thank you for inputs.. I have gone through the long list of mail chain
>>> >> >> discussion.
>>> >> >> >
>>> >> >> > CC'ing the usual suspects from the previous thread. I'm not convinced
>>> >> >> > that the patch here has considered anything other than the ARM64
>>> >> >> > implications and it's not clear that it considers compatibility with
>>> >> >> > existing users or devices at all. Can we guarantee for all devices and
>>> >> >> > use cases that WC is semantically equivalent and preferable to UC? If
>>> >> >> > not then we need to device an extension to the interface that allows
>>> >> >> > the user to specify WC. Thanks,
>>> >> >> >
>>> >> >> To implement with user specified WC flags, many changes need to be done.
>>> >> >> Suppose In DPDK, prefetcable BARs map using WC flag, then also same
>>> >> >> question comes
>>> >> >> that WC may be different for different CPUs.
>>> >> >> As per functionality, both WC and PREFETCHABLE are same, like merging writes and
>>> >> >> typically WC is uncached.
>>> >> >> So, based on prefetchable BARs behavior and usage we need to map bar memory.
>>> >> >> Is it right to map prefetchable BARs as strongly ordered, aligned
>>> >> >> access and uncached?
>>> >> >
>>> >> > Is it possible to answer that question generically? Whether to map a
>>> >> > BAR as UC or WC is generally a question for the driver. Does the
>>> >> > device handle unaligned accesses? Does the device need strong memory
>>> >> > ordering? If this is a driver level question then the driver that
>>> >> > needs to make that decision is the userspace driver. VFIO is just a
>>> >> > pass-through here and since we don't offer the user a choice of
>>> >> > mappings, we take the safer and more conservative mapping, ie. UC.
>>> >> >
>>> >> Yes, you are right, driver should make the decision based on its requirement.
>>> >> In my case, user space driver is part of SPDK, so SPDK should request DPDK
>>> >> and DPDK should request VFIO to map BAR for its choice of mapping.
>>> >> So to implement this we need code changes in VFIO, DPDK and SPDK.
>>> >>
>>> >> > You're suggesting that there are many changes to be done if we modify
>>> >> > the vfio interface to expose WC under the user's control rather than
>>> >> > simply transparently impose WC for all mappings, but is that really the
>>> >> > case? Most devices on most platforms seem to work fine now. Perhaps WC
>>> >> > is a performance optimization, but this is the first instance I've seen
>>> >> > of it as a functional issue. Does that suggest that the imposed
>>> >> > alignment on your platform is perhaps unique and the relaxed alignment
>>> >> > should be implemented at the architecture specific memory flags for UC
>>> >> > mappings? For instance, does x86 require this change for the same
>>> >> > device? The chance for regressions of other devices on other platforms
>>> >> > seems rather high as proposed. Thanks,
>>> >> This issue is not specific to platform or device. this is the requirement of
>>> >> CMB enabled NVMe cards.
>>> >> NVMe kernel driver already has support to map CMB bar as WC using
>>> >> ioremap_wc function.
>>> >> File: drivers/nvme/host/pci.c
>>> >> Function: nvme_map_cmb
>>> >> code: dev->cmb = ioremap_wc(pci_resource_start(pdev, bar) + offset, size);
>>> >> It means ioremap_wc is working with all platforms and WC map of
>>> >> perfetchable BARs does not harm.
>>> >> Same is required in SPDK NVMe driver also, without WC map it may work
>>> >> in x86 platform, but it does not work in ARM platforms.
>>> >
>>> > Doesn't this contradict your assertion that it's not specific to
>>> > platform or device? The device requires support for unaligned
>>> > accesses. The platform chooses to restrict unaligned accesses for
>>> > non-WC mappings while other platforms do not. The native driver can
>>> > still clearly have performance considerations for choosing to use WC
>>> > mappings, but it's still not clear to me that the functionality issue
>>> > isn't self inflicted by the platform definition of UC vs WC. Thanks,
>>> >
>>> Device allows both aligned and unaligned access.. so software have
>>> flexibility can do unaligned access.
>>> As per ARM64 platform, with UC map, memory access should be un-cached,
>>> aligned access and strongly order mapping.
>>> with WC map, memory access can be aligned/unaligned and un-cached. I
>>> thought this is the property of platform not issue.
>>> To allow software to do unaligned access of device memory, we need to
>>> use WC map of ARM64 platform case.
>>> In ARM platforms UC mapping is used to map controller registers which
>>> are 4 byte aligned exposed by non-prefetchable bars.
>>> Also prefetchable BARs allows write merging so I thought using WC map
>>> fulfills both write merging (add performance) and unaligned
>>> access.
>>> Can I modify the code as below to enable only for ARM platforms.
>>> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>>> + if (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH)
>>> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
>>> +#endif
>>> vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
>>
>> While the risk of regression is smaller by restricting this to ARM, I
>> don't think it's the right solution. What happens when a device
>> requires strict ordering? ARM now behaves differently than any other
>> architecture, that's not acceptable. Thanks,
> If strict ordering is required for prefetchable bars, driver software
> has to add barrier instructions.
> With this we can assume for prefetchable bars WC mapping should be fine?
>
> Regards,
> Srinath.
>>
>> Alex