Re: [PATCH] dma-mapping: fix page attributes for dma_mmap_*

From: Russell King - ARM Linux admin
Date: Tue Aug 06 2019 - 12:48:26 EST


On Tue, Aug 06, 2019 at 05:45:03PM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Aug 06, 2019 at 05:08:54PM +0100, Will Deacon wrote:
> > On Sat, Aug 03, 2019 at 08:48:12AM +0200, Christoph Hellwig wrote:
> > > On Fri, Aug 02, 2019 at 11:38:03AM +0100, Will Deacon wrote:
> > > >
> > > > So this boils down to a terminology mismatch. The Arm architecture doesn't have
> > > > anything called "write combine", so in Linux we instead provide what the Arm
> > > > architecture calls "Normal non-cacheable" memory for pgprot_writecombine().
> > > > Amongst other things, this memory type permits speculation, unaligned accesses
> > > > and merging of writes. I found something in the architecture spec about
> > > > non-cacheable memory, but it's written in Armglish[1].
> > > >
> > > > pgprot_noncached(), on the other hand, provides what the architecture calls
> > > > Strongly Ordered or Device-nGnRnE memory. This is intended for mapping MMIO
> > > > (e.g. PCI config space) and therefore forbids speculation, preserves access
> > > > size, requires strict alignment and also forces write responses to come from
> > > > the endpoint.
> > > >
> > > > I think the naming mismatch is historical, but on arm64 we wanted to use the
> > > > same names as arm32 so that any drivers using these things directly would get
> > > > the same behaviour.
> > >
> > > That all makes sense, but it totally needs a comment. I'll try to draft
> > > one based on this. I've also looked at the arm32 code a bit more, and
> > > it seems arm always (?) supported the Normal non-cacheable attribute,
> > > but Linux only optionally uses it for ARMv6+ because of fears of
> > > drivers missing barriers.
> >
> > I think it was also to do with aliasing, but I don't recall all of the
> > details.
>
> ARMv6+ is where the architecture significantly changed to introduce
> the idea of [Normal, Device, Strongly Ordered] where Normal has the
> cache attributes.
>
> Before that, we had just "uncached/unbuffered, uncached/buffered,
> cached/unbuffered, cached/buffered" modes.
>
> The write buffer (enabled by buffered modes) has no architected
> guarantees about how long writes will sit in it, and there is only
> the "drain write buffer" instruction to push writes out.
>
> Up to and including ARMv5, we took the easy approach of just using
> the "uncached/unbuffered" mode since that is (a) the safest, and (b)
> avoids write buffers that alias when there are multiple different
> mappings.
>
> We could have used a different approach, making all IO writes contain
> a "drain write buffer" instruction, and map DMA memory as "buffered",
> but as there were no Linux barriers defined to order memory accesses
> to DMA memory (so, for example, ring buffers can be updated in the
> correct order) back in those days, using the uncached/unbuffered mode
> was the sanest and most reliable solution.
>
> >
> > > The other really weird thing is that in arm32
> > > pgprot_dmacoherent includes the L_PTE_XN bit, which from my understanding
> > > is the no-execute bit, but pgprot_writecombine does not. This does not
> > > seem very intentional. So minus that the whole DMA_ATTR_WRITE_COMBINE
> > > seems to be about flagging old arm specific drivers as having the proper
> > > barriers in place and otherwise is a no-op.
> >
> > I think it only matters for Armv7 CPUs, but yes, we should probably be
> > setting L_PTE_XN for both of these memory types.
>
> Conventionally, pgprot_writecombine() has only been used to change
> the memory type and not the permissions. Since writecombine memory
> is still capable of being executed, I don't see any reason to set XN
> for it.
>
> If the user wishes to mmap() using PROT_READ|PROT_EXEC, then is there
> really a reason for writecombine to set XN overriding the user?
>
> That said, pgprot_writecombine() is mostly used for framebuffers, which
> arguably shouldn't be executable anyway - but who'd want to mmap() the
> framebuffer with PROT_EXEC?
>
> >
> > > Here is my tentative plan:
> > >
> > > - respin this patch with a small fix to handle the
> > >   DMA_ATTR_NON_CONSISTENT (as in ignore it unless actually supported),
> > >   but keep the name as-is to avoid churn. This should allow 5.3
> > >   inclusion and backports
> > > - remove DMA_ATTR_WRITE_COMBINE support from mips, probably also 5.3
> > >   material.
> > > - move all architectures but arm over to just define
> > >   pgprot_dmacoherent, including a comment with the above explanation
> > >   for arm64.
> >
> > That would be great, thanks.
> >
> > > - make DMA_ATTR_WRITE_COMBINE a no-op and schedule it for removal,
> > >   thus removing the last instances of arch_dma_mmap_pgprot
> >
> > All sounds good to me, although I suppose 32-bit Arm platforms without
> > CONFIG_ARM_DMA_MEM_BUFFERABLE may run into issues if DMA_ATTR_WRITE_COMBINE
> > disappears. Only one way to find out...
>
> Looking at the results of grep, I think only OMAP2+ and Exynos may be
> affected.
>
> However, removing writecombine support from the DMA API is going to
> have a huge impact for framebuffers on earlier ARMs - that's where we
> do expect framebuffers to be mapped "uncached/buffered" for performance
> reasons and not "uncached/unbuffered". It's quite literally the
> difference between console scrolling being usable and totally unusable.
>
> Given what I've said above, switching to using buffered mode for normal
> DMA mappings risks data corruption - as in your filesystem could get
> fried. I don't think we should play fast and loose with people's data
> by randomly changing that "because we'd like to", and I don't see that
> screwing the console is really an option either.

Sorry, I forgot to explain - the reason is that dma_alloc_writecombine()
internally uses DMA_ATTR_WRITE_COMBINE, which I'd forgotten about
when grepping - so there are potentially far more users than my greps
above found.
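
To illustrate why a grep for the attribute name undercounts users, here
is a minimal user-space sketch. It is not kernel code: every mock_*
name, the attribute value and the static backing buffer are hypothetical
stand-ins invented for illustration. The only thing taken from the
thread is that the writecombine allocation helper passes
DMA_ATTR_WRITE_COMBINE on the caller's behalf.

```c
#include <stddef.h>

/* Illustrative attribute bit; the value is an assumption, not the kernel's. */
#define MOCK_DMA_ATTR_WRITE_COMBINE (1UL << 0)

/* Attributes seen by the most recent mock allocation, kept for inspection. */
unsigned long mock_last_attrs;

/*
 * Hypothetical stand-in for an attrs-taking DMA allocator. It only records
 * the attributes and hands back a static buffer when the request fits.
 */
void *mock_dma_alloc_attrs(size_t size, unsigned long attrs)
{
    static char backing[4096];

    mock_last_attrs = attrs;
    return size <= sizeof(backing) ? backing : NULL;
}

/*
 * Hypothetical stand-in for the writecombine helper: the attribute is
 * supplied here, so callers never spell out its name.
 */
void *mock_dma_alloc_writecombine(size_t size)
{
    return mock_dma_alloc_attrs(size, MOCK_DMA_ATTR_WRITE_COMBINE);
}

/*
 * A made-up framebuffer driver probe. Grepping this function for the
 * attribute name finds nothing, yet the allocation it makes depends on it.
 */
void *mock_fb_driver_probe(void)
{
    return mock_dma_alloc_writecombine(1024);
}
```

After calling mock_fb_driver_probe(), mock_last_attrs has the
write-combine bit set even though the driver never mentions it. A search
for the attribute name alone finds only the helper, not the drivers
calling it.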

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up