Re: Bug(?) in patch "arm64: Implement coherent DMA API based on swiotlb"

From: Catalin Marinas
Date: Wed Apr 02 2014 - 07:38:29 EST


On Wed, Apr 02, 2014 at 11:41:30AM +0100, Jon Medhurst (Tixy) wrote:
> On Wed, 2014-04-02 at 10:20 +0100, Catalin Marinas wrote:
> > On Wed, Apr 02, 2014 at 09:52:02AM +0100, Jon Medhurst (Tixy) wrote:
> > > But then that leaves a time period where a write can
> > > happen between the clean and the invalidate, again leading to data
> > > corruption. I hope all this means I've either got rather confused or
> > > that the cache architectures are smart enough to automatically cope.
> >
> > You are right. I think having unaligned DMA buffers for inbound
> > transfers is pointless. We can avoid losing data written by another CPU
> > in the same cache line but, depending on the stage of the DMA transfer,
> > it can corrupt the DMA data.
> >
> > I wonder whether it's easier to define the cache_line_size() macro to
> > read CWG
>
> That won't work, the stride of cache operations needs to be the
> _minimum_ cache line size, otherwise we might skip over some cache lines
> and not flush them. (We've been hit before by bugs caused by the fact
> that big.LITTLE systems report different minimum i-cache line sizes
> depending on whether you execute on the big or LITTLE cores [1]; we need
> the 'real' minimum, otherwise things go horribly wrong.)
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2013-February/149950.html

Yes, I remember this. CWG should also be the same in a big.LITTLE
system.
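
For reference, both values come straight out of CTR_EL0. Roughly the
sketch below (illustrative only; field decodes as per the ARM ARM,
helper names made up, and in the kernel we would go through the existing
cpuid accessors rather than a raw mrs):

#include <stdint.h>

static inline uint64_t read_ctr_el0(void)
{
	uint64_t ctr;

	asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
	return ctr;
}

/*
 * DminLine, CTR_EL0[19:16]: log2 of the smallest D-cache line in words.
 * This is the stride the maintenance loops must use.
 */
static inline unsigned int dcache_min_line_size(void)
{
	return 4U << ((read_ctr_el0() >> 16) & 0xf);
}

/*
 * CWG, CTR_EL0[27:24]: log2 of the cache writeback granule in words.
 * This is the alignment/size a DMA buffer needs so that maintenance on
 * it cannot touch unrelated data. A value of 0 means "not provided";
 * the architecture allows the granule to be up to 2K, so assume that.
 */
static inline unsigned int cache_writeback_granule(void)
{
	unsigned int cwg = (read_ctr_el0() >> 24) & 0xf;

	return cwg ? 4U << cwg : 2048U;
}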

> > and assume that the DMA buffers are always aligned,
>
> We can't assume the region in any particular DMA transfer is cache
> aligned, but I agree that if multiple actors are operating on adjacent
> memory locations in the same cache line without implementing their own
> coordination, then there's nothing the low-level DMA code can do to
> avoid data corruption from cache cleaning.
>
> We at least need to make sure that the memory allocation functions used
> for DMA buffers return regions occupying whole CWG granules, to avoid
> unrelated buffers corrupting each other. If I have correctly read
> __dma_alloc_noncoherent and the functions it calls, then it looks like
> buffers are actually whole pages, so that's not a problem.

It's not about dma_alloc but the streaming DMA API like dma_map_sg().
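
I.e. the usual driver pattern below (using dma_map_single() for
simplicity), where the cache maintenance implied by the mapping has to
cover whatever kmalloc() handed out. This is only an illustrative
sketch, names and sizes made up, not code from any particular driver:

#include <linux/dma-mapping.h>
#include <linux/slab.h>

/*
 * A typical streaming-DMA receive buffer. The maintenance done for
 * DMA_FROM_DEVICE invalidates the buffer's cache lines, so anything
 * sharing a writeback granule with 'buf' is at risk unless the
 * allocator's minimum alignment is at least the CWG.
 */
static void *rx_buf_map(struct device *dev, size_t len, dma_addr_t *dma)
{
	void *buf = kmalloc(len, GFP_KERNEL);

	if (!buf)
		return NULL;

	*dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, *dma)) {
		kfree(buf);
		return NULL;
	}
	return buf;
}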

> > ignoring
> > the invalidation of the unaligned boundaries. This wouldn't be much
> > different from your scenario where the shared cache line is written
> > (just less likely to trigger but still a bug, so I would rather notice
> > this early).
> >
> > The ARMv7 code has a similar issue, it performs clean&invalidate on the
> > unaligned start but it doesn't move r0, so it goes into the main loop
> > invalidating the same cache line again.
>
> Yes, and as it's missing a dsb, it could also lead to the wrong behaviour if
> the invalidate was reordered to execute prior to the clean+invalidate on
> the same line. I just dug into git history to see if I could find a clue
> as to how the v7 code came to look like it does, but I see that it's
> been like that since the day it was submitted in 2007, by a certain
> Catalin Marinas ;-)

I don't remember ;). But there are some rules about the reordering of
cache maintenance operations by MVA with regard to memory accesses. I
have to check whether they also apply to other d-cache maintenance
operations to the same address.
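
Something along these lines, perhaps, for the inbound case (a rough,
untested C rendering rather than the actual assembly): the partially
covered lines at either end get clean+invalidate so that neighbouring
data is preserved, the loop pointer is advanced past the already-handled
start line (the step the v7 code misses), and the dsb at the end orders
the whole lot:

static void dma_inv_range_sketch(unsigned long start, unsigned long end)
{
	unsigned long linesz = dcache_min_line_size();	/* see earlier sketch */
	unsigned long mask = linesz - 1;

	if (end & mask) {
		end &= ~mask;
		/* partial line at the end: clean+invalidate to preserve it */
		asm volatile("dc civac, %0" : : "r" (end) : "memory");
	}
	if (start & mask) {
		start &= ~mask;
		/* partial line at the start: clean+invalidate ... */
		asm volatile("dc civac, %0" : : "r" (start) : "memory");
		/* ... and don't touch this line again in the loop below */
		start += linesz;
	}
	for (; start < end; start += linesz)
		asm volatile("dc ivac, %0" : : "r" (start) : "memory");

	/* order the maintenance before the buffer is handed to the device */
	asm volatile("dsb sy" : : : "memory");
}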

I'll try to come up with another patch using CWG.

--
Catalin