Re: using DMA-API on ARM

From: Russell King - ARM Linux
Date: Fri Dec 05 2014 - 06:11:43 EST

Next message: Chuck Ebbert: "Re: [PATCH v2] all arches, signal: Move restart_block to struct task_struct"
Previous message: Kirill A. Shutemov: "Re: [RFC V2] mm:add KPF_ZERO_PAGE flag for /proc/kpageflags"
In reply to: Arnd Bergmann: "Re: using DMA-API on ARM"
Next in thread: Hante Meuleman: "RE: using DMA-API on ARM"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Dec 05, 2014 at 10:52:02AM +0100, Arnd Bergmann wrote:
> I'm still puzzled why you'd need a single dma_sync_single_for_cpu()
> after dma_alloc_coherent though, you should not need any. Is it possible
> that the driver accidentally uses __raw_readl() instead of readl()
> in some places and you are just lacking an appropriate barrier?

Digging into the driver, it looks like individual DMA buffers are
allocated (via brcmf_pcie_init_dmabuffer_for_device) and registered
into a "commonring" layer.

Whenever the buffer is written to, space is first allocated via a call
to brcmf_commonring_reserve_for_write() or
brcmf_commonring_reserve_for_write_multiple(), data written to the
buffer, followed by a call to brcmf_commonring_write_complete().

brcmf_commonring_write_complete() calls two methods at that point:
cr_write_wptr() and cr_ring_bell(), which will be
brcmf_pcie_ring_mb_write_wptr() and brcmf_pcie_ring_mb_ring_bell().

The first calls brcmf_pcie_write_tcm16(), which uses iowrite16(),
which contains the appropriate barrier. The bell ringing functions
also use ioread*/iowrite*().

So, on the write side, it looks fine from the barrier perspective.

On the read side, brcmf_commonring_get_read_ptr() is used before
a read access to the ring - which calls the cr_update_wptr() method,
which in turn uses an ioread16() call. After the CPU has read data
from the ring, brcmf_commonring_read_complete() is used, which uses
iowrite16().

So, I don't see a barrier problem on the read side.

However, I did trip over this:

static void *
brcmf_pcie_init_dmabuffer_for_device(struct brcmf_pciedev_info *devinfo,
u32 size, u32 tcm_dma_phys_addr,
dma_addr_t *dma_handle)
{
void *ring;
long long address;

ring = dma_alloc_coherent(&devinfo->pdev->dev, size, dma_handle,
GFP_KERNEL);
if (!ring)
return NULL;

address = (long long)(long)*dma_handle;

Casting to (long) will truncate the DMA handle to 32-bits on a 32-bit
architecture, even if it supports 64-bit DMA addresses. There's a couple
of other places where this same truncation occurs:

address = (long long)(long)devinfo->shared.scratch_dmahandle;

and

address = (long long)(long)devinfo->shared.ringupd_dmahandle;

In any case, wouldn't using a u64 type for "address" be better - isn't
"long long" 128-bit on 64-bit architectures?

--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Chuck Ebbert: "Re: [PATCH v2] all arches, signal: Move restart_block to struct task_struct"
Previous message: Kirill A. Shutemov: "Re: [RFC V2] mm:add KPF_ZERO_PAGE flag for /proc/kpageflags"
In reply to: Arnd Bergmann: "Re: using DMA-API on ARM"
Next in thread: Hante Meuleman: "RE: using DMA-API on ARM"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]