Re: [PATCH] DMA-API: Change dma_declare_coherent_memory() CPU address to phys_addr_t

From: Arnd Bergmann
Date: Fri May 02 2014 - 04:42:56 EST


On Friday 02 May 2014 06:11:42 James Bottomley wrote:
> On Thu, 2014-05-01 at 13:05 -0600, Bjorn Helgaas wrote:
> > [+cc Arnd]
> >
> > On Thu, May 1, 2014 at 8:08 AM, James Bottomley
> > <jbottomley@xxxxxxxxxxxxx> wrote:
> > > On Wed, 2014-04-30 at 14:33 -0600, Bjorn Helgaas wrote:
> > >> dma_declare_coherent_memory() takes two addresses for a region of memory: a
> > >> "bus_addr" and a "device_addr". I think the intent is that "bus_addr" is
> > >> the physical address a *CPU* would use to access the region, and
> > >> "device_addr" is the bus address the *device* would use to address the
> > >> region.
> > >>
> > >> Rename "bus_addr" to "phys_addr" and change its type to phys_addr_t.
> > >
> > > Remind me what the difference between phys_addr_t and dma_addr_t is.
> > >
> > > I thought phys_addr_t was the maximum address the CPU could reach after
> > > address translation and dma_addr_t was the maximum physical address any
> > > bus attached memory mapped devices could appear at. (of course, mostly
> > > they're the same).
> >
> > I assumed phys_addr_t was for physical addresses generated by a CPU
> > and dma_addr_t was for addresses generated by a device, e.g.,
> > addresses you would see with a PCI bus analyzer or in a PCI BAR. But
> > ARCH_DMA_ADDR_T_64BIT does seem more connected to processor-specific
> > things, e.g., ARM_LPAE, than to device bus properties like "support
> > 64-bit PCI, not just 32-bit PCI."
>
> OK, but even in your definition dma_declare_coherent_memory() should
> operate on dma_addr_t because the input is a region of memory you got
> from the device. Somehow you generate an idea from the device
> configuration of where this piece of memory sits on the device and
> that's what you feed into dma_declare_coherent_memory(). It maps that
> region and makes it available to the CPU allocators.

I think it's ambiguous at best at the moment; there is no clear
answer to what it should be until we define the semantics more clearly.
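
To recap, this is the prototype we are discussing, before and after
Bjorn's patch:

int dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
                                dma_addr_t device_addr, size_t size,
                                int flags);

/* with the patch applied, only the second argument changes: */
int dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
                                dma_addr_t device_addr, size_t size,
                                int flags);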

At the moment, we have these callers:

arch/arm/mach-imx/mach-imx27_visstrim_m10.c: dma = dma_declare_coherent_memory(&pdev->dev,
arch/arm/mach-imx/mach-imx27_visstrim_m10.c: dma = dma_declare_coherent_memory(&pdev->dev,
arch/arm/mach-imx/mach-imx27_visstrim_m10.c: dma = dma_declare_coherent_memory(&pdev->dev,
arch/arm/mach-imx/mach-imx27_visstrim_m10.c: dma = dma_declare_coherent_memory(&pdev->dev,
arch/arm/mach-imx/mach-mx31_3ds.c: dma = dma_declare_coherent_memory(&pdev->dev,
arch/arm/mach-imx/mach-mx31moboard.c: dma = dma_declare_coherent_memory(&pdev->dev,
arch/arm/mach-imx/mach-mx35_3ds.c: dma = dma_declare_coherent_memory(&pdev->dev,
arch/arm/mach-imx/mach-pcm037.c: dma = dma_declare_coherent_memory(&pdev->dev,
arch/arm/plat-samsung/s5p-dev-mfc.c: if (dma_declare_coherent_memory(area->dev, area->base,
arch/sh/drivers/pci/fixups-dreamcast.c: BUG_ON(!dma_declare_coherent_memory(&dev->dev,
drivers/media/platform/s5p-mfc/s5p_mfc.c: if (dma_declare_coherent_memory(dev->mem_dev_l, mem_info[
drivers/media/platform/s5p-mfc/s5p_mfc.c: if (dma_declare_coherent_memory(dev->mem_dev_r, mem_info[
drivers/media/platform/soc_camera/sh_mobile_ceu_camera.c: err = dma_declare_coherent_memory
drivers/scsi/NCR_Q720.c: if (dma_declare_coherent_memory(dev, base_addr, base_addr,
drivers/usb/host/ohci-sm501.c: if (!dma_declare_coherent_memory(dev, mem->start,
drivers/usb/host/ohci-tmio.c: if (!dma_declare_coherent_memory(&dev->dev, sram->start,

I don't know about NCR_Q720, but all the others are used only on machines
where physical addresses and bus addresses are in the same space.

There are other machines that may always have to go through an IOMMU,
or that have a fixed offset between DMA addresses and physical
addresses and need to call phys_to_dma() to convert between the two,
but none of those call dma_declare_coherent_memory().
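
For the fixed-offset machines, the conversion is simple arithmetic.
A minimal sketch of what such a helper looks like; BUS_DMA_OFFSET is
a made-up name standing in for whatever constant the platform defines:

static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
{
        return (dma_addr_t)(paddr - BUS_DMA_OFFSET);
}

static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
{
        return (phys_addr_t)daddr + BUS_DMA_OFFSET;
}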

> > DMA-API-HOWTO.txt contains this:
> >
> > ... the definition of the dma_addr_t (which can hold any valid DMA
> > address for the platform) type which should be used everywhere you
> > hold a DMA (bus) address returned from the DMA mapping functions.
> >
> > and clearly you use a dma_addr_t from dma_alloc_coherent(), etc., to
> > program the device. It seems silly to have phys_addr_t and dma_addr_t
> > for two (possibly slightly different) flavors of CPU physical
> > addresses, and then sort of overload dma_addr_t by also using it to
> > hold the addresses devices use for DMA.
> >
> > I think the "CPU generates phys_addr_t" and "device generates
> > dma_addr_t" distinction seems like a useful idea. I'm not a CPU
> > person, so I don't understand the usefulness of dma_addr_t as a
> > separate type to hold CPU addresses at which memory-mapped devices can
> > appear. If that's all it is, why not just use phys_addr_t everywhere
> > and get rid of dma_addr_t altogether?
>
> Actually, on this one, I think I agree ... we really just need the idea
> of physical and virtual. The original idea of dma_addr_t is that the
> type would yell if you tried to use it where you could use a virtual
> address, and users were supposed to treat it as an opaque cookie for
> a region on the device; but I suppose it's generating more confusion
> than it is solving.

We also have machines on ARM that can never do DMA to addresses above
the 32-bit boundary but can have more than 4GB of RAM (using swiotlb
or an IOMMU). A kernel built for these machines uses a 32-bit
dma_addr_t and a 64-bit phys_addr_t.
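
Concretely, both types are nothing more than fixed-width integers
whose size is chosen by Kconfig. If I remember correctly,
include/linux/types.h has roughly:

#ifdef CONFIG_PHYS_ADDR_T_64BIT
typedef u64 phys_addr_t;
#else
typedef u32 phys_addr_t;
#endif

#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
typedef u64 dma_addr_t;
#else
typedef u32 dma_addr_t;
#endif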

It's not clear whether that distinction has any practical benefit,
since most configurations that enable a 64-bit phys_addr_t also cover
machines that can do DMA to high addresses.

> > > The intent of dma_declare_coherent_memory() is to take a range of memory
> > > provided by some device on the bus and allow the CPU to allocate regions
> > > from it for use as things like descriptors. The CPU treats it as real
> > > memory, but, in fact, it is a mapped region of an attached device.
> > >
> > > If my definition is correct, then bus_addr should be dma_addr_t because
> > > it has to be mapped from a device and dma_addr_t is the correct type for
> > > device addresses. If I've got the definition wrong, then we should
> > > document it somewhere, because it's probably confusing other people as
> > > well.
> >
> > dma_declare_coherent_memory() calls ioremap() on bus_addr, which seems
> > a clear indication that bus_addr is really a physical address usable
> > by the CPU. But I do see your point that bus_addr is a CPU address
> > for a memory-mapped device, and would thus fit into your idea of a
> > dma_addr_t.
>
> Well, yes, but it's declaring that the device has a region at bus_addr
> (on the device) we want to make available to the CPU to use. On the
> Q720 it's about a megabyte of SRAM behind a bridge. The CPU puts script
> fragments into it and the 720 script engines execute them. The reason
> for it being on the device behind a bridge is that fetching from CPU
> main memory causes too much bus traffic. I think all the other uses are
> something similar. The point, though, is that the address for the
> memory comes from the device. It has to be ioremapped because the
> system doesn't know about it and we're going to allocate from it for
> buffers mapped into the CPU virtual space.

I can see three different use cases:

* drivers/usb/host/ohci-*: SRAM, same as Q720 (see the sketch below the list)
* s5p_mfc: a hack to guarantee that there are two DMA areas on
different memory banks, so the device can use them in parallel
for higher throughput.
* arch/arm/mach-imx: a workaround for the limited amount of DMA-coherent
memory, which is traditionally capped at 2MB total on ARM.
Modern systems would use CMA for this, and imx could be converted
if someone wants to do the work.
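
For the SRAM case, the call in ohci-sm501 looks roughly like this:
the second argument is the SRAM as seen by the CPU, the third is the
same SRAM as seen from the OHCI controller behind the SM501's bridge:

        if (!dma_declare_coherent_memory(dev, mem->start,
                                         mem->start - mem->parent->start,
                                         resource_size(mem),
                                         DMA_MEMORY_MAP |
                                         DMA_MEMORY_EXCLUSIVE))
                return -ENXIO;          /* simplified error handling */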

There is also pci-dreamcast.c, but I'm not familiar with it, and I
can't tell which category it falls into from briefly reading the code.

> > I think this is the only part of the DMA API that deals with CPU
> > physical addresses. I suppose we could side-step this question by
> > having the caller do the ioremap; then dma_declare_coherent_memory()
> > could just take a CPU virtual address (a void *) and a device address
> > (a dma_addr_t).
>
> But we have to do the ioremap. Without that there's no mapping for the
> memory region we've just declared, so I think it does make sense to do
> it within the dma_declare_coherent_memory() API.

Doing ioremap() on memory may not be ideal though, at least on ARM,
where there are subtle differences between an MT_DEVICE mapping for
MMIO and an MT_UNCACHED mapping for RAM that is meant for coherent
access.
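
To illustrate: the mapping we get today is a plain ioremap() inside
the declare function, simplified from drivers/base/dma-coherent.c:

        /* one ioremap() covers the whole region, which on ARM means
         * an MT_DEVICE mapping whether it is MMIO or RAM */
        void __iomem *mem_base = ioremap(bus_addr, size);
        if (!mem_base)
                goto out;
        /* ... the bitmap allocator for the region is set up next ... */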

> > But I'd still have the question of whether we're using dma_addr_t
> > correctly in the PCI core. We use it to hold BAR values and the like,
> > and I've been making assumptions like "if dma_addr_t is only 32 bits,
> > we can't handle BARs above 4GB."
>
> Well, first we need to answer whether we need separate phys_addr_t and
> dma_addr_t ... we know the former represents physical memory addressed
> by the CPU and the latter bus physical addresses from devices ... but
> they're essentially the same.

I think BARs should be phys_addr_t: in the example with a 32-bit
dma_addr_t and a 64-bit phys_addr_t, you would be able to put the MMIO
registers at a high address if the PCI host supports that; the only
limitation would be that no device could DMA to a high address for
inter-device DMA.
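
Note that struct resource, which is where BAR values end up, already
follows the CPU's view of the address space; include/linux/types.h
ties the two together with:

typedef phys_addr_t resource_size_t;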

Arnd