Re: [RFC] arm: DMA-API contiguous cacheable memory

From: Lorenzo Nava
Date: Tue May 19 2015 - 18:28:01 EST


On Wed, May 20, 2015 at 12:14 AM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Wednesday 20 May 2015 00:05:54 Lorenzo Nava wrote:
>>
>> On Tue, May 19, 2015 at 6:34 PM, Catalin Marinas
>> <catalin.marinas@xxxxxxx> wrote:
>> > On Mon, May 18, 2015 at 10:56:06PM +0200, Lorenzo Nava wrote:
>> >> it's been a while since I started working with DMA on ARM processors
>> >> for a smart camera project. Typically the requirement is to have a
>> >> large memory area which can be accessed by both DMA and user space. I've
>> >> already noticed that many people wonder what the best way would be
>> >> to have data received from DMA mapped in user space and, more
>> >> importantly, mapped in a cacheable area of memory. Having a memory
>> >> mapped region which is cacheable is very important if the user must
>> >> access the data and do some sort of processing on it.
>> >> My question is: why don't we introduce a function in the DMA-API
>> >> interface for ARM processors which allows allocating a contiguous and
>> >> cacheable area of memory (> 4MB)?
>> >> This new function can take advantage of the CMA mechanism as the
>> >> dma_alloc_coherent() function does, but using different PTE attributes
>> >> for the allocated pages. Basically, making a function similar to
>> >> arm_dma_alloc() but setting the attributes differently would do the trick:
>> >>
>> >> pgprot_t prot = __pgprot_modify(PAGE_KERNEL, L_PTE_MT_MASK,
>> >>                                 L_PTE_MT_WRITEALLOC | L_PTE_XN);
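A small user-space model may help make the proposal concrete. It mimics
what __pgprot_modify() does (clear the memory-type field, then OR in the
new bits) and compares the non-cacheable type used for coherent DMA
buffers with the write-allocate type proposed above. The constant values,
pgprot_model_t and the model_* helpers are assumptions for illustration
only, loosely based on the classic 2-level ARM page-table layout; the real
definitions live in the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values, assumed to mirror the classic 2-level ARM
 * page-table headers; check the real headers before relying on them. */
#define L_PTE_MT_MASK        (0x0fu << 2)
#define L_PTE_MT_BUFFERABLE  (0x01u << 2)
#define L_PTE_MT_WRITEALLOC  (0x07u << 2)
#define L_PTE_XN             (1u << 9)

/* Hypothetical stand-in for the kernel's pgprot_t. */
typedef struct { uint32_t val; } pgprot_model_t;

/* Same shape as __pgprot_modify(): clear 'mask', then OR in 'bits'. */
static pgprot_model_t pgprot_modify(pgprot_model_t prot, uint32_t mask,
                                    uint32_t bits)
{
    pgprot_model_t out = { (prot.val & ~mask) | bits };
    return out;
}

/* Roughly what the coherent allocator uses today (assumption: buffered
 * DMA memory configured): bufferable, i.e. not cacheable. */
static pgprot_model_t model_dmacoherent(pgprot_model_t prot)
{
    return pgprot_modify(prot, L_PTE_MT_MASK, L_PTE_MT_BUFFERABLE | L_PTE_XN);
}

/* The proposal: same allocation path, but a write-allocate memory type. */
static pgprot_model_t model_cacheable(pgprot_model_t prot)
{
    return pgprot_modify(prot, L_PTE_MT_MASK, L_PTE_MT_WRITEALLOC | L_PTE_XN);
}
```

Only the memory-type field and XN change; every other bit in the
protection value is preserved.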
>> >
>> > We already have a way to specify whether a device is coherent via the
>> > "dma-coherent" DT property. This allows the correct dma_map_ops to be
>> > set for a device. For cache coherent devices, the
>> > arm_coherent_dma_alloc() and __dma_alloc() should return cacheable
>> > memory.
>
> That is not what Lorenzo was asking about though.
>
>> > However, looking at the code, it seems that __dma_alloc() does not use
>> > the CMA when is_coherent == true, though you would hit a limit on the
>> > number of pages that can be allocated.
>> >
>> > As for mmap'ing to user space, there is arm_dma_mmap(). This one sets
>> > the vm_page_prot to what __get_dma_pgprot() returns, which is always
>> > non-cacheable.
>> >
>> > I haven't checked the history of cache coherent DMA support on arm, but I
>> > think some of the above can be changed. As an example, on arm64
>> > __dma_alloc() allocates from CMA independent of whether the device is
>> > coherent or not. Also __get_dma_pgprot() returns cacheable attributes
>> > for coherent devices, which in turn allows cacheable user mapping of
>> > such buffers. You don't really need to implement additional functions,
>> > just tweaks to the existing ones.
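The policy difference Catalin describes can be sketched as two small
functions. This is a simplified, hypothetical model (enum map_type and
both *_model functions are made-up names), based on my reading of
__get_dma_pgprot() on arm and arm64 at the time of writing: the arm
version picks a non-cacheable type regardless of coherency, while the
arm64 version keeps the cacheable attributes for coherent devices unless
write-combining is requested.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified memory types for the user mapping. */
enum map_type { MAP_NONCACHEABLE, MAP_WRITECOMBINE, MAP_CACHEABLE };

/* arm, as described above: the result never depends on device coherency,
 * so the user mapping is never cacheable. */
static enum map_type arm_dma_pgprot_model(bool write_combine)
{
    return write_combine ? MAP_WRITECOMBINE : MAP_NONCACHEABLE;
}

/* arm64-style policy: keep the normal cacheable attributes for coherent
 * devices unless write-combining is explicitly requested. */
static enum map_type arm64_dma_pgprot_model(bool coherent, bool write_combine)
{
    if (!coherent || write_combine)
        return MAP_WRITECOMBINE;
    return MAP_CACHEABLE;
}
```

Passing the device's coherency into the arm helper would be the kind of
"tweak to the existing ones" mentioned above.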
>>
>> Thanks for the answer. I do agree with you on that: I'll take a look
>> at the arm64 code and I'll be glad to contribute patches as soon as
>> possible.
>>
>> Anyway, I'd like to focus on a different aspect: I think that this
>> solution can handle cache coherent DMA, i.e. devices which guarantee
>> coherency using a cache snooping mechanism. However, how can I handle
>> devices which need contiguous memory and don't guarantee cache
>> coherency? If the device doesn't implement scatter-gather, I can't
>> allocate buffers larger than 4MB, because I can use neither
>> dma_alloc_coherent() nor the CMA directly (well, actually I can use
>> dma_alloc_coherent(), but it sounds a little bit confusing).
>
> So you have a device that is not cache-coherent, and you want to
> allocate cacheable memory and manage coherency manually.
>
> This is normally done using alloc_pages() and dma_map_single(),
> but as you have realized, that does not use the CMA area.
>
>> Do you think that dma_alloc_coherent() can be used with this type of
>> device as well? Do you think that a new dma_alloc_contiguous()
>> function would help in this case?
>> Maybe my interpretation of dma_alloc_coherent() is not correct, and
>> coherency can be managed using the dma_sync_single_for_* functions
>> without requiring a hardware mechanism.
>
> I believe dma_alloc_attrs is the interface you want, with attributes
> DMA_ATTR_FORCE_CONTIGUOUS and DMA_ATTR_NON_CONSISTENT. I don't
> know if that is already implemented on arm64, but this is something
> that can definitely be done.
>
> With that memory, you should be able to use the normal streaming
> API (dma_sync_single_for_*). There is an older interface called
> dma_alloc_noncoherent(), but that cannot be easily implemented on
> ARM.
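The streaming-API discipline Arnd refers to can be summarised as an
ownership hand-over. The toy model below is hypothetical (model_buf and
the model_* functions are not kernel APIs): it only tracks who owns the
buffer and counts the points where the real dma_sync_single_for_device()
and dma_sync_single_for_cpu() calls may perform cache maintenance, which
in practice depends on the DMA direction.

```c
#include <assert.h>
#include <stdbool.h>

/* At any time the buffer belongs either to the CPU or to the device. */
enum owner { OWNER_CPU, OWNER_DEVICE };

struct model_buf {
    enum owner owner;
    unsigned int sync_points; /* simulated cache-maintenance hand-overs */
};

/* Models dma_sync_single_for_device(): CPU -> device hand-over. */
static void model_sync_for_device(struct model_buf *b)
{
    assert(b->owner == OWNER_CPU);
    b->sync_points++;
    b->owner = OWNER_DEVICE;
}

/* Models dma_sync_single_for_cpu(): device -> CPU hand-over. */
static void model_sync_for_cpu(struct model_buf *b)
{
    assert(b->owner == OWNER_DEVICE);
    b->sync_points++;
    b->owner = OWNER_CPU;
}

/* The CPU may only touch the data while it owns the buffer. */
static bool cpu_may_access(const struct model_buf *b)
{
    return b->owner == OWNER_CPU;
}
```

In a driver the equivalent sequence would be: prepare the buffer, call
dma_sync_single_for_device(), start the transfer, wait for completion,
call dma_sync_single_for_cpu(), then read the data through the cacheable
mapping.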
>
> Arnd

Yes, this is exactly the point. Currently this function is used only
through dma_alloc_coherent() (which actually calls dma_alloc_attrs()).
This function, however, is not exposed in the Linux DMA API, but I
think it could be useful for handling some kinds of devices (see my
previous mail).

What do you think would be the best way to access the dma_alloc_attrs()
function from a device driver? Call the function directly?

Thank you.
Lorenzo