Re: [RFC v2] dma-coherent: introduce no-align to avoid allocation failure and save memory

From: Jaewon Kim
Date: Mon Nov 27 2017 - 08:48:03 EST


Hello

2017-11-24 19:35 GMT+09:00 David Laight <David.Laight@xxxxxxxxxx>:
> From: Jaewon Kim
>> Sent: 24 November 2017 05:59
>>
>> dma-coherent uses bitmap APIs which internally apply an alignment based
>> on the requested size. If most allocations are small (a few KB), the
>> alignment scheme is good for anti-fragmentation. But if large
>> allocations are common, an allocation can fail because of the
>> alignment. To avoid the allocation failure, we had to increase the
>> total size.
>>
>> Here is an example: the total size is 30MB, only a little memory at
>> the front is in use, and 9MB is requested. The 9MB request will be
>> aligned up to 16MB. The first try at offset 0MB fails because others
>> are already using that range. The second try at offset 16MB fails
>> because it runs out of bounds.
>>
>> So if the alignment is not necessary for a specific dma-coherent
>> memory region, we can set the no-align property. Then dma-coherent
>> will ignore the alignment only for that memory region.
>
> ISTM that the alignment needs to be a property of the request, not of the
> device. Certainly the device driver code is most likely to know the specific
> alignment requirements of any specific allocation.
>
Sorry, but I don't fully understand 'a property of the request'. The
dma-coherent APIs do not take an alignment argument; internally they use
get_order() to derive the alignment from the requested size. If you meant
that the dma-coherent APIs should keep working that way, I agree that
drivers calling the dma-coherent APIs have been assuming that alignment
for a long time.
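
To illustrate, here is a simplified userspace model (not the kernel
code; the 4K page size is an assumption) of why a 9MB request fails in
a 30MB region once the front is occupied:

#include <stdio.h>

#define PAGE_SHIFT	12			/* assume 4K pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

/* same idea as the kernel's get_order(): smallest order with
 * 2^order pages >= size */
static int get_order(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

int main(void)
{
	unsigned long region = 30UL << 20;	/* 30MB coherent region */
	unsigned long request = 9UL << 20;	/* 9MB allocation */
	int order = get_order(request);
	unsigned long chunk = PAGE_SIZE << order;
	unsigned long off;

	printf("9MB request -> order %d -> %luMB chunk\n",
	       order, chunk >> 20);

	/* bitmap_find_free_region() only tries 2^order aligned
	 * offsets, so the in-bounds candidates are: */
	for (off = 0; off + chunk <= region; off += chunk)
		printf("candidate offset: %luMB\n", off >> 20);
	/* only 0MB is printed: offset 0 is busy at the front, and
	 * offset 16MB would end at 32MB > 30MB, so the 9MB
	 * allocation fails even though over 20MB is free */
	return 0;
}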

I still think a few memory regions could be managed without alignment,
if the author knows the region well and adds no-align to its device
tree. But it's OK if the open source community is worried about the
no-alignment.
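
Just to show what I mean, below is a rough sketch (not the actual RFC
patch) of an allocation path in drivers/base/dma-coherent.c with a
hypothetical no_align flag set from the device tree property.
bitmap_find_next_zero_area() is the existing helper that takes an
explicit align_mask:

/* Sketch only: 'no_align' is a hypothetical flag on
 * struct dma_coherent_mem, filled in from the DT property. */
static long alloc_pages_from_coherent(struct dma_coherent_mem *mem,
				      size_t size)
{
	int order = get_order(size);
	unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
	unsigned long flags;
	long pageno;

	spin_lock_irqsave(&mem->spinlock, flags);

	if (mem->no_align) {
		/* first-fit search with align_mask == 0: a 9MB
		 * request stays 9MB and can land on any free offset */
		pageno = bitmap_find_next_zero_area(mem->bitmap,
						    mem->size, 0,
						    nr_pages, 0);
		if (pageno >= mem->size)
			pageno = -ENOMEM;
		else
			bitmap_set(mem->bitmap, pageno, nr_pages);
	} else {
		/* existing behaviour: size rounded up to 2^order
		 * pages, placed only at 2^order aligned offsets */
		pageno = bitmap_find_free_region(mem->bitmap,
						 mem->size, order);
	}

	spin_unlock_irqrestore(&mem->spinlock, flags);
	return pageno;
}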

Thank you
> We've some hardware that would need large allocations to be 16k aligned.
> We actually use multiple 16k allocations because any large buffers are
> accessed directly from userspace (mmap and vm_iomap_memory) and the
> card has its own page tables (with 16k pages).
>
> David
>