Re: [PATCH v4 2/3] common: DMA-mapping: add DMA_ATTR_NOHUGEPAGE attribute

From: Doug Anderson
Date: Fri Jan 08 2016 - 18:04:27 EST


Hi,

On Fri, Jan 8, 2016 at 5:10 AM, Robin Murphy <robin.murphy@xxxxxxx> wrote:
>> +DMA_ATTR_NOHUGEPAGE
>> +-------------------
>
>
> Bikeshed: DMA_ATTR_NO_HUGEPAGE (or even DMA_ATTR_NO_HUGE_PAGE) would be more
> consistent with the naming style of the other attributes.

Done. I'm running out of paint, so crossing my fingers that this is
the final color. ;)


>> +This is a hint to the DMA-mapping subsystem that it's probably not worth
>> +the time to try to allocate memory to in a way that gives better TLB
>> +efficiency (AKA it's not worth trying to build the mapping out of larger
>> +pages). You might want to specify this if:
>> +- You know that the accesses to this memory won't thrash the TLB.
>> + You might know that the accesses are likely to be sequential or
>> + that they aren't sequential but it's unlikely you'll ping-ping
>
>
> ^ping-pong?

Done.


>> + between many addresses that are likely to be in different physical
>> + pages.
>> +- You know that the penalty of TLB misses while accessing the
>> + memory will be small enough to be inconsequential. If you are
>> + doing a heavy operation like decryption or decompression this
>> + might be the case.
>> +- You know that the DMA mapping is fairly transitory. If you expect
>> + the mapping to have a short lifetime then it may be worth it to
>> + optimize allocation (avoid coming up with large pages) instead of
>> + getting the slight performance win of larger pages.
>> +Setting this hint doesn't guarantee that you won't get huge pages, but it
>> +means that we won't try quite as hard to get them.
>
>
> Nice detailed description, but I do worry it's a bit too ambiguous - it
> still parses perfectly well if you assume the references are to CPU TLBs and
> CPU accesses, rather than IOMMU TLBs and device accesses, especially given
> that the CPU is equally relevant to coherent DMA and there may not be an
> IOMMU at all. I assume that's not intentional, because otherwise it's also
> not quite accurate (I did once try to understand why we still have to split
> a CPU huge page for DMA even with a corresponding IOMMU huge page, but I
> remember getting completely lost somewhere in the bowels of the mm code).

Hmm. Well, the original ambiguity was sorta intentional.
Specifically, anyone accessing this data through an MMU is likely to
have a TLB, and allocating large chunks is more likely to increase the
efficiency of that TLB. If Linux today can't manage to take advantage
of these large chunks to optimize CPU TLB efficiency, that's not really
something I think we need to take into account in the API. The API
should stay valid even as Linux changes, if possible...

If we happen to have no MMU at all between the DMA device and the
memory then presumably it needs to be totally contiguous. That would
be OK. Presumably if the client knew that there was no MMU it would
specify DMA_ATTR_FORCE_CONTIGUOUS anyway... ...and if the DMA
subsystem wanted to make things contiguous despite the "no huge
page" hint that doesn't violate the hint--it is legal to ignore it.
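
To make the "hint, not a guarantee" semantics concrete, here is a rough
user-space sketch of how I picture an allocator treating the attribute.
Every name in it (model_first_order, MODEL_ATTR_NO_HUGEPAGE,
MODEL_MAX_ORDER) is made up for illustration and is not part of this
patch or of the kernel; the 2 MiB cap is likewise just an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: a user-space model, not kernel code. */
#define MODEL_MAX_ORDER 9u /* 2^9 pages; 2 MiB with 4 KiB pages (assumed) */

enum model_attr {
	MODEL_ATTR_NONE        = 0,
	MODEL_ATTR_NO_HUGEPAGE = 1 << 0, /* stands in for the proposed hint */
};

/*
 * Return the order (log2 of the page count) of the first chunk the model
 * allocator would try for a buffer that still needs `pages_left` pages.
 * `must_be_contiguous` models the case with no MMU between device and
 * memory; the result is still capped at MODEL_MAX_ORDER for simplicity.
 */
static unsigned int model_first_order(size_t pages_left, unsigned int attrs,
				      bool must_be_contiguous)
{
	unsigned int order = 0;

	assert(pages_left > 0);

	/* The hint is advisory: honor it only when nothing overrides it. */
	if ((attrs & MODEL_ATTR_NO_HUGEPAGE) && !must_be_contiguous)
		return 0;

	/* Otherwise start from the largest power-of-two chunk that fits. */
	while (order < MODEL_MAX_ORDER && ((size_t)2 << order) <= pages_left)
		order++;

	return order;
}
```

With the hint set, the model goes straight to single pages; without it
(or when contiguity is required regardless) it starts from the biggest
chunk that fits. Either result is a legal response to the hint.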


I'm really just a visitor to the DMA subsystem, though. I wound up
here down the rabbit hole of chasing down a bug. If I'm totally
misunderstanding something please correct me. ;)


-Doug