Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

From: Andrew F. Davis
Date: Thu Jan 24 2019 - 11:05:12 EST


On 1/23/19 11:11 AM, Brian Starkey wrote:
> Hi Andrew,
>
> On Wed, Jan 23, 2019 at 10:51:24AM -0600, Andrew F. Davis wrote:
>> On 1/22/19 11:33 AM, Sumit Semwal wrote:
>>> Hello everyone,
>>>
>>> Sincere apologies for chiming in a bit late here, but was off due to
>>> some health issues.
>>>
>>
>> Hope you are feeling better friend :)
>>
>> Looks like this email was a bit broken and you replied again, the
>> responses are a little different in each email, so I'd like to respond
>> to bits of both, I'll fix up the formatting.
>>
>>> Also, adding Daniel Vetter to the mix, since he has been one of the
>>> core guys who shaped up dma-buf as it is today.
>>>
>>> On Tue, 22 Jan 2019 at 02:51, Andrew F. Davis <afd@xxxxxx> wrote:
>>>>
>>>> On 1/21/19 5:22 AM, Brian Starkey wrote:
>>
>> [snip]
>>
>>>>>
>>>>> Actually I meant in the kernel, in exporters. I haven't seen anyone
>>>>> using the API as it was intended (defer allocation until first map,
>>>>> migrate between different attachments, etc.). Mostly, backing storage
>>>>> seems to get allocated at the point of export, and device mappings are
>>>>> often held persistently (e.g. the DRM prime code maps the buffer at
>>>>> import time, and keeps it mapped: drm_gem_prime_import_dev).
>>>>>
>>>>
>>>
>>> So I suppose some clarification on the 'intended use' part of dma-buf
>>> about deferred allocation is due, so here it is: (Daniel, please feel
>>> free to chime in with your opinion here)
>>>
>>> - dma-buf was of course designed as a framework to help intelligent
>>> exporters to defer allocation until first map, and be able to migrate
>>> backing storage if required etc. At the same time, it is not a
>>> _requirement_ from any exporter, so exporters so far have just used it
>>> as a convenient mechanism for zero-copy.
>>> - ION is one of the few dma-buf exporters in kernel, which satisfies a
>>> certain set of expectations from its users.
>>>
>>
>> The issue here is that Ion is blocking the ability to late allocate, it
>> expects its heaps to have the memory ready at allocation time. My point
>> being if the DMA-BUFs intended design was to allow this then Ion should
>> respect that and also allow the same from its heap exporters.
>>
>>>> I haven't either, which is a shame as it allows for some really useful
>>>> management strategies for shared memory resources. I'm working on one
>>>> such case right now, maybe I'll get to be the first to upstream one :)
>>>>
>>> That will be a really good thing! Though perhaps we ought to think if
>>> for what you're trying to do, is ION the right place, or should you
>>> have a device-specific exporter, available to users via dma-buf apis?
>>>
>>
>> I'm starting to question if Ion is the right place myself..
>>
>> At a conceptual level I don't believe userspace should be picking the
>> backing memory type. This is because the right type of backing memory
>> for a task will change from system to system. The kernel should abstract
>> away these hardware differences from userspace as much as it can to
>> allow portable code.
>>
>> For instance a device may need a contiguous buffer on one system but the
>> same device on another may have some IOMMU. So which type of memory do
>> we allocate? Same issue for cacheability and other properties.
>>
>> What we need is a central allocator with full system knowledge to do the
>> choosing for us. It seems many agree with the above and I take
>> inspiration from your cenalloc patchset. The thing I'm not sure about is
>> letting the device drivers set their constraints, because they also
>> don't have the full system integration details. For cases where devices
>> are behind an IOMMU it is easy enough for the device to know, but what
>> about when we have external MMUs out on the bus for anyone to use (I'm
>> guessing you remember TILER..).
>>
>> I would propose the central allocator keep per-system knowledge (or
>> fetch it from DT, or if this is considered policy then userspace) which
>> it can use to directly check the attached devices and pick the right memory.
>>
>> Anyway the central system allocator could handle 90% of cases I can
>> think of, and this is where Ion comes back in, the other cases would
>> still require the program to manually pick the right memory (maybe for
>> performance reasons, etc.).
>>
>> So my vision is to have Ion as the the main front-end for DMA-BUF
>> allocations, and expose the central allocator through it (maybe as a
>> default heap type that can be populated on a per-system basis), but also
>> have other individual heap types exported for the edge cases where
>> manual selection is needed like we do now.
>>
>> Hence why Ion should allow direct control of the dma_buf_ops from the
>> heaps, so we can build central allocators as Ion heaps.
>>
>> If I'm off into the weeds here and you have some other ideas I'm all ears.
>>
>
> This is a topic I've gone around a few times. The crux of it is, as
> you know, a central allocator is Really Hard. I don't know what you've
> seen/done so far in this area, so please forgive me if this is old hat
> to you.
>

I'm very new to all this, so any pointers to history in this area are
appreciated.

> Android's platform-specific Gralloc module actually does a pretty good
> job, and because it's platform-specific, it can be simple. That does
> have a certain appeal to it over something generic but complex.
>
> It seems that generic gets insanely complicated really quickly -
> picking out "compatible" is hard enough, but improving that to pick
> out "optimal" is ... well, I've not seen any good proposals for that.
>

I quickly found out the extent of how complicated this can get. I
originally wanted a generic central allocator, but to be complete it
would have to handle every one of everyones uses, so just not going to
happen. So I refocused on simply a central allocator, it didn't need to
be generic and can be made specifically for each given system of ours.
Even that quickly ran into objections when proposed internally, there
will always be edge use-cases that require manual selection of
allocation type (due to performance and resource partitioning). Now I'm
at an optional central allocator plus the standard allocator heaps
provided by Ion, the system specific central allocator being also
exported by Ion for simplicity.

So all I want is to find is "compatible" when needed, leave "optimal"
selection up to the applications that need it.

> In case you didn't come across it already, the effort which seems to
> have gained the most "air-time" recently is
> https://github.com/cubanismo/allocator, which is still a userspace
> module (perhaps some concepts from there could go into the kernel?),
> but makes some attempts at generic constraint solving. It's also not
> really moving anywhere at the moment.
>

Very interesting, I'm going to have to stare at this for a bit.

Thanks,
Andrew

> Cheers,
> -Brian
>
>> Andrew
>>
>>>>> I wasn't aware that CPU access before first device access was
>>>>> considered an abuse of the API - it seems like a valid thing to want
>>>>> to do.
>>>>>
>>>>
>>>> That's just it, I don't know if it is an abuse of API, I'm trying to get
>>>> some clarity on that. If we do want to allow early CPU access then that
>>>> seems to be in contrast to the idea of deferred allocation until first
>>>> device map, what is supposed to backing the buffer if no devices have
>>>> attached or mapped yet? Just some system memory followed by migration on
>>>> the first attach to the proper backing? Seems too time wasteful to be
>>>> have a valid use.
>>>>
>>>> Maybe it should be up to the exporter if early CPU access is allowed?
>>>>
>>>> I'm hoping someone with authority over the DMA-BUF framework can clarify
>>>> original intentions here.
>>>
>>> I don't think dma-buf as a framework stops early CPU access, and the
>>> exporter can definitely decide on that by implementing
>>> begin_cpu_access / end_cpu_access operations to not allow early CPU
>>> access, if it so desires.
>>>