Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping

From: John Stultz
Date: Mon Oct 15 2018 - 12:29:09 EST


On Fri, Oct 12, 2018 at 10:51 AM, Laura Abbott <labbott@xxxxxxxxxx> wrote:
> On 10/10/2018 04:33 PM, John Stultz wrote:
>>
>> Since 4.12, much later narrowed down to commit 2a55e7b5e544
>> ("staging: android: ion: Call dma_map_sg for syncing and mapping"),
>> we have seen graphics performance issues on the HiKey960.
>>
>> This was initially confounded by the fact that the out-of-tree
>> DRM driver was using a HiSi custom ION heap which broke with the
>> 4.12 ION ABI changes, so there was lots of suspicion that the
>> performance problems were due to switching to a somewhat simple
>> CMA-based DRM driver for HiKey960. Additionally, as no
>> performance regression was seen w/ the original HiKey board
>> (which is SMP, not big.LITTLE as w/ HiKey960), there was some
>> thought that the out-of-tree EAS code wasn't quite optimized.
>>
>> But after chasing a number of other leads, I found that
>> reverting the ION code to 4.11-era got the majority of the
>> graphics performance back (there may yet be further EAS tweaks
>> needed), which led me to the dma_map_sg change.
>>
>> In talking w/ Laura and Liam, it was suspected that the extra
>> cache operations were causing the trouble. Additionally, I found
>> that part of the reason we didn't see this w/ the original
>> HiKey board is that its (proprietary blob) GL code uses ion_mmap
>> and ion_map_dma_buf is called very rarely, whereas with
>> HiKey960, the (also proprietary blob) GL code calls
>> ion_map_dma_buf much more frequently via the kernel driver.
>>
>> Anyway, with the cause of the performance regression isolated,
>> I've tried to find a way to improve the performance of the
>> current code.
>>
>> This approach, which I've mostly copied from the drm_prime
>> implementation, is to try to track the direction we're mapping
>> the buffers so we can avoid calling dma_map/unmap_sg on every
>> ion_map_dma_buf/ion_unmap_dma_buf call, and instead try to do
>> the work in attach/detach paths.
>>
>> I'm not 100% sure of the correctness here, so close review would
>> be good, but it gets the performance back to being similar to
>> reverting the ION code to the 4.11-era.
>>
>> Feedback would be greatly appreciated!
>>
...
>> @@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
>>                                struct sg_table *table,
>>                                enum dma_data_direction direction)
>>  {
>> -       dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
>
>
> This changes the semantics so that the only time a buffer
> gets unmapped is on detach. I don't think we want to restrict
> Ion to that behavior but I also don't know if anyone else
> is relying on that. I thought there might have been some Qualcomm
> stuff that did that (Liam? Todd?)
>
> I suspect most of the cost of the dma_map/dma_unmap is from the
> cache flushing and not the actual mapping operations. If this
> is the case, another option might be to figure out how to
> incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
> to decide when they actually want to sync.

Ok. Thanks so much for the feedback and the suggestion. I'll try to
look into dma_attrs here shortly.

thanks
-john
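
For readers following the thread, the direction-tracking idea from the quoted
patch description can be illustrated with a small userspace model. This is only
a rough sketch under stated assumptions, not the patch itself: every name below
(model_attachment, model_map, and so on) is hypothetical, the "expensive"
helpers merely count calls in place of dma_map_sg()/dma_unmap_sg(), and the
choice to reject a second, different direction is an assumption made for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: hypothetical stand-ins, not kernel types or APIs. */
enum model_dir {
	MODEL_DIR_NONE = 0,
	MODEL_DIR_BIDIRECTIONAL,
	MODEL_DIR_TO_DEVICE,
	MODEL_DIR_FROM_DEVICE
};

struct model_attachment {
	enum model_dir cached_dir; /* direction of the cached mapping, if any */
	int map_calls;             /* times the costly map step actually ran */
	int unmap_calls;           /* times the costly unmap step actually ran */
};

/* Stand-ins for the costly map/unmap work (cache maintenance and so on). */
static void model_expensive_map(struct model_attachment *a) { a->map_calls++; }
static void model_expensive_unmap(struct model_attachment *a) { a->unmap_calls++; }

/* Attach: start with no cached mapping. */
void model_attach(struct model_attachment *a)
{
	assert(a != NULL);
	a->cached_dir = MODEL_DIR_NONE;
	a->map_calls = 0;
	a->unmap_calls = 0;
}

/* Map: reuse the cached mapping when the direction matches.
 * Returns 0 on success, -1 on an invalid or conflicting direction
 * (the conflict handling is an assumption of this sketch). */
int model_map(struct model_attachment *a, enum model_dir dir)
{
	assert(a != NULL);
	if (dir == MODEL_DIR_NONE)
		return -1;
	if (a->cached_dir == dir)
		return 0;               /* already mapped this way: no extra work */
	if (a->cached_dir != MODEL_DIR_NONE)
		return -1;              /* mapped with a different direction */
	model_expensive_map(a);
	a->cached_dir = dir;
	return 0;
}

/* Unmap: deliberately a no-op; the mapping stays cached until detach. */
void model_unmap(struct model_attachment *a, enum model_dir dir)
{
	(void)a;
	(void)dir;
}

/* Detach: the only place the costly unmap step runs. */
void model_detach(struct model_attachment *a)
{
	assert(a != NULL);
	if (a->cached_dir != MODEL_DIR_NONE) {
		model_expensive_unmap(a);
		a->cached_dir = MODEL_DIR_NONE;
	}
}
```

In this model, repeated map/unmap pairs with the same direction cost a single
map step, and the unmap step runs only at detach. That is also the semantic
change Laura points out above: the buffer is no longer unmapped until detach.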