Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping

From: Liam Mark
Date: Sun Oct 14 2018 - 02:01:24 EST


On Fri, 12 Oct 2018, Laura Abbott wrote:

> On 10/10/2018 04:33 PM, John Stultz wrote:
> > Since 4.12 (much later narrowed down to commit 2a55e7b5e544,
> > "staging: android: ion: Call dma_map_sg for syncing and mapping"),
> > we have seen graphics performance issues on the HiKey960.
> >
> > This was initially confounded by the fact that the out-of-tree
> > DRM driver was using the HiSi custom ION heap, which broke with
> > the 4.12 ION ABI changes, so there was lots of suspicion that the
> > performance problems were due to switching to a somewhat simple
> > CMA-based DRM driver for HiKey960. Additionally, as no
> > performance regression was seen w/ the original HiKey board
> > (which is SMP, not big.LITTLE as w/ HiKey960), there was some
> > thought that the out-of-tree EAS code wasn't quite optimized.
> >
> > But after chasing a number of other leads, I found that
> > reverting the ION code to the 4.11-era version got the majority
> > of the graphics performance back (there may yet be further EAS
> > tweaks needed), which led me to the dma_map_sg change.
> >
> > In talking w/ Laura and Liam, it was suspected that the extra
> > cache operations were causing the trouble. Additionally, I found
> > that part of the reason we didn't see this w/ the original
> > HiKey board is that its (proprietary blob) GL code uses ion_mmap,
> > and ion_map_dma_buf is called very rarely, whereas with
> > HiKey960, the (also proprietary blob) GL code calls
> > ion_map_dma_buf much more frequently via the kernel driver.
> >
> > Anyway, with the cause of the performance regression isolated,
> > I've tried to find a way to improve the performance of the
> > current code.
> >
> > This approach, which I've mostly copied from the drm_prime
> > implementation, is to track the direction we're mapping
> > the buffers so we can avoid calling dma_map/unmap_sg on every
> > ion_map_dma_buf/ion_unmap_dma_buf call, and instead try to do
> > the work in the attach/detach paths.
> >
> > I'm not 100% sure of the correctness here, so close review would
> > be good, but it gets the performance back to being similar to
> > that of the 4.11-era ION code.
> >
> > Feedback would be greatly appreciated!
> >
> > Cc: Beata Michalska <Beata.Michalska@xxxxxxx>
> > Cc: Matt Szczesiak <matt.szczesiak@xxxxxxx>
> > Cc: Anders Pedersen <Anders.Pedersen@xxxxxxx>
> > Cc: John Reitan <John.Reitan@xxxxxxx>
> > Cc: Liam Mark <lmark@xxxxxxxxxxxxxx>
> > Cc: Laura Abbott <labbott@xxxxxxxxxx>
> > Cc: Sumit Semwal <sumit.semwal@xxxxxxxxxx>
> > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > Cc: Todd Kjos <tkjos@xxxxxxxxxxx>
> > Cc: Martijn Coenen <maco@xxxxxxxxxxx>
> > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > Signed-off-by: John Stultz <john.stultz@xxxxxxxxxx>
> > ---
> > drivers/staging/android/ion/ion.c | 36 +++++++++++++++++++++++++++++++-----
> > 1 file changed, 31 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> > index 9907332..a4d7fca 100644
> > --- a/drivers/staging/android/ion/ion.c
> > +++ b/drivers/staging/android/ion/ion.c
> > @@ -199,6 +199,7 @@ struct ion_dma_buf_attachment {
> > struct device *dev;
> > struct sg_table *table;
> > struct list_head list;
> > + enum dma_data_direction dir;
> > };
> > static int ion_dma_buf_attach(struct dma_buf *dmabuf,
> > @@ -220,6 +221,7 @@ static int ion_dma_buf_attach(struct dma_buf *dmabuf,
> > a->table = table;
> > a->dev = attachment->dev;
> > + a->dir = DMA_NONE;
> > INIT_LIST_HEAD(&a->list);
> > attachment->priv = a;
> > @@ -236,6 +238,18 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
> > {
> > struct ion_dma_buf_attachment *a = attachment->priv;
> > struct ion_buffer *buffer = dmabuf->priv;
> > + struct sg_table *table;
> > +
> > + if (!a)
> > + return;
> > +
> > + table = a->table;
> > + if (table) {
> > + if (a->dir != DMA_NONE)
> > + dma_unmap_sg(attachment->dev, table->sgl, table->nents,
> > + a->dir);
> > + sg_free_table(table);
> > + }
> > free_duped_table(a->table);
> > mutex_lock(&buffer->lock);
> > @@ -243,6 +257,7 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
> > mutex_unlock(&buffer->lock);
> > kfree(a);
> > + attachment->priv = NULL;
> > }
> > static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> > @@ -251,12 +266,24 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> > struct ion_dma_buf_attachment *a = attachment->priv;
> > struct sg_table *table;
> > - table = a->table;
> > + if (WARN_ON(direction == DMA_NONE || !a))
> > + return ERR_PTR(-EINVAL);
> > - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> > - direction))
> > - return ERR_PTR(-ENOMEM);
> > + if (a->dir == direction)
> > + return a->table;
> > + if (WARN_ON(a->dir != DMA_NONE))
> > + return ERR_PTR(-EBUSY);
> > +
> > + table = a->table;
> > + if (!IS_ERR(table)) {
> > + if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> > + direction)) {
> > + table = ERR_PTR(-ENOMEM);
> > + } else {
> > + a->dir = direction;
> > + }
> > + }
> > return table;
> > }
> > @@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
> > struct sg_table *table,
> > enum dma_data_direction direction)
> > {
> > - dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
>
> This changes the semantics so that the only time a buffer
> gets unmapped is on detach. I don't think we want to restrict
> Ion to that behavior but I also don't know if anyone else
> is relying on that.

I also have a concern with this patch: wouldn't it run into difficulties
if multiple devices were attached to the same cached ION buffer and one of
those devices was IO-coherent but the other one wasn't?
For example, if device A (which is not IO-coherent) wrote to the buffer but
skipped the cache invalidation on dma_unmap, and the buffer was then
dma-mapped into device B (which is IO-coherent), device B could end up
reading stale cache entries when it attempted to read the buffer.

I don't believe there is any requirement for device A to detach (which
would do the cache invalidation) before having device B dma-map the
buffer.

I believe there would also be issues if device B wrote to the buffer and
then device A tried to read or write it.
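
To make the concern concrete, here is a rough sketch of the sequence I
have in mind (dev_a, dev_b and the helper function are made up for
illustration, error handling is omitted); nothing in the dma-buf API
requires a detach between the two maps:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>

/*
 * Hypothetical sequence: dev_a is not IO-coherent, dev_b is
 * IO-coherent, dmabuf is a cached ION buffer.
 */
static void stale_read_example(struct dma_buf *dmabuf,
			       struct device *dev_a, struct device *dev_b)
{
	struct dma_buf_attachment *att_a, *att_b;
	struct sg_table *sgt_a, *sgt_b;

	att_a = dma_buf_attach(dmabuf, dev_a);
	sgt_a = dma_buf_map_attachment(att_a, DMA_FROM_DEVICE);
	/* ... device A DMAs new data into the buffer ... */

	/*
	 * With this patch the unmap below no longer does any cache
	 * maintenance for device A; it is deferred to dma_buf_detach().
	 */
	dma_buf_unmap_attachment(att_a, sgt_a, DMA_FROM_DEVICE);

	att_b = dma_buf_attach(dmabuf, dev_b);
	sgt_b = dma_buf_map_attachment(att_b, DMA_TO_DEVICE);
	/*
	 * Device B is IO-coherent, so its reads snoop the CPU caches
	 * and can hit stale (pre-DMA) lines instead of device A's data.
	 */

	dma_buf_unmap_attachment(att_b, sgt_b, DMA_TO_DEVICE);
	dma_buf_detach(dmabuf, att_b);
	dma_buf_detach(dmabuf, att_a);
}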

> I thought there might have been some Qualcomm
> stuff that did that (Liam? Todd?)

Yes, we have a form of "lazy mapping", which clients can opt into using,
which results in the IOMMU page table mappings not being unmapped on
dma-unmap. Instead they are unmapped when the buffer is destroyed.

It is important to note that in our "lazy mapping" implementation cache
maintenance is still applied on dma-map and dma-unmap.
If you need a full description of this feature I can provide it.
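
For illustration only, here is a rough sketch of the shape of the idea
(this is not our actual code; the struct and function names are made up).
The dma_map_sg() mapping is created once and kept, while
dma_sync_sg_for_device()/dma_sync_sg_for_cpu() still run on every
map/unmap, and the real dma_unmap_sg() only happens when the buffer
itself is freed:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/err.h>

struct lazy_attachment {		/* made-up bookkeeping struct */
	struct sg_table *table;
	bool mapped;
};

static struct sg_table *lazy_map(struct dma_buf_attachment *attachment,
				 enum dma_data_direction direction)
{
	struct lazy_attachment *a = attachment->priv;

	if (!a->mapped) {
		/* First map: set up the IOMMU mapping (includes a CPU sync). */
		if (!dma_map_sg(attachment->dev, a->table->sgl,
				a->table->nents, direction))
			return ERR_PTR(-ENOMEM);
		a->mapped = true;
	} else {
		/* Mapping is reused; still hand the buffer to the device. */
		dma_sync_sg_for_device(attachment->dev, a->table->sgl,
				       a->table->nents, direction);
	}
	return a->table;
}

static void lazy_unmap(struct dma_buf_attachment *attachment,
		       struct sg_table *table,
		       enum dma_data_direction direction)
{
	/* Keep the IOMMU mapping, but give the buffer back to the CPU. */
	dma_sync_sg_for_cpu(attachment->dev, table->sgl, table->nents,
			    direction);
}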

>
> I suspect most of the cost of the dma_map/dma_unmap is from the
> cache flushing and not the actual mapping operations. If this
> is the case, another option might be to figure out how to
> incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
> to decide when they actually want to sync.

We have support for this locally on our 4.14 branch.
We have added a dma_map_attrs field to the dma_buf_attachment struct;
clients can then specify DMA attributes there, such as
DMA_ATTR_SKIP_CPU_SYNC, before dma-mapping the buffer. We then ensure
that these DMA attributes are passed to dma_map_sg_attrs when
ion_map_dma_buf is called (and likewise for ion_unmap_dma_buf).

I hope to try and upstream this at some point.
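
Roughly, the mapping side looks something like the sketch below (written
from memory, not the exact code; dma_map_attrs is our local addition to
struct dma_buf_attachment, and ion_dma_buf_attachment is the struct from
the ION driver above):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>

static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
					enum dma_data_direction direction)
{
	struct ion_dma_buf_attachment *a = attachment->priv;
	struct sg_table *table = a->table;
	/* dma_map_attrs is the field we added locally to dma_buf_attachment. */
	unsigned long attrs = attachment->dma_map_attrs;

	if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
			      direction, attrs))
		return ERR_PTR(-ENOMEM);

	return table;
}

static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
			      struct sg_table *table,
			      enum dma_data_direction direction)
{
	dma_unmap_sg_attrs(attachment->dev, table->sgl, table->nents,
			   direction, attachment->dma_map_attrs);
}

A client that wants to skip the CPU sync sets
attachment->dma_map_attrs = DMA_ATTR_SKIP_CPU_SYNC after dma_buf_attach()
and before calling dma_buf_map_attachment().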

>
> Thanks,
> Laura
>
> > }
> > static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> >
>
>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project