[PATCH v5 0/4] drm: nouveau: memory coherency on ARM

From: Alexandre Courbot
Date: Mon Oct 27 2014 - 05:49:38 EST


It has been a couple of months since v4 - apologies for this. v4 did not
receive many comments, but this version addresses them and makes a new
attempt at pushing the critical bit for GK20A and Nouveau on ARM in
general.

As a reminder, this series addresses the memory coherency issue that we
are seeing on ARM platforms. Unlike x86, which invalidates the PCI
caches whenever a write is made by the CPU to a GPU-accessed area (and
vice-versa), on ARM such accesses can leave the other accessor with an
incoherent view of the memory.

To address this, patches 1-3 add the ability to detect whether we
are on a non-coherent architecture, implement a way to explicitly allocate
coherent buffers using the DMA API, and use it for GPFIFOs and
fences. Patch 4 also uses the DMA API to synchronize user-space allocated
buffers when they are passed from the CPU to the GPU and vice-versa.
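
For illustration, the two strategies above (coherent allocation versus
explicit synchronization on hand-over) can be modeled in plain user-space C.
This is only a toy sketch and not code from the series: struct toy_bo and
every toy_* function are hypothetical names. The two sync helpers merely
mimic the role that kernel DMA API calls such as dma_sync_single_for_device()
and dma_sync_single_for_cpu() play; which calls the patches actually use is
not stated in this letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE 16

/* Hypothetical toy model: one buffer seen through a CPU cache
 * ("cpu_view") and through RAM, which is what the GPU accesses. */
struct toy_bo {
	bool coherent;   /* true: the CPU mapping bypasses the cache */
	bool cpu_dirty;  /* CPU writes not yet written back to RAM */
	unsigned char cpu_view[BUF_SIZE];
	unsigned char ram[BUF_SIZE];
};

static void toy_bo_init(struct toy_bo *bo, bool coherent)
{
	memset(bo, 0, sizeof(*bo));
	bo->coherent = coherent;
}

/* A CPU write reaches RAM immediately only for coherent buffers. */
static void toy_cpu_write(struct toy_bo *bo, int off, unsigned char val)
{
	assert(off >= 0 && off < BUF_SIZE);
	bo->cpu_view[off] = val;
	if (bo->coherent)
		bo->ram[off] = val;
	else
		bo->cpu_dirty = true;
}

/* A non-coherent CPU read may return stale cached data. */
static unsigned char toy_cpu_read(const struct toy_bo *bo, int off)
{
	assert(off >= 0 && off < BUF_SIZE);
	return bo->coherent ? bo->ram[off] : bo->cpu_view[off];
}

/* GPU accesses always go straight to RAM in this model. */
static void toy_gpu_write(struct toy_bo *bo, int off, unsigned char val)
{
	assert(off >= 0 && off < BUF_SIZE);
	bo->ram[off] = val;
}

static unsigned char toy_gpu_read(const struct toy_bo *bo, int off)
{
	assert(off >= 0 && off < BUF_SIZE);
	return bo->ram[off];
}

/* CPU -> GPU hand-over: write back pending CPU data. */
static void toy_sync_for_device(struct toy_bo *bo)
{
	if (!bo->coherent && bo->cpu_dirty) {
		memcpy(bo->ram, bo->cpu_view, BUF_SIZE);
		bo->cpu_dirty = false;
	}
}

/* GPU -> CPU hand-over: discard the cached copy and refetch from RAM. */
static void toy_sync_for_cpu(struct toy_bo *bo)
{
	if (!bo->coherent) {
		memcpy(bo->cpu_view, bo->ram, BUF_SIZE);
		bo->cpu_dirty = false;
	}
}
```

In this model, a non-coherent buffer written by the CPU stays invisible to
the GPU until toy_sync_for_device() runs, while a buffer initialized with
coherent = true needs no synchronization at all. That is the trade-off the
series makes: always-coherent memory for GPFIFOs and fences, explicit sync
for user-space buffers.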

Thanks to the feedback received on the previous revisions I believe this
code looks rather good now. I have also tested it extensively and could
no longer see any buffer corruption. There is still one point
which is not completely satisfying in my opinion:

TTMs for TTM-backed objects are allocated in nouveau_sgdma_create_ttm()
and populated in nouveau_ttm_tt_populate(). Coherently-allocated buffers
need to use the ttm_dma API instead of the pool-based TTM API, and whether
an object is coherent or not is stored in its instance of nouveau_bo.

The problem is that neither nouveau_sgdma_create_ttm() nor
nouveau_ttm_tt_populate() has a way to access the nouveau_bo it is
working for. This is in particular a problem for nouveau_ttm_tt_populate()
since we need to rely on a purely TTM-based heuristic to decide how to
allocate the memory. The heuristic we are using works, but it makes the
code harder to understand than if we could just access the nouveau_bo.
nouveau_sgdma_create_ttm() always allocates a ttm_dma_tt structure,
which is wrong but happens to suit us for now. Still, this part of the
code could be rewritten much more cleanly if only we could access the
nouveau_bo instance in these functions.

I proposed some time ago to address this by making the ttm_tt_create
hook take a pointer to a ttm_buffer_object instead of a ttm_bo_device.
This would still allow us to access the ttm_bo_device, while letting
us retrieve the nouveau_bo and store it into whatever structure we
embed our TTM into. For some reason David was not fond of the idea - I
am taking another chance at submitting it since the issue is still
not resolved and leads to inferior-looking code in at least Nouveau.
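
As a rough sketch of why passing the buffer object would be enough, here is
a user-space mock-up in C. The structures are heavily simplified stand-ins
for the real TTM and Nouveau ones, and sketch_tt_create(), to_nouveau_bo(),
the force_coherent field and the uses_dma_api field are hypothetical names
chosen for illustration, not the actual hook or its signature.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; only the fields needed here are present. */
struct ttm_bo_device {
	int id;
};

struct ttm_buffer_object {
	struct ttm_bo_device *bdev;  /* the device stays reachable from the BO */
};

struct nouveau_bo {
	struct ttm_buffer_object bo; /* embedded TTM object */
	int force_coherent;          /* hypothetical per-BO coherency flag */
};

struct ttm_tt {
	int uses_dma_api;            /* hypothetical: 1 = DMA API, 0 = pool */
};

/* User-space equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the driver object from the TTM object embedded in it. */
static struct nouveau_bo *to_nouveau_bo(struct ttm_buffer_object *bo)
{
	return container_of(bo, struct nouveau_bo, bo);
}

/* Hypothetical hook that receives the BO instead of the device. */
static void sketch_tt_create(struct ttm_buffer_object *bo, struct ttm_tt *tt)
{
	struct nouveau_bo *nvbo = to_nouveau_bo(bo);

	/* Nothing is lost: the device is still available through the BO. */
	assert(bo->bdev != NULL);

	/* The allocation strategy is read from the BO, with no heuristic. */
	tt->uses_dma_api = nvbo->force_coherent;
}
```

With the buffer object in hand, the hook can decide how to back the object
directly from the driver's own state, and the device remains one pointer
dereference away.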

Phew, sorry for the long cover letter - thanks if you have read this
far! :)

Changes since v4:
- Only use DMA API for sync, as suggested by Daniel

Alexandre Courbot (4):
  drm: introduce nv_device_is_cpu_coherent()
  drm: implement explicitly coherent BOs
  drm: allocate GPFIFOs and fences coherently
  drm: synchronize BOs when required

 drm/nouveau_bo.c           | 122 ++++++++++++++++++++++++++++++++++++++++++---
 drm/nouveau_bo.h           |   3 ++
 drm/nouveau_chan.c         |   2 +-
 drm/nouveau_gem.c          |  12 +++++
 drm/nv84_fence.c           |   4 +-
 lib/core/os.h              |   2 +
 nvkm/include/core/device.h |   6 +++
 7 files changed, 140 insertions(+), 11 deletions(-)

--
2.1.2
