On Tue, 2023-10-10 at 11:02 +0800, Sui Jingfeng wrote:
On LoongArch, cached and uncached mappings are DMA-coherent, which is guaranteed by
the hardware, while WC mappings are *NOT* DMA-coherent when a 3D GPU is involved. Therefore,
in the downstream kernel, we disable write-combine (WC) mappings on the DRM driver side.
Why is it only an issue when a 3D GPU is involved?
What's the difference between 3D GPUs and other devices? Is it possible that other
devices (say, neural accelerators) will start to perform DMA accesses in a
similar way and then suddenly break?
- For buffers in VRAM (device memory), we replace the WC mappings with uncached mappings.
- For buffers residing in RAM, we replace the WC mappings with cached mappings.
This way, we were able to minimize the side effects and meet the usability requirements
of all the GPU drivers.
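For reference, the downstream policy described in the two bullets above amounts
to roughly the following. This is only a sketch: the enum and function names are
hypothetical and are not the real DRM/TTM API, and the wc_is_coherent flag is an
assumed stand-in for however the platform quirk would be detected.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not the real DRM/TTM API. */
enum buf_placement { BUF_IN_VRAM, BUF_IN_SYSTEM_RAM };
enum cpu_caching { CACHING_UNCACHED, CACHING_WRITE_COMBINE, CACHING_CACHED };

/*
 * Return the CPU caching mode actually used for a mapping.
 * wc_is_coherent is an assumed platform flag: false where WC mappings
 * are not DMA-coherent, true everywhere else.
 */
static enum cpu_caching pick_caching(enum buf_placement where,
                                     enum cpu_caching requested,
                                     bool wc_is_coherent)
{
    /* Only WC requests on an affected platform are rewritten. */
    if (requested != CACHING_WRITE_COMBINE || wc_is_coherent)
        return requested;

    /* VRAM falls back to uncached, system RAM falls back to cached. */
    return where == BUF_IN_VRAM ? CACHING_UNCACHED : CACHING_CACHED;
}
```

With wc_is_coherent set to false, a WC request for a VRAM buffer yields
CACHING_UNCACHED and a WC request for a RAM buffer yields CACHING_CACHED;
non-WC requests pass through unchanged.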
AFAIK there has been a clear NAK from the DRM maintainers towards this
"approach", so it cannot be applied upstream.
For DMA non-coherent buffers, we should try to implement arch-specific dma_map_ops that
invalidate the CPU cache and flush the CPU write buffer before the device does DMA, instead
of pretending to be DMA-coherent for all buffers. A kernel cmdline is not a system-level
solution for all GPU drivers and OS releases.
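The ordering proposed in the paragraph above (CPU cache maintenance, then a
write-buffer flush, and only then the device DMA) can be sketched as follows.
This is a userspace model, not kernel code: every function here is a
hypothetical stand-in that only records the order of the steps, and none of
them is an existing dma_map_ops callback.

```c
#include <stddef.h>

/* Hypothetical step markers; they exist only to record call order. */
enum step { STEP_CACHE_SYNC = 1, STEP_FLUSH_WRITE_BUFFER, STEP_START_DMA };

struct trace {
    enum step steps[8];
    size_t n;
};

static void record(struct trace *t, enum step s)
{
    if (t->n < sizeof(t->steps) / sizeof(t->steps[0]))
        t->steps[t->n++] = s;
}

/* Stand-in for arch cache maintenance over [buf, buf + len). */
static void cpu_cache_sync(struct trace *t, void *buf, size_t len)
{
    (void)buf;
    (void)len;
    record(t, STEP_CACHE_SYNC);
}

/* Stand-in for draining the CPU write (WC) buffer. */
static void cpu_flush_write_buffer(struct trace *t)
{
    record(t, STEP_FLUSH_WRITE_BUFFER);
}

/* Stand-in for kicking off the device DMA. */
static void device_start_dma(struct trace *t, void *buf, size_t len)
{
    (void)buf;
    (void)len;
    record(t, STEP_START_DMA);
}

/* Hand a non-coherent buffer to the device: sync first, DMA last. */
static void sync_then_dma(struct trace *t, void *buf, size_t len)
{
    cpu_cache_sync(t, buf, len);
    cpu_flush_write_buffer(t);
    device_start_dma(t, buf, len);
}
```

The point of the model is only that the device is started after both
CPU-side steps have completed, for every buffer that is not DMA-coherent.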
IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper location
of the workaround is in the bridge chip driver. Or am I
misunderstanding something?