Re: Enabling peer to peer device transactions for PCIe devices

From: Serguei Sagalovitch
Date: Fri Nov 25 2016 - 13:54:23 EST

On 2016-11-25 08:22 AM, Christian König wrote:

>> Serguei, what is your plan in GPU land for migration? I.e. if I have a
>> CPU-mapped page and the GPU moves it to VRAM, it becomes non-cachable.
>> Do you still allow the CPU to access it? Or do you swap it back to
>> cachable memory if the CPU touches it?
>
> Depends on the policy in effect, but currently it's the other way around most of the time.
>
> E.g. we allocate memory in VRAM, the CPU writes to it WC and avoids reading because that is slow, and the GPU in turn can access it at full speed.
>
> When we run out of VRAM we move those allocations to system memory and update both the CPU and the GPU page tables.
>
> So that move is transparent to both userspace and shaders running on the GPU.
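
To illustrate the eviction path Christian describes, here is a toy Python model. It is not amdgpu or TTM code: the class and method names are made up, and "evict the oldest VRAM buffer first" is just an assumed policy. It only shows the key property: the virtual addresses in the CPU and GPU page tables stay fixed while the backing memory changes, which is why the move is transparent.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Domain(Enum):
    VRAM = "vram"
    SYSTEM = "system"


@dataclass
class PageTableEntry:
    """A CPU or GPU mapping: a fixed virtual address whose backing
    memory domain may change underneath it."""
    vaddr: int
    backing: Domain


@dataclass
class Buffer:
    size: int
    domain: Domain
    gpu_pte: PageTableEntry
    cpu_pte: Optional[PageTableEntry] = None  # GPU-only buffers have no CPU mapping


@dataclass
class ToyPlacer:
    """Hypothetical model: allocate in VRAM first, evict the oldest VRAM
    buffer to system memory when VRAM runs out."""
    vram_size: int
    resident: List[Buffer] = field(default_factory=list)  # VRAM buffers, oldest first

    def __post_init__(self) -> None:
        self.vram_free = self.vram_size

    def _move(self, bo: Buffer, dst: Domain) -> None:
        # A real driver would copy the contents (e.g. with a DMA engine)
        # and wait for the copy before switching the mappings.
        bo.domain = dst
        # Update BOTH page tables. The virtual addresses are untouched, so
        # userspace and running shaders keep using the same pointers.
        for pte in (bo.gpu_pte, bo.cpu_pte):
            if pte is not None:
                pte.backing = dst

    def _evict_oldest(self) -> None:
        victim = self.resident.pop(0)
        self.vram_free += victim.size
        self._move(victim, Domain.SYSTEM)

    def alloc(self, size: int, gpu_vaddr: int,
              cpu_vaddr: Optional[int] = None) -> Buffer:
        fits = size <= self.vram_size
        # Make room in VRAM by evicting older buffers; this always
        # terminates because free + resident sizes == vram_size.
        while fits and self.vram_free < size:
            self._evict_oldest()
        domain = Domain.VRAM if fits else Domain.SYSTEM
        cpu_pte = PageTableEntry(cpu_vaddr, domain) if cpu_vaddr is not None else None
        bo = Buffer(size, domain, PageTableEntry(gpu_vaddr, domain), cpu_pte)
        if domain is Domain.VRAM:
            self.vram_free -= size
            self.resident.append(bo)
        return bo


# Usage: a 256-unit VRAM pool; the second allocation forces eviction of the first.
placer = ToyPlacer(vram_size=256)
a = placer.alloc(192, gpu_vaddr=0x1000, cpu_vaddr=0x7F000000)
b = placer.alloc(128, gpu_vaddr=0x2000)
# a now lives in system memory; both of its PTEs point there, vaddrs unchanged.
print(a.domain, a.cpu_pte, a.gpu_pte)
```

A real driver also has to handle fences, partial moves and CPU page faults on the moved range; the model leaves all of that out on purpose.
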

I would like to add a few points regarding CPU access:

a) We may have a CPU-accessible part of VRAM ("inside" the PCIe BAR)
and a part that is not CPU-accessible. As a result, if the user needs
CPU access, the memory must be located in the CPU-accessible part
of VRAM or in system memory.

The application/user-mode driver can specify location preferences/hints
based on its assumptions or knowledge about access-pattern requirements,
game resolution, the size of VRAM, etc. So if CPU access performance
is critical, such memory should be allocated in system memory
as the first (and maybe the only) choice.

b) An allocation may not have a CPU address at all, only a GPU one.
We may also be unable to provide a CPU address/access for all of VRAM,
but memory can still be migrated in any case, regardless of whether
it has a CPU address or not.

c) Regarding "VRAM, it becomes non-cachable":
strictly speaking, VRAM is configured as WC (write-combined) to
provide fast CPU write access. It has also been found that, when CPU
access is not performance-critical, it can sometimes be useful to
allocate/program system memory as WC as well, to avoid the need for
extra "snooping" to synchronize with CPU caches during GPU access.
So system memory could potentially be WC too.
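
Points a) to c) can be read together as a placement and caching policy. The sketch below is a hypothetical Python model, not actual driver logic: the hint names, the preference for non-visible VRAM for GPU-only buffers (to keep the BAR window free), and the exact WC-vs-cached rule are assumptions chosen to mirror the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Location(Enum):
    VRAM_VISIBLE = "VRAM inside the PCIe BAR (CPU-accessible)"
    VRAM_INVISIBLE = "VRAM outside the BAR (GPU-only)"
    SYSTEM = "system memory"


class Caching(Enum):
    CACHED = "cached (GPU accesses must snoop CPU caches)"
    WC = "write-combined (no snooping needed)"


@dataclass
class Hints:
    """Hypothetical usage hints passed by the application/UMD."""
    needs_cpu_access: bool
    cpu_perf_critical: bool = False


def placement_order(h: Hints) -> List[Location]:
    """Candidate locations, most preferred first."""
    if not h.needs_cpu_access:
        # b) No CPU address needed. Assumption: prefer the non-visible part
        # of VRAM so the BAR window stays free for CPU-accessed buffers.
        return [Location.VRAM_INVISIBLE, Location.VRAM_VISIBLE, Location.SYSTEM]
    if h.cpu_perf_critical:
        # a) CPU access performance is critical: system memory, here the only choice.
        return [Location.SYSTEM]
    # a) CPU access is needed: only CPU-visible VRAM or system memory qualify.
    return [Location.VRAM_VISIBLE, Location.SYSTEM]


def cpu_caching(loc: Location, h: Hints) -> Optional[Caching]:
    """c) Caching attribute for the pages backing a buffer at `loc`
    (None: not applicable, since the CPU cannot map it)."""
    if loc is Location.VRAM_INVISIBLE:
        return None
    if loc is Location.VRAM_VISIBLE:
        return Caching.WC  # VRAM is mapped WC for fast CPU writes
    # System memory: WC as well unless CPU access is performance-critical,
    # which lets the GPU skip snooping CPU caches.
    return Caching.CACHED if h.cpu_perf_critical else Caching.WC


# Usage: show the preferred location and caching for three typical buffers.
for hints in (Hints(needs_cpu_access=False),
              Hints(needs_cpu_access=True),
              Hints(needs_cpu_access=True, cpu_perf_critical=True)):
    loc = placement_order(hints)[0]
    print(hints, "->", loc.name, cpu_caching(loc, hints))
```

In practice the first choice may be full, which is why the functions return an ordered list rather than a single location; the allocator would fall back along that list.
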