> On 11/25/2016 9:32 PM, Jason Gunthorpe wrote:
>> On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote:
>>>> Like you say below we have to handle short lived in the usual way, and
>>>> that covers basically every device except IB MRs, including the
>>>> command queue on an NVMe drive.
>>>
>>> Well, a problem which wasn't mentioned so far is that while GPUs do have a
>>> page table to mirror the CPU page table, they usually can't recover from
>>> page faults.
>>>
>>> So what we do is make sure that all memory accessed by the GPU jobs stays
>>> in place while those jobs run (pretty much the same pinning you do for the
>>> DMA).
>>
>> Yes, it is DMA, so this is a valid approach.
>>
>> But you don't need page faults from the GPU to do proper coherent
>> page table mirroring. Basically, when the driver submits the work to
>> the GPU it 'faults' the pages into the CPU and mirror translation
>> table (instead of pinning).
>>
>> Like in ODP, MMU notifiers/HMM are used to monitor for translation
>> changes. If a change comes in, the GPU driver checks if an executing
>> command is touching those pages and blocks the MMU notifier until the
>> command flushes, then unfaults the page (blocking future commands) and
>> unblocks the MMU notifier.
>
> I think blocking MMU notifiers against something that is basically
> controlled by user-space can be problematic. This can block things like
> memory reclaim. If you have user-space access to the device's queues,
> user-space can block the MMU notifier forever.

Really good point.
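
To make the scheme under discussion concrete, here is a rough user-space
model in C. It is only a sketch and not driver code: struct gpu,
gpu_submit(), gpu_job_done() and gpu_invalidate() are hypothetical names
invented for illustration, and none of them are kernel APIs. Submission
'faults' pages into the mirror table instead of pinning them, and the
invalidate path cannot make progress while a running job overlaps the range.
The model is single-threaded, so where a real notifier would sleep it just
returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES   8   /* size of the toy address space, in pages */
#define MAX_JOBS 4   /* number of job slots on the toy device */

/* Hypothetical in-flight job touching pages [first, last]. */
struct job {
    size_t first, last;
    bool running;
};

/* Hypothetical device state: mirror of the CPU page table plus jobs. */
struct gpu {
    bool present[NPAGES];       /* translation valid in the mirror table */
    struct job jobs[MAX_JOBS];
};

/* Submit work: fault the pages into the mirror table (no pinning).
 * Returns the job slot, or -1 on a bad range or a full queue. */
int gpu_submit(struct gpu *g, size_t first, size_t last)
{
    if (first > last || last >= NPAGES)
        return -1;
    for (int i = 0; i < MAX_JOBS; i++) {
        if (!g->jobs[i].running) {
            for (size_t p = first; p <= last; p++)
                g->present[p] = true;
            g->jobs[i] = (struct job){ .first = first, .last = last,
                                       .running = true };
            return i;
        }
    }
    return -1;
}

/* The command has flushed; its pages may now be invalidated. */
void gpu_job_done(struct gpu *g, int slot)
{
    assert(slot >= 0 && slot < MAX_JOBS);
    g->jobs[slot].running = false;
}

/* Model of the notifier's invalidate step for pages [first, last].
 * Returns false where a real notifier would have to block, i.e. while
 * a running job overlaps the range. Otherwise it unfaults the pages,
 * so later commands must fault them in again, and returns true. */
bool gpu_invalidate(struct gpu *g, size_t first, size_t last)
{
    for (int i = 0; i < MAX_JOBS; i++) {
        const struct job *j = &g->jobs[i];
        if (j->running && j->first <= last && first <= j->last)
            return false;
    }
    for (size_t p = first; p <= last && p < NPAGES; p++)
        g->present[p] = false;
    return true;
}
```

In this model, if nothing ever calls gpu_job_done() for an overlapping job,
gpu_invalidate() keeps returning false. That is the situation described
above: when user-space controls the queue, it also controls how long the
notifier (and anything waiting on it, such as reclaim) stays blocked.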