Re: [PATCH 08/16] nouveau/hmm: fault one page at a time

From: Ralph Campbell
Date: Mon Jun 22 2020 - 14:45:12 EST

Next message: Markus Elfring: "Re: [PATCH] objtool: Fix memory leak in special_get_alts()"
Previous message: Souptick Joarder: "Re: [RFC PATCH] xen/privcmd: Convert get_user_pages*() to pin_user_pages*()"
In reply to: Jason Gunthorpe: "Re: [PATCH 08/16] nouveau/hmm: fault one page at a time"
Next in thread: Ralph Campbell: "[PATCH 03/16] nouveau: fix mixed normal and device private page migration"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 6/22/20 10:22 AM, Jason Gunthorpe wrote:
> On Fri, Jun 19, 2020 at 02:56:41PM -0700, Ralph Campbell wrote:
>> The SVM page fault handler groups faults into a range of contiguous
>> virtual addresses and requests hmm_range_fault() to populate and
>> return the page frame number of system memory mapped by the CPU.
>> In preparation for supporting large pages to be mapped by the GPU,
>> process faults one page at a time. In addition, use the hmm_range
>> default_flags to fix a corner case where the input hmm_pfns array
>> is not reinitialized after hmm_range_fault() returns -EBUSY and must
>> be called again.
>
> Are you sure? hmm_range_fault is pretty expensive per call..
>
> Jason

Short answer is no, I'm not 100% sure.

The GPU might generate a list of dozens or hundreds of fault entries in the
same 4K page, or sequential 4K pages, or some other stride.
A single 2MB mapping might satisfy all of those after calling hmm_range_fault()
for the first fault entry and then skipping all the other fault entries
that fall into that range. So mostly, I'm proposing this change because it
makes handling the compound page case and -EBUSY case simpler.

As for performance, that is hard to say because nouveau is missing policies
for whether to migrate data to GPU memory on a fault or to map system memory.
Since GPU memory has much higher bandwidth, overall performance
can be much higher if the data is migrated to the GPU's local memory, but
currently migration is only performed explicitly at the application's request
(via the OpenCL clEnqueueSVMMigrateMem() call).
If the GPU is only accessing system memory a few times, then it can be faster
to map system memory and not migrate the data, so it depends on the application.
Then there is thrashing to consider if the GPU and CPU are both trying to
access the same pages...
