On Tue, Nov 22, 2016 at 01:21:03PM -0800, Dan Williams wrote:
> On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter <daniel@xxxxxxxx> wrote:
>> On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch
>> <serguei.sagalovitch@xxxxxxx> wrote:
>>> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
>>>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams@xxxxxxxxx>
>>>> wrote:
>>>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
>>>>> <serguei.sagalovitch@xxxxxxx> wrote:
>>>>>> I personally like "device-DAX" idea but my concerns are:
>>>>>>
>>>>>> - How well it will co-exists with the DRM infrastructure /
>>>>>>   implementations in part dealing with CPU pointers?
>>>>>
>>>>> Inside the kernel a device-DAX range is "just memory" in the sense
>>>>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
>>>>> not migratable. To be honest I do not know how well that co-exists
>>>>> with drm infrastructure.
>>>>>
>>>>>> - How well we will be able to handle case when we need to
>>>>>>   "move"/"evict" memory/data to the new location so CPU pointer
>>>>>>   should point to the new physical location/address
>>>>>>   (and may be not in PCI device memory at all)?
>>>>>
>>>>> So, device-DAX deliberately avoids support for in-kernel migration or
>>>>> overcommit. Those cases are left to the core mm or drm. The device-dax
>>>>> interface is for cases where all that is needed is a direct-mapping to
>>>>> a statically-allocated physical-address range be it persistent memory
>>>>> or some other special reserved memory range.
>>>>
>>>> For some of the fancy use-cases (e.g. to be comparable to what HMM can
>>>> pull off) I think we want all the magic in core mm, i.e. migration and
>>>> overcommit. At least that seems to be the very strong drive in all
>>>> general-purpose gpu abstractions and implementations, where memory is
>>>> allocated with malloc, and then mapped/moved into vram/gpu address
>>>> space through some magic,
>>>
>>> It is possible that there is other way around: memory is requested to be
>>> allocated and should be kept in vram for performance reason but due
>>> to possible overcommit case we need at least temporally to "move" such
>>> allocation to system memory.
>>
>> With migration I meant migrating both ways of course. And with stuff
>> like numactl we can also influence where exactly the malloc'ed memory
>> is allocated originally, at least if we'd expose the vram range as a
>> very special numa node that happens to be far away and not hold any
>> cpu cores.
>
> I don't think we should be using numa distance to reverse engineer a
> certain allocation behavior. The latency data should be truthful, but
> you're right we'll need a mechanism to keep general purpose
> allocations out of that range by default. Btw, strict isolation is
> another design point of device-dax, but I think in this case we're
> describing something between the two extremes of full isolation and
> full compatibility with existing numactl apis.

Yes, agreed. My idea with exposing vram sections using numa nodes wasn't
to reuse all the existing allocation policies directly, those won't work.
So at boot-up your default numa policy would exclude any vram nodes.
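
A minimal sketch of that default in C, purely illustrative and not taken
from any kernel code: node sets are modelled as plain 64-bit masks, and
the names all_nodes_mask(), default_alloc_mask(), explicit_alloc_mask()
and the 64-node limit are assumptions made up for this example. It only
shows that vram nodes drop out of the default mask while staying
reachable for an explicit request (from user space that could be
something like numactl --membind or mbind(2)).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit for this sketch: at most 64 nodes, one bit per node. */
#define SKETCH_MAX_NODES 64

/* Mask with one bit set for every node id in [0, nr_nodes). */
static uint64_t all_nodes_mask(unsigned int nr_nodes)
{
    assert(nr_nodes <= SKETCH_MAX_NODES);
    if (nr_nodes == SKETCH_MAX_NODES)
        return UINT64_MAX;  /* avoid shifting by the full width */
    return (UINT64_C(1) << nr_nodes) - 1;
}

/* Default policy: every online node except those flagged as vram. */
static uint64_t default_alloc_mask(uint64_t online_nodes, uint64_t vram_nodes)
{
    return online_nodes & ~vram_nodes;
}

/* Explicit request: vram nodes are allowed, only offline nodes are dropped. */
static uint64_t explicit_alloc_mask(uint64_t online_nodes, uint64_t requested)
{
    return online_nodes & requested;
}
```

With four nodes where node 3 is vram,
default_alloc_mask(all_nodes_mask(4), UINT64_C(1) << 3) gives 0x7, while
an explicit request for node 3 still resolves to 0x8.
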

But I think (as an -mm layman) that numa gives us a lot of the tools and
policy interface that we need to implement what we want for gpus.

Wrt isolation: There's a sliding scale of what different users expect.
Everything from full auto (including migrating pages around if needed) to
full isolation seems to be on the table. As long as we keep vram nodes
out of any default allocation numasets, full isolation should be possible.
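
To make the sliding scale concrete, here is a rough C sketch. The three
modes and both helper names are hypothetical, invented for illustration
only; nothing like this exists as a kernel interface. It just spells out
which placements and migrations each point on the scale would permit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vram placement modes, from most automatic to most isolated. */
enum vram_mode {
    VRAM_AUTO,      /* kernel may place and migrate pages in either direction */
    VRAM_EXPLICIT,  /* vram only on request, eviction to system memory allowed */
    VRAM_ISOLATED,  /* vram only on request, pages never migrate */
};

/* May an allocation land in vram under this mode? */
static bool may_allocate_in_vram(enum vram_mode mode, bool explicitly_requested)
{
    assert(mode <= VRAM_ISOLATED);
    return mode == VRAM_AUTO || explicitly_requested;
}

/* May the kernel move pages between vram and system memory under this mode? */
static bool may_migrate(enum vram_mode mode)
{
    assert(mode <= VRAM_ISOLATED);
    return mode != VRAM_ISOLATED;
}
```
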
-Daniel