Re: [RFC HMM CDM 0/3] Coherent Device Memory (CDM) on top of HMM
From: Balbir Singh
Date: Sat Apr 08 2017 - 20:32:37 EST
On Fri, 2017-04-07 at 16:28 -0400, Jérôme Glisse wrote:
> This patch series implements coherent device memory using ZONE_DEVICE
> and adds new helpers to the HMM framework to support this new kind
> of ZONE_DEVICE memory. This is on top of HMM v19 and you can find a
> branch here:
>
> https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-cdm
>
> It needs more special casing as it behaves differently from regular
> ZONE_DEVICE (persistent memory). Unlike the unaddressable memory
> type added with the HMM patchset, the CDM type can be accessed by the
> CPU. Because of this, any page can be migrated to CDM memory: private
> anonymous or shared memory (file backed or not).
>
> It is missing some features, like allowing a device fault to
> directly allocate device memory (the intention is to add new fields to
> the vm_fault struct for this).
>
>
> This is mostly untested, but I am posting now because I believe we
> want to start the discussion on design considerations. This differs from
> the NUMA approach by adding yet another type to ZONE_DEVICE with more
> special casing. While it is a rather small patchset, I might have
> missed some code paths that require more special casing (I and
> others need to audit mm to make sure that every time mm is confronted
> with such a page it behaves as we want).
>
> So I believe the question is: do we want to keep adding new types to
> ZONE_DEVICE, with more special casing for each of them, or is a
> NUMA-like approach better?
>
>
> My personal belief is that the hierarchy of memory is getting deeper
> (DDR, HBM stack memory, persistent memory, device memory, ...) and
> it may make sense to try to mirror this complexity within mm concepts.
> Generalizing the NUMA abstraction is probably the best starting point
> for this. I know there are strong feelings against changing NUMA, so
> I believe now is the time to pick a direction.
Thanks for all your hard work and effort on this.
I agree that NUMA is the best representation and that in the end we want
the mm to manage coherent memory. Device memory is very similar
to NUMA memory: it is cache coherent and can be accessed simultaneously
from both sides. As you say, this will evolve. My current design proposal
is at
https://github.com/bsingharora/linux/commits/balbir/cdmv1
with HMM patches (v17) on top. The relevant commits are
c0750c30070e8537ca2ee3ddfce3c0bac7eaab26
dcb3ff6d7900ff644d08a3d1892b6c0ab6982021
9041c3fee859b40c1f9d3e60fd48e0f64ee69abb
b26b6e9f3b078a606a0eaada08bc187b96d966a5
I intend to rebase and repost them. The core motivation of this approach,
compared to Anshuman's approach (https://lwn.net/Articles/704403/), is
avoiding allocator changes; there are, however, mempolicy changes. Creating
N_COHERENT_MEMORY mutually exclusive with N_MEMORY allows us to avoid
changes in the allocator paths, with the changes being controlled by
mempolicy, where an explicit node allocation works via changes to
policy_zonelist() and policy_nodemask(). This also isolates coherent memory
from kswapd and other background processes, but direct reclaim, direct
compaction, etc. are expected to work. The reason for the isolation is
performance: it prevents wrong allocations from ending up on device memory.
There is no strict requirement, though; one could easily use migration to
move misplaced memory.
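To make the isolation idea concrete, here is a rough user-space toy model,
not the kernel code from the branch above. Everything in it (the toy_*
names, the bitmask representation, the POLICY_* modes) is hypothetical and
exists only for illustration; it merely assumes that default allocations
see regular memory nodes only, while an explicit bind policy may name a
coherent node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of this is kernel code. Bit n set => node n. */
typedef uint32_t toy_nodemask;

enum toy_policy_mode { POLICY_DEFAULT, POLICY_BIND };

struct toy_policy {
    enum toy_policy_mode mode;
    toy_nodemask nodes;             /* only meaningful for POLICY_BIND */
};

struct toy_system {
    toy_nodemask n_memory;          /* regular system memory nodes */
    toy_nodemask n_coherent_memory; /* coherent device memory nodes */
};

/* Nodes an allocation may use under the given policy. */
static toy_nodemask toy_allowed_nodes(const struct toy_system *sys,
                                      const struct toy_policy *pol)
{
    /* The two node sets are assumed to be mutually exclusive. */
    assert((sys->n_memory & sys->n_coherent_memory) == 0);

    if (pol->mode == POLICY_BIND) {
        /* An explicit request may name coherent nodes as well. */
        return pol->nodes & (sys->n_memory | sys->n_coherent_memory);
    }
    /* Default allocations never see coherent nodes. */
    return sys->n_memory;
}

/* Lowest-numbered allowed node, or -1 if the policy allows none. */
static int toy_pick_node(const struct toy_system *sys,
                         const struct toy_policy *pol)
{
    toy_nodemask allowed = toy_allowed_nodes(sys, pol);

    for (int n = 0; n < 32; n++)
        if (allowed & ((toy_nodemask)1 << n))
            return n;
    return -1;
}

/* Background (kswapd-like) reclaim only visits regular memory nodes. */
static bool toy_background_reclaim_visits(const struct toy_system *sys,
                                          int node)
{
    return (sys->n_memory >> node) & 1u;
}
```

With nodes 0 and 1 as regular memory and node 2 as coherent memory, a
default policy picks node 0, while a bind policy naming node 2 picks node 2,
and the background reclaim check skips node 2.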
From an HMM perspective, we still find HMM useful for migration, specifically
your migrate_vma() API and the newly proposed migrate_dma() API that is
part of this patchset. We also find HMM useful for hardware that does not
have coherency, but for isolation, and for coherent devices in general, we
prefer the NUMA approach.
With HMM we'll start seeing ZONE_DEVICE pages mapped into user space and
that would mean a thorough audit of all code paths to make sure we are
ready for such a use case and enabling those use cases, like you've done
with patch 1. I've done a quick evaluation to check for features like
migration (page cache migration), fault handling to the right location
(direct page cache allocation in the coherent memory), mlock handling,
RSS accounting, memcg enforcement for pages not on LRU, etc.
>
> Note that I don't think choosing one would mean we will be stuck with
> it. As long as we don't expose anything new (syscall) to userspace
> and hide things through a driver API, we keep our options open to
> change direction later on.
>
I agree, but I think user space will need to adapt. For example, using
malloc on a coherent device will not work; user space will need to
have a driver-supported way of accessing coherent memory.
> Nonetheless we need to make progress on this, as there is hardware
> right around the corner and it would be a shame if we could not
> leverage such hardware with Linux.
>
>
I agree 100%
Balbir Singh.