[RFC 1/8] mm: Define coherent device memory node

From: Anshuman Khandual
Date: Mon Oct 24 2016 - 00:32:29 EST


Certain devices, such as specialized accelerators, GPU cards, network
cards and FPGA cards, may contain onboard memory that is coherent with
the existing system RAM whether it is accessed from the CPU or from the
device. This memory shares many properties with normal system RAM but
also differs from it in some respects.

User applications might want to use this kind of coherent device memory
explicitly or implicitly alongside system RAM, utilizing all possible
core memory functions like anon mapping (LRU), file mapping (LRU), page
cache (LRU), driver managed (non LRU), HW poisoning, NUMA migrations
etc. To achieve this kind of tight integration with the core memory
subsystem, the device's onboard coherent memory must be represented as a
memory only NUMA node. At the same time, the pglist_data structure
(which is the node's memory representation) of this NUMA node must be
marked to indicate that it is coherent device memory and not regular
system RAM.
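
As a rough illustration of the marking described above, the following
user space sketch models a per-node descriptor that carries a coherent
device flag. Everything prefixed with model_ (the structure, the node
table and the helpers) is hypothetical and only stands in for
pglist_data and NODE_DATA(); it is not the code added by this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user space model; this is not the kernel's pglist_data. */
#define MODEL_MAX_NODES 4

struct model_pgdat {
	unsigned long present_pages;	/* pages present on the node */
	bool coherent_device;		/* set for a coherent device node */
};

/* Zero initialized, so every node starts out as regular system RAM. */
struct model_pgdat model_nodes[MODEL_MAX_NODES];

/* Stand-in for NODE_DATA(nid): look up a node descriptor by id. */
struct model_pgdat *model_node_data(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return &model_nodes[nid];
}

/* Mark a node as coherent device memory. */
void model_mark_cdm(int nid)
{
	model_node_data(nid)->coherent_device = true;
}

/* Report whether a node has been marked as coherent device memory. */
bool model_is_cdm(int nid)
{
	return model_node_data(nid)->coherent_device;
}
```

Keeping the flag in the node descriptor means that any code which
already has a node id can ask whether the node is device memory without
consulting a separate table.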

Even after integration with the core memory subsystem through a marked
pglist_data structure, coherent device memory might still need special
consideration inside the kernel. There can be a variety of coherent
memory nodes with different expectations from the core kernel memory
subsystem. Right now only one kind of special treatment is considered:
nodes which require isolation.

Now consider the case of a coherent device memory node type which
requires isolation. This kind of coherent memory is onboard an external
device attached to the system through a link, where there is always a
chance of a link failure taking down the entire memory node with it.
Moreover, the memory might also have a higher chance of ECC failure
compared to system RAM. Hence allocation into this kind of coherent
memory node should be regulated. Kernel allocations must not come here.
Normal user space allocations should not come here implicitly either
(without the user application knowing about it). This summarizes, as an
example, the isolation requirement of one kind of coherent device memory
node. There can be other kinds of isolation requirements as well.
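
The isolation rules stated above can be written down as a small
decision function. This is only a user space sketch of the policy: the
request structure, its fields and model_alloc_allowed() are hypothetical
names, and this patch does not implement any allocator change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request description; names are for illustration only. */
struct model_alloc_req {
	bool from_kernel;	/* allocation made on behalf of the kernel */
	bool explicit_node;	/* user space explicitly asked for this node */
};

/*
 * Decide whether a request may be satisfied from a node.
 *
 * Nodes which are not isolated accept every request. An isolated
 * coherent device node rejects all kernel allocations and accepts a
 * user space allocation only when it was requested explicitly.
 */
bool model_alloc_allowed(bool node_isolated, struct model_alloc_req req)
{
	if (!node_isolated)
		return true;
	if (req.from_kernel)
		return false;
	return req.explicit_node;
}

/* Small self-check which doubles as a usage example. */
void model_alloc_selfcheck(void)
{
	struct model_alloc_req implicit_user = { false, false };
	struct model_alloc_req explicit_user = { false, true };

	assert(!model_alloc_allowed(true, implicit_user));
	assert(model_alloc_allowed(true, explicit_user));
}
```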

Some coherent memory devices might not require isolation at all. Other
coherent memory devices might require some other special treatment
after becoming part of the core memory representation. For now, this
series only looks at coherent device memory nodes which seek isolation,
not the other kinds.

This adds a new 'bool coherent_device' element to the pglist_data
structure which identifies a coherent device node. Alternatively, this
could be a u64 holding a bit mask of properties for various types of
coherent devices in the future.
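
A minimal sketch of the u64 alternative, assuming one bit per property.
The MODEL_CDM_* bit names and the helpers are hypothetical; only the
isolation property is discussed in this patch, and the second bit is a
placeholder for some future property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical property bits; this patch only talks about isolation. */
#define MODEL_CDM_ISOLATION	(UINT64_C(1) << 0)
#define MODEL_CDM_RESERVED	(UINT64_C(1) << 1)	/* placeholder */
#define MODEL_CDM_KNOWN		(MODEL_CDM_ISOLATION | MODEL_CDM_RESERVED)

/* Return props with the given known property bits set. */
uint64_t model_cdm_set(uint64_t props, uint64_t bits)
{
	assert((bits & ~MODEL_CDM_KNOWN) == 0);
	return props | bits;
}

/* Return props with the given known property bits cleared. */
uint64_t model_cdm_clear(uint64_t props, uint64_t bits)
{
	assert((bits & ~MODEL_CDM_KNOWN) == 0);
	return props & ~bits;
}

/* Report whether all of the given property bits are set in props. */
bool model_cdm_test(uint64_t props, uint64_t bits)
{
	return (props & bits) == bits;
}
```

With this layout, an isolation check becomes a test of one bit, and
further properties can be added without growing the structure again.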

Signed-off-by: Anshuman Khandual <khandual@xxxxxxxxxxxxxxxxxx>
---
include/linux/mmzone.h | 29 +++++++++++++++++++++++++++++
mm/Kconfig | 13 +++++++++++++
2 files changed, 42 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7f2ae99..821dffb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -722,8 +722,37 @@ typedef struct pglist_data {
/* Per-node vmstats */
struct per_cpu_nodestat __percpu *per_cpu_nodestats;
atomic_long_t vm_stat[NR_VM_NODE_STAT_ITEMS];
+
+#ifdef CONFIG_COHERENT_DEVICE
+ /*
+ * Coherent device memory node
+ *
+ * A device containing coherent memory is represented as a
+ * special coherent memory NUMA node, which should be identified
+ * differently from normal memory nodes. Though it shares a
+ * lot of common properties with system memory, it also has
+ * some differentiating factors.
+ *
+ * XXX: Though this is a bool which identifies the isolation
+ * requiring coherent device memory node right now, it can be
+ * extended as a bit mask to represent different properties
+ * for future coherent device memory nodes.
+ */
+ bool coherent_device;
+#endif
} pg_data_t;

+#ifdef CONFIG_COHERENT_DEVICE
+#define node_cdm(nid) (NODE_DATA(nid)->coherent_device)
+#define set_cdm_isolation(nid) (node_cdm(nid) = 1)
+#define clr_cdm_isolation(nid) (node_cdm(nid) = 0)
+#define isolated_cdm_node(nid) (node_cdm(nid) == 1)
+#else
+#define set_cdm_isolation(nid) do { } while (0)
+#define clr_cdm_isolation(nid) do { } while (0)
+#define isolated_cdm_node(nid) (0)
+#endif
+
#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
#define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages)
#ifdef CONFIG_FLAT_NODE_MEM_MAP
diff --git a/mm/Kconfig b/mm/Kconfig
index be0ee11..cb50468 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -704,6 +704,19 @@ config ZONE_DEVICE

If FS_DAX is enabled, then say Y.

+config COHERENT_DEVICE
+ bool "Coherent device memory support"
+ depends on MEMORY_HOTPLUG
+ depends on MEMORY_HOTREMOVE
+ depends on PPC64
+ default n
+ help
+ Coherent device memory node support enables the system to hotplug
+ a device with coherent memory as a normal system memory node. FPGA,
+ network, GPU cards etc might contain coherent memory.
+
+ If not sure, then say N.
+
config FRAME_VECTOR
bool

--
2.1.0