Re: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers

From: Srinivasulu Thanneeru
Date: Mon Dec 18 2023 - 03:56:20 EST



Micron Confidential



________________________________________
From: Huang, Ying <ying.huang@intel.com>
Sent: Friday, December 15, 2023 10:32 AM
To: Srinivasulu Opensrc
Cc: linux-cxl@vger.kernel.org; linux-mm@kvack.org; Srinivasulu Thanneeru; aneesh.kumar@linux.ibm.com; dan.j.williams@intel.com; gregory.price; mhocko@suse.com; tj@kernel.org; john@jagalactic.com; Eishan Mirakhur; Vinicius Tavares Petrucci; Ravis OpenSrc; Jonathan.Cameron@huawei.com; linux-kernel@vger.kernel.org
Subject: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers



<sthanneeru.opensrc@micron.com> writes:

> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
>
> The memory tiers feature allows nodes with similar memory types
> or performance characteristics to be grouped together in a
> memory tier. However, there is currently no provision for
> moving a node from one tier to another on demand.
>
> This patch series aims to support node migration between tiers
> on demand by sysadmin/root user using the provided sysfs for
> node migration.
>
> To migrate a node to a tier, the corresponding node’s sysfs
> memtier_override is written with target tier id.
>
> Example: Move node2 to memory tier2 from its default tier(i.e 4)
>
> 1. To check current memtier of node2
> $cat /sys/devices/system/node/node2/memtier_override
> memory_tier4
>
> 2. To migrate node2 to memory_tier2
> $echo 2 > /sys/devices/system/node/node2/memtier_override
> $cat /sys/devices/system/node/node2/memtier_override
> memory_tier2
>
> Usecases:
>
> 1. Useful to move cxl nodes to the right tiers from userspace, when
> the hardware fails to assign the tiers correctly based on
> memorytypes.
>
> On some platforms we have observed cxl memory being assigned to
> the same tier as DDR memory. This is arguably a system firmware
> bug, but it is true that tiers represent *ranges* of performance
> and we believe it's important for the system operator to have
> the ability to override bad firmware or OS decisions about tier
> assignment as a fail-safe against potential bad outcomes.
>
> 2. Useful if we want interleave weights to be applied on memory tiers
> instead of nodes.
> In a previous thread, Huang Ying <ying.huang@intel.com> thought
> this feature might be useful to overcome limitations of systems
> where nodes with different bandwidth characteristics are grouped
> in a single tier.
> https://lore.kernel.org/lkml/87a5rw1wu8.fsf@yhuang6-desk2.ccr.corp.intel.com/
>
> =============
> Version Notes:
>
> V2 : Changed interface to memtier_override from adistance_offset.
> memtier_override was recommended by
> 1. John Groves <john@jagalactic.com>
> 2. Ravi Shankar <ravis.opensrc@micron.com>
> 3. Brice Goglin <Brice.Goglin@inria.fr>

It appears that you ignored my comments for V1 as follows ...

https://lore.kernel.org/lkml/87o7f62vur.fsf@yhuang6-desk2.ccr.corp.intel.com/

Thank you Huang, Ying for pointing to this.

https://lpc.events/event/16/contributions/1209/attachments/1042/1995/Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

In the presentation above, the adistance_offsets are per memtype.
We believe that adistance_offset per node is more suitable and flexible
since we can change it per node. If we keep adistance_offset per memtype,
then we cannot change it for a specific node of a given memtype.


https://lore.kernel.org/lkml/87jzpt2ft5.fsf@yhuang6-desk2.ccr.corp.intel.com/

I guess that you need to move all NUMA nodes with same performance
metrics together? If so, That is why we previously proposed to place
the knob in "memory_type"? (From: Huang, Ying)

Yes, memory_type would group the related memories together as a single tier.
We should also have the flexibility to move nodes between tiers, to address
the issues described in the usecases above.

https://lore.kernel.org/lkml/87a5qp2et0.fsf@yhuang6-desk2.ccr.corp.intel.com/

This patch provides a way to move a node to the correct tier.
In test setups, we observed DRAM and CXL being placed under the same
tier (memory_tier4).
By using this patch, we can move the CXL node away from the DRAM-linked
tier4 and put it in the desired tier.
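
The move described above could be scripted roughly as follows. This is
only a sketch: it assumes the memtier_override file proposed in this RFC
(it is not an existing mainline interface), and the helper name
move_node_to_tier and the SYSFS_NODE_ROOT variable are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: relies on the memtier_override sysfs file proposed in this RFC.
# SYSFS_NODE_ROOT is a hypothetical override so the helper can be tried
# against a scratch directory instead of the real sysfs tree.
SYSFS_NODE_ROOT="${SYSFS_NODE_ROOT:-/sys/devices/system/node}"

# move_node_to_tier NODE TIER
# Writes the target tier id to NODE's memtier_override and prints the
# value read back before and after the write.
move_node_to_tier() {
    node="$1"
    tier="$2"
    knob="$SYSFS_NODE_ROOT/node$node/memtier_override"

    # Bail out if the kernel does not expose the proposed file,
    # or if we lack permission to write it.
    if [ ! -w "$knob" ]; then
        echo "node$node: memtier_override missing or not writable" >&2
        return 1
    fi

    echo "before: $(cat "$knob")"
    echo "$tier" > "$knob" || return 1
    echo "after:  $(cat "$knob")"
}
```

With the series applied, running "move_node_to_tier 2 2" as root would
correspond to the node2 example in the cover letter.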

Regards,
Srini

--
Best Regards,
Huang, Ying

> V1 : Introduced adistance_offset sysfs.
>
> =============
>
> Srinivasulu Thanneeru (2):
>   base/node: Add sysfs for memtier_override
>   memory tier: Support node migration between tiers
>
>  Documentation/ABI/stable/sysfs-devices-node |  7 ++
>  drivers/base/node.c                         | 47 ++++++++++++
>  include/linux/memory-tiers.h                | 11 +++
>  include/linux/node.h                        | 11 +++
>  mm/memory-tiers.c                           | 85 ++++++++++++---------
>  5 files changed, 125 insertions(+), 36 deletions(-)