Re: [EXT] Re: [RFC PATCH 0/2] mm: mempolicy: Multi-tier interleaving

From: Srinivasulu Thanneeru
Date: Tue Oct 03 2023 - 01:07:20 EST


Micron Confidential

Hi Huang,

Thank you for your comments. We will incorporate these suggestions in the next version.

Regards,
Srini

________________________________
From: Huang, Ying <ying.huang@intel.com>
Sent: Thursday, September 28, 2023 11:44 AM
To: Ravis OpenSrc
Cc: linux-mm@vger.kernel.org; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux-api@vger.kernel.org; luto@kernel.org; tglx@linutronix.de; mingo@redhat.com; bp@alien8.de; dietmar.eggemann@arm.com; vincent.guittot@linaro.org; dave.hansen@linux.intel.com; hpa@zytor.com; arnd@arndb.de; akpm@linux-foundation.org; x86@kernel.org; aneesh.kumar@linux.ibm.com; gregory.price@memverge.com; John Groves; Srinivasulu Thanneeru; Eishan Mirakhur; Vishal Tanna
Subject: [EXT] Re: [RFC PATCH 0/2] mm: mempolicy: Multi-tier interleaving


Hi, Ravi,

Thanks for the patch!

Ravi Jonnalagadda <ravis.opensrc@micron.com> writes:

> From: Ravi Shankar <ravis.opensrc@micron.com>
>
> Hello,
>
> The current interleave policy operates by interleaving page requests
> among nodes defined in the memory policy. To accommodate the
> introduction of memory tiers for various memory types (e.g., DDR, CXL,
> HBM, PMEM, etc.), a mechanism is needed for interleaving page requests
> across these memory types or tiers.

Why do we need interleaving page allocation among memory tiers? I think
that you need to make it more explicit. I guess that it's to increase
maximal memory bandwidth for workloads?

Yes, it is to increase the maximal memory bandwidth.

> This can be achieved by implementing an interleaving method that
> considers the tier weights.
> The tier weight will determine the proportion of nodes to select from
> those specified in the memory policy.
> A tier weight can be assigned to each memory type within the system.

What is the problem with the original interleaving? I think you need to
make that explicit too.

In the original approach, the page distribution is fixed at 1:1, and the user/admin cannot change it as required. The need for different ratios has become evident with the introduction of new memory tiers that cover a wide range of memory types.

With default interleaving, we observed lower memory bandwidth utilization than with the proposed approach at 85:15 when interleaving between DDR and CXL.

We will capture this information in the next series.
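
As a rough illustration of why the ratio matters, here is a minimal sketch of a simple saturation model. It is not taken from the patch series: the function name `aggregate_bandwidth` and the peak bandwidth figures are hypothetical, and the model assumes each tier saturates independently, ignoring latency and concurrency effects.

```python
def aggregate_bandwidth(peak_bw, weights):
    """Upper bound on total bandwidth when pages are interleaved by weight.

    Tier t serves weights[t] / sum(weights) of all accesses, so a total rate T
    is sustainable only while T * share(t) <= peak_bw[t] for every tier.
    """
    total = sum(weights.values())
    return min(peak_bw[t] * total / w for t, w in weights.items() if w > 0)


# Hypothetical peak bandwidths in GB/s (not measurements from this series).
peak_bw = {"ddr": 170.0, "cxl": 30.0}

even = aggregate_bandwidth(peak_bw, {"ddr": 1, "cxl": 1})     # 1:1 interleave
tuned = aggregate_bandwidth(peak_bw, {"ddr": 85, "cxl": 15})  # 85:15 interleave
print(f"1:1 -> {even:.0f} GB/s, 85:15 -> {tuned:.0f} GB/s")
```

Under these made-up numbers, the 1:1 split is capped at 60 GB/s because the slower tier saturates first, while the 85:15 split lets both tiers saturate together at 200 GB/s.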

> Hasan Al Maruf had put forth a proposal for interleaving between two
> tiers, namely the top tier and the low tier. However, this patch was
> not adopted due to constraints on the number of available tiers.
>
> https://lore.kernel.org/linux-mm/YqD0%2FtzFwXvJ1gK6@cmpxchg.org/T/
>
> New proposed changes:
>
> 1. Introduce a sysfs entry to allow setting the interleave weight for each
> memory tier.
> 2. Each tier with a default weight of 1, indicating a standard 1:1
> proportion.
> 3. Distribute the weight of that tier in a uniform manner across all nodes.
> 4. Modifications to the existing interleaving algorithm to support the
> implementation of multi-tier interleaving based on tier-weights.
>
> This is in line with Huang, Ying's presentation in lpc22, 16th slide in
> https://lpc.events/event/16/contributions/1209/attachments/1042/1995//
> Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

Thanks for referring to the original work on this.
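
The proposed steps (a per-tier weight, spread uniformly across the tier's nodes, then used by the interleave walk) can be sketched in user space as follows. This is only an illustration under assumptions, not the patch's implementation: `node_weights` and `interleave` are hypothetical names, the remainder handling is one arbitrary choice, and the kernel code may order allocations differently.

```python
from itertools import islice


def node_weights(tiers, tier_weight):
    """Split each tier's weight evenly across its nodes.

    tiers maps tier id -> list of node ids; tier_weight maps tier id -> int.
    Any remainder goes to the first nodes of the tier (an assumption).
    """
    weights = {}
    for tier, nodes in tiers.items():
        base, extra = divmod(tier_weight[tier], len(nodes))
        for i, node in enumerate(nodes):
            weights[node] = base + (1 if i < extra else 0)
    return weights


def interleave(weights):
    """Yield node ids forever, visiting each node weights[node] times per cycle."""
    if not any(w > 0 for w in weights.values()):
        raise ValueError("at least one node needs a positive weight")
    while True:
        for node, w in weights.items():
            for _ in range(w):
                yield node


# DDR is tier 4 (node 0) and CXL is tier 22 (node 1), as in the usage example.
weights = node_weights({4: [0], 22: [1]}, {4: 85, 22: 15})
pages = list(islice(interleave(weights), 200))
print(pages.count(0), pages.count(1))
```

With the default weight of 1 on every tier, the same walk degenerates to plain 1:1 round-robin across the nodes.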

> Observed a significant increase (165%) in bandwidth utilization
> with the newly proposed multi-tier interleaving compared to the
> traditional 1:1 interleaving approach between DDR and CXL tier nodes,
> where 85% of the bandwidth is allocated to DDR tier and 15% to CXL
> tier with MLC -w2 option.

It appears that "mlc" isn't open source software. It would be better to
test with open source software and, even better, to use more practical
workloads instead of a memory bandwidth/latency measurement tool.

Sure, will try it.

> Usage Example:
>
> 1. Set weights for DDR (tier4) and CXL (tier22) tiers.
> echo 85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
> echo 15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>
> 2. Interleave between DDR (tier4, node-0) and CXL (tier22, node-1) using numactl
> numactl -i0,1 mlc --loaded_latency W2
+AD4-
> Srinivasulu Thanneeru (2):
> memory tier: Introduce sysfs for tier interleave weights.
> mm: mempolicy: Interleave policy for tiered memory nodes
>
> include/linux/memory-tiers.h |  27 ++++++++-
> include/linux/sched.h        |   2 +
> mm/memory-tiers.c            |  67 +++++++++++++++-------
> mm/mempolicy.c               | 107 ++++++++++++++++++++++++++++++++++--
> 4 files changed, 174 insertions(+), 29 deletions(-)

--
Best Regards,
Huang, Ying