Re: [PATCH v4] Weighted Interleave Auto-tuning
From: Harry (Hyeonggon) Yoo
Date: Tue Feb 04 2025 - 02:50:56 EST
On Tue, Jan 28, 2025 at 02:23:31PM -0800, Joshua Hahn wrote:
> On machines with multiple memory nodes, interleaving page allocations
> across nodes allows for better utilization of each node's bandwidth.
> Previous work by Gregory Price [1] introduced weighted interleave, which
> allowed for pages to be allocated across nodes according to user-set ratios.
>
> Ideally, these weights should be proportional to their bandwidth, so
> that under bandwidth pressure, each node uses its maximal efficient
> bandwidth and prevents latency from increasing exponentially.
>
> At the same time, we want these weights to be as small as possible.
> Having ratios that involve large co-prime numbers like 7639:1345:7 leads
> to awkward and inefficient allocations, since the node with weight 7
> will remain mostly unused (and despite being proportional to bandwidth,
> will not aid in relieving the bandwidth pressure in the other two nodes).
>
> This patch introduces an auto-configuration mode for the interleave
> weights that aims to balance the two goals of setting node weights to be
> proportional to their bandwidths and keeping the weight values low.
> In order to perform the weight re-scaling, we use an internal
> "weightiness" value (fixed to 32) that defines interleave aggression.
>
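For readers following along, here is a rough sketch of what such a
re-scaling could look like. It is an assumption on my part, not the code
in this patch: the helper names (gcd_u64, reduce_weights) are
hypothetical, and only the value 32 comes from the commit log. It scales
each node's bandwidth share to the weightiness value, clamps to a minimum
of 1, and divides by the common divisor to keep the weights small.

```c
#include <stdint.h>

#define WEIGHTINESS 32 /* value named in the commit log */

/* Hypothetical helper: greatest common divisor, gcd(0, x) == x. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical sketch, not the patch's implementation: derive small
 * interleave weights iw[] from per-node bandwidths bw[].
 */
static void reduce_weights(const uint64_t *bw, uint8_t *iw, int n)
{
	uint64_t sum = 0, g = 0;
	int i;

	for (i = 0; i < n; i++)
		sum += bw[i];

	/* No bandwidth data at all: fall back to a weight of 1 everywhere. */
	if (sum == 0) {
		for (i = 0; i < n; i++)
			iw[i] = 1;
		return;
	}

	for (i = 0; i < n; i++) {
		/* Share of total bandwidth, scaled to WEIGHTINESS. */
		uint64_t w = bw[i] * WEIGHTINESS / sum;

		/* Every node keeps a weight of at least 1. */
		if (w < 1)
			w = 1;
		iw[i] = (uint8_t)w;
		g = gcd_u64(g, w);
	}

	/* Divide by the common divisor so the ratios stay small. */
	for (i = 0; i < n; i++)
		iw[i] /= (uint8_t)g;
}
```

Under these assumptions, the 7639:1345:7 example above would come out as
27:4:1, and bandwidths of 300 and 100 would reduce to 3:1.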
> In this auto configuration mode, node weights are dynamically updated
> every time there is a hotplug event that introduces new bandwidth.
>
> Users can also enter manual mode by writing "N" or "0" to the new "auto"
> sysfs interface. When a user enters manual mode, the system stops
> dynamically updating any of the node weights, even during hotplug events
> that can shift the optimal weight distribution. The system also enters
> manual mode any time a user sets a node's weight directly by using the
> nodeN interface introduced in [1]. On the other hand, auto mode is
> only entered by explicitly writing "Y" or "1" to the auto interface.
>
> There is one functional change that this patch makes to the existing
> weighted_interleave ABI: previously, writing 0 directly to a nodeN
> interface was said to reset the weight to the system default. Before
> this patch, the default for all weights were 1, which meant that writing
> 0 and 1 were functionally equivalent.
>
> This patch introduces "real" defaults, but moves away from letting users
> use 0 as a "set to default" interface. Rather, users who want to use
> system defaults should use auto mode. This patch seems to be the
> appropriate place to make this change, since we would like to remove
> this usage before users begin to rely on the feature in userspace.
> Moreover, users will not be losing any functionality; they can still
> write 1 into a node if they want a weight of 1. Thus, we deprecate the
> "write zero to reset" feature in favor of returning an error, the same
> way we would return an error when the user writes any other invalid
> weight to the interface.
>
> [1] https://lore.kernel.org/linux-mm/20240202170238.90004-1-gregory.price@xxxxxxxxxxxx/
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@xxxxxxxxx>
> Co-developed-by: Gregory Price <gourry@xxxxxxxxxx>
> Signed-off-by: Gregory Price <gourry@xxxxxxxxxx>
> ---
Hi Joshua,
I'm glad we're close to finalizing the interface.
I believe you have successfully addressed the major concerns
through the revisions. The interface and the code now look good to me.
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
With a few nits:
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
> index 0b7972de04e9..c26879f59d5d 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
> @@ -20,6 +20,34 @@ Description: Weight configuration interface for nodeN
[...snip...]
> +What: /sys/kernel/mm/mempolicy/weighted_interleave/auto
> +Date: January 2025
> +Contact: Linux memory management mailing list <linux-mm@xxxxxxxxx>
> +Description: Auto-weighting configuration interface
> +
> + Configuration mode for weighted interleave. A 'Y' indicates
> + that the system is in auto mode, and a 'N' indicates that
> + the system is in manual mode. All other values are invalid.
> +
> + In auto mode, all node weights are re-calculated and overwritten
> + (visible via the nodeN interfaces) whenever new bandwidth data
> + is made available during either boot or hotplug events.
> +
> + In manual mode, node weights can only be updated by the user.
> + If a node is hotplugged while the user is in manual mode,
> + the node will have a default weight of 1.
> +
> + Modes can be changed by writing Y, N, 1, or 0 to the interface.
> + All other strings will be ignored, and -EINVAL will be returned.
> + If Y or 1 is written to the interface but the recalculation or
> + updates fail at any point (-ENOMEM or -ENODEV), then the mode
> + will remain in manual mode.
nit: the commit log says that writing 'N' or '0' switches to manual mode
and writing 'Y' or '1' switches to auto mode, but the Documentation does
not explicitly state what '0' and '1' do. Could it spell that out?
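To make the mapping I have in mind concrete, here is a minimal userspace
sketch. It is only an illustration of the behavior described in the
commit log and the Description text, not the patch's store handler;
parse_auto_mode is a hypothetical name, and the trailing-newline handling
is an assumption about how `echo` writes reach the file.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not from the patch: map a string written to the
 * "auto" file onto a mode. Returns 0 and sets *auto_mode on success,
 * or -EINVAL for any other input.
 */
static int parse_auto_mode(const char *buf, bool *auto_mode)
{
	size_t len = strlen(buf);

	/* Assumption: tolerate the single trailing newline added by echo. */
	if (len > 0 && buf[len - 1] == '\n')
		len--;
	if (len != 1)
		return -EINVAL;

	switch (buf[0]) {
	case 'Y':
	case '1':
		*auto_mode = true;	/* auto mode */
		return 0;
	case 'N':
	case '0':
		*auto_mode = false;	/* manual mode */
		return 0;
	default:
		return -EINVAL;		/* all other strings are rejected */
	}
}
```

If the Documentation stated this mapping in one sentence (Y and 1 select
auto mode, N and 0 select manual mode), it would match the commit log.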
> + Writing a new weight to a node directly via the nodeN interface
> + will also automatically update the system to manual mode.
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 80a3481c0470..cc94cba112dd 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -20,6 +20,7 @@
> #include <linux/list_sort.h>
> #include <linux/memregion.h>
> #include <linux/memory.h>
> +#include <linux/mempolicy.h>
nit: is this #include directive necessary?
--
Harry