Re: [External Mail] [RFC PATCH] mm/mempolicy: Weighted interleave auto-tuning
From: Joshua Hahn
Date: Wed Dec 25 2024 - 04:31:04 EST
Hi Gregory and Huang,
Sorry for the silence on my end for the past few days. I decided to take
some time off of the computer, but I should be more reponsive now!
On Wed, 25 Dec 2024 08:25:13 +0800 "Huang, Ying" <ying.huang@xxxxxxxxxxxxxxxxx> wrote:
> Gregory Price <gourry@xxxxxxxxxx> writes:
>
> > On Sun, Dec 22, 2024 at 04:29:30PM +0800, Huang, Ying wrote:
> >> Gregory Price <gourry@xxxxxxxxxx> writes:
> >>
> >> > On Sat, Dec 21, 2024 at 01:57:58PM +0800, Huang, Ying wrote:
[.....8<.....]
> > We decided when implementing weights that 0 was a special value that
> > reverts to the system default:
> >
> > Writing an empty string or `0` will reset the weight to the
> > system default. The system default may be set by the kernel
> > or drivers at boot or during hotplug events.
> >
> > I'm ok pulling the default weights in collectively once the first one is
> > written, but 0 is an invalid value which causes issues.
> >
> > We went through that when we initially implemented the feature w/ task-local
> > weights and why the help function overrides it to 1 if it's ever seen.
> >
> > We'll revert back to our initial implementation w/ default_iw_table and
> > iw_table - where iw_table contains user-defined weights. Writing a 0 to
> > iw_table[N] will allow get_il_weight() to retrieve default_iw_table[N]
> > as the docs imply it should.
>
> So, the suggested behavior becomes the following?
>
> default_values [5,2,-] <- 1 node not set, expected to be hotplugged
> user_values [4,2,1] <- user has only set one value, not populated nodes have value 1
> effective [4,2,1]
>
> hotplug event
> default_values [2,1,1] - reweight has occurred
> user_values [4,2,1]
> effective [4,2,1]
Yes, I think this was the intended effect when we were discussing what
interface made the most sense.
> Even if so, we still have another issue. The effective values may be a
> combination of default_values and user_values and it's hard for users to
> identify which one is from default_values and subject to change. For
> example,
>
> user reset weight of node 0 to default: echo 0 > node0
> default_values [2,1,1]
> user_values [0,2,1]
> effective [2,2,1]
>
> change the default again
> default_values [3,1,1] - reweight again
> user_values [0,2,1]
> effective [3,2,1]
Agreed. Actually, this confusion was partly what motivated our new
re-work of the patch in v2, which got rid of the default and user
layers, and made all internal values transparent to the user as well.
That way, there would be no confusion as to the true source of the
value, and the user could be aware that re-weighting would impact
all values, regardless of whehter they were default values or not.
If we are moving away from allowing users to dynamically change the
weightiness (max_node_weight) parameter however, then I think that there
may be more merit to using the two-level default & user values system to
allow for more flexibility.
> This is still quite confusing. Another possible solution is to copy the
> default value instead,
>
> user reset weight of node 0 to default: echo 0 > node0
> default_values [2,1,1]
> user_values [2,2,1] - copy default value when echo 0
> effective [2,2,1]
>
> change the default again
> default_values [3,1,1] - reweight again
> user_values [2,2,1]
> effective [2,2,1]
This makes a lot sense to me, I think it lets us keep both the
transparency of the new one-layered system and all the benefits that
come with having default values that can adapt to hotplug events. One
thing we should consider is that the user should probably be able to
check what the default value is for a given node before deciding to
copy that value over to the weight table.
Having two files for each node (nodeN, defaultN) seems a bit too
cluttered for the user perspective. Making the nodeN interfaces serve
multiple purposes (i.e. echo -1 into the nodes will output the default
value for that node) also seems a bit too complicated as well, in my
opinion. Maybe having a file 'weight_tables' that contains a table of
default/user/effective weights (as have been used in these conversations)
might be useful for the user? (Or maybe just the defaults)
Then a workflow for the user may be as such:
$ cat /sys/kernel/mm/mempolicy/weighted_interleave/weight_tables
default vales: [4,7,2]
user values: [-,-,-]
effective: [4,7,2]
$ echo 4 > /sys/kernel/mm/mempolicy/weighted_interleave/node2
4
...
> The remaining issue is that we cannot revert to default atomically.
> That is, user_values may becomea combination of old and new
> default_values if users echo 0 to each node one by one when kernel is
> changing default_values. To resolve this, we may add another interface
> to do that, for example, "use_default".
>
> echo 1 > use_default
>
> will use default_values for all nodes. We can check whether we are
> using default via
>
> cat use_default
Like mentioned in the previous comments, I think that the "setting one
value to set all the others" is a good method, especially since the
more I think about it (in my limited experience), I think there is rarely
a scenario where a user wants to use a hybrid of manually-set and
default values and is switching back and forth between default and
manual values.
> Anyway, I think that we need a thorough thought about the user space
> interface. And add good document, at least in change log. It's really
> hard to make user space interface right.
>
> I'm open to better user space interface design.
I agree with this, thank you for your feedback. I think there has been
a lot of great points raised in these conversations, and I will do my
best to take these comments into consideration when writing better
documentation.
> ---
> Best Regards,
> Huang, Ying
Thank you for your input! I hope you have a great day and happy holidays!
Joshua