Re: [PATCH v2] mm: Override mTHP "enabled" defaults at kernel cmdline

From: Barry Song
Date: Mon Aug 12 2024 - 04:36:54 EST


On Mon, Aug 12, 2024 at 8:20 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 12.08.24 07:36, Barry Song wrote:
> > On Sat, Aug 10, 2024 at 1:15 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >>>>>>> -You can change the sysfs boot time defaults of Transparent Hugepage
> >>>>>>> -Support by passing the parameter ``transparent_hugepage=always`` or
> >>>>>>> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> >>>>>>> -to the kernel command line.
> >>>>>>> +You can change the sysfs boot time default for the top-level "enabled"
> >>>>>>> +control by passing the parameter ``transparent_hugepage=always`` or
> >>>>>>> +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> >>>>>>> +kernel command line.
> >>>>>>> +
> >>>>>>> +Alternatively, each supported anonymous THP size can be controlled by
> >>>>>>> +passing ``thp_anon=<size>[KMG]:<state>``, where ``<size>`` is the THP size
> >>>>>>> +and ``<state>`` is one of ``always``, ``madvise``, ``never`` or
> >>>>>>> +``inherit``.
> >>>>>>> +
> >>>>>>> +For example, the following will set 64K THP to ``always``::
> >>>>>>> +
> >>>>>>> + thp_anon=64K:always
> >>>>>>> +
> >>>>>>> +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> >>>>>>> +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> >>>>>>> +not explicitly configured on the command line are implicitly set to
> >>>>>>> +``never``.
> >>>>>>
> >>>>>> I suggest documenting that "thp_anon=" will not effect the value of
> >>>>>> "transparent_hugepage=", or any configured default.
> >>>
> >>> Did you see the previous conversation with Barry about whether or not to honour
> >>> configured defaults when any thp_anon= is provided [1]? Sounds like you also
> >>> think we should honour the PMD "inherit" default if not explicitly provided on
> >>> the command line? (see link for justification for the approach I'm currently
> >>> taking).
> >>
> >> I primarily think that we should document it :)
> >>
> >> What if someone passes "transparent_hugepage=always" and "thp_anon=..."?
> >> I would assume that transparent_hugepage would only affect the global
> >> toggle then?
> >>
> >>>
> >>> [1]
> >>> https://lore.kernel.org/linux-mm/CAGsJ_4x8ruPspuk_FQVggJMWcXLbRuZFq44gg-Dt7Ewt3ExqTw@xxxxxxxxxxxxxx/
> >>>
> >>>>>>
> >>>>>> Wondering if a syntax like
> >>>>>>
> >>>>>> thp_anon=16K,32K,64K:always;1048K,2048K:madvise
> >>>
> >>> Are there examples of that syntax already or have you just made it up? I found
> >>> examples with the colon (:) but nothing this fancy. I guess that's not a reason
> >>> not to do it though (other than the risk of screwing up the parser in a subtle way).
> >>
> >> I made it up -- mostly ;) I think we are quite flexible on what we can
> >> do. As always, maybe we can keep it bit consistent with existing stuff.
> >>
> >> For hugetlb_cma we have things like
> >> "<node>:nn[KMGTPE|[,<node>:nn[KMGTPE]]
> >>
> >> "memmap=" options are more ... advanced, including memory ranges. There
> >> are a bunch more documented in kernel-parameters.txt that have more
> >> elaborate formats.
> >>
> >> Ranges would probably be the most valuable addition. So maybe we should
> >> start with:
> >>
> >> thp_anon=16K-64K:always,1048K-2048K:madvise
> >>
> >> So to enable all THPs it would simply be
> >>
> >> thp_anon=16K-2M:always
> >>
> >> Interesting question what would happen if someone passes:
> >>
> >> thp_anon=8K-2M:always
> >>
> >> Likely we simply would apply it to any size in the range, even if
> >> start/end is not a THP size.
> >>
> >> But we would want to complain to the user if someone only specifies a
> >> single one (or a range does not even select a single one) that does not
> >> exist:
> >>
> >> thp_anon=8K:always
> >
> > How about rejecting settings with any illegal size or policy?
> >
> > I found there are too many corner cases to say "yes" or "no". It seems
> > the code could be much cleaner if we simply reject illegal settings.
> > On the other hand, we should rely on smart users to provide correct
> > settings, shouldn’t we? While working one the code, I felt that
> > extracting partial correct settings from incorrect ones and supporting
> > them might be a bit of over-design.
>
> No strong opinion on failing configs. Ignoring non-existing sizes might
> lead to more portable cmdlines between kernel versions.

Well, I realized that the code I posted has actually applied the correct
parts because it modifies the global variable huge_anon_orders_xxx.
Unless I use temporary variables and then copy the results to
global ones, the code has been this issue to some extent.

maybe I should remove the below "goto err", instead, ignore the
incorrect strings and go to the next strings to parse.

static int __init setup_thp_anon(char *str)
{
...

if (!strcmp(policy, "always")) {
set_bits(&huge_anon_orders_always, start, end);
clear_bits(&huge_anon_orders_inherit,
start, end);
clear_bits(&huge_anon_orders_madvise,
start, end);
} else if (!strcmp(policy, "madvise")) {
...
} else if (!strcmp(policy, "inherit")) {
...
} else if (!strcmp(policy, "never")) {
...
} else {
/* goto err; */ -> pr_err
}
}
}
...
err:
pr_warn("thp_anon=%s: cannot parse(invalid size or policy),
ignored\n", str);
return 0;
}

>
> >
> > I have tested the below code by
> >
> > "thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never"
>
> There are some parameters that separate individual options by ";"
> already (config_acs). Most parameters use ",". No idea what's better
> here, and which separator to use for sizes instead when using "," for
> options. No strong opinion.

But we have used "," for the purpose "128K,512K:inherit". so here
we have to use ";" for "16K-64K:always;128K,512K:inherit"

>
>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry