Re: [PATCH] Documentation/admin-guide/sysctl/vm.rst adding the importance of NUMA-node count to documentation

From: Matthew Cassell
Date: Fri Apr 12 2024 - 16:48:33 EST


Thanks for the feedback. Here is a quick outline I came up with based on
your advice:

[...] (original content)

Keep in mind that enabling bits in zone_reclaim_mode makes the most sense
on systems with more than one NUMA node. Beyond basic zone reclaim (clean,
unmapped pages), additional bits expand which pages are eligible for
reclaim and set the scan_control policy used during reclaim. The page
allocator attempts to reclaim memory locally, according to these bits,
before allocating on remote nodes.
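
(Aside, not part of the proposed text: since the point hinges on node
count, here is a rough sketch of how a reader could check it. The helper
name count_numa_nodes is hypothetical; it assumes the usual sysfs layout
where each node appears as /sys/devices/system/node/nodeN, and takes an
optional alternate root purely so it can be exercised outside a real
system.)

```shell
#!/bin/sh
# Hypothetical helper: count NUMA nodes by counting nodeN directories.
# Assumes nodes are exposed as <root>/node0, <root>/node1, ...
count_numa_nodes() {
    root=${1:-/sys/devices/system/node}
    count=0
    for d in "$root"/node[0-9]*; do
        # An unmatched glob stays literal and fails the -d check.
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

If this prints 1, the local-versus-remote trade-off described above does
not arise, because there is no remote node to fall back to.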

Allow dirty pages to become candidates for memory reclaim::

    echo 2 > /proc/sys/vm/zone_reclaim_mode

[...] (original content)

Allow mapped pages to become candidates for memory reclaim::

    echo 4 > /proc/sys/vm/zone_reclaim_mode
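
(Aside, not part of the proposed text: the value is a bitmask, so the
options combine by OR-ing them together. A minimal sketch of decoding a
value follows, assuming the bit meanings already listed in vm.rst:
1 = zone reclaim on, 2 = write dirty pages out, 4 = swap pages. The
function name and the short labels it prints are made up for
illustration.)

```shell
#!/bin/sh
# Hypothetical helper: decode a zone_reclaim_mode value into its bits.
# Labels are illustrative only; bit values follow vm.rst (1, 2, 4).
decode_zone_reclaim_mode() {
    mode=$1
    out=""
    if [ $(( mode & 1 )) -ne 0 ]; then out="$out zone_reclaim"; fi
    if [ $(( mode & 2 )) -ne 0 ]; then out="$out write_dirty"; fi
    if [ $(( mode & 4 )) -ne 0 ]; then out="$out swap_mapped"; fi
    # Print "off" when no bits are set, else the space-separated labels.
    if [ -z "$out" ]; then
        echo "off"
    else
        echo "${out# }"
    fi
}
```

For example, decode_zone_reclaim_mode "$(cat /proc/sys/vm/zone_reclaim_mode)"
would describe the current setting on a kernel that exposes the tunable.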

[...] (original content)

I'm trying to strike a balance between keeping the original content, being
descriptive, and not going into encyclopedia mode. My motivation was to
stress the importance of NUMA-node count and, per your advice, to describe
the additional bits in more detail. I added the echo snippets to lead into
the more aggressive options. Any thoughts on the above?

On Thu, Apr 11, 2024 at 2:54 AM Vratislav Bendel <vbendel@xxxxxxxxxx> wrote:
>
> On Fri, Apr 5, 2024 at 6:49 PM Matthew Cassell <mcassell411@gmail.com> wrote:
> >
> > If any bits are set in node_reclaim_mode (tunable via
> > /proc/sys/vm/zone_reclaim_mode) within get_page_from_freelist(), then
> > page allocations get early access to reclaim via the node_reclaim()
> > code path when memory pressure increases. This behavior benefits
> > machines with multiple NUMA nodes the most. The above is mentioned in:
> >
> > Commit 9eeff2395e3cfd05c9b2e6 ("[PATCH] Zone reclaim: Reclaim logic")
> > states "Zone reclaim is of particular importance for NUMA machines. It
> > can be more beneficial to reclaim a page than taking the performance
> > penalties that come with allocating a page on a REMOTE zone."
> >
> > While the pros/cons of staying on node versus allocating remotely are
> > mentioned in commit histories and mailing lists, they aren't specifically
> > mentioned in Documentation/, and the trade-off doesn't exist with a lone
> > node. Imagine a situation where CONFIG_NUMA=y (the default on most major
> > distributions) and only a single NUMA node exists. The latter is an
> > oxymoron (single-node == uniform memory access). Informing the user via
> > vm.rst that they get the most bang for their buck when multiple nodes
> > exist seems helpful.
> >
>
> I agree that the documentation could be improved to better express the
> implications and relevance of setting zone_reclaim_mode bits.
>
> Though I would suggest going a step further and also elaborating on
> those "additional actions", for example something like:
> "The page allocator will attempt to reclaim memory within the zone,
> depending on the bits set, before looking for free pages in other
> zones, namely on remote memory nodes."
>
> > Signed-off-by: Matthew Cassell <mcassell411@xxxxxxxxx>
> > ---
> > Documentation/admin-guide/sysctl/vm.rst | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > index c59889de122b..10270548af2a 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -1031,7 +1031,8 @@ Consider enabling one or more zone_reclaim mode bits if it's known that the
> > workload is partitioned such that each partition fits within a NUMA node
> > and that accessing remote memory would cause a measurable performance
> > reduction. The page allocator will take additional actions before
> > -allocating off node pages.
> > +allocating off node pages. Keep in mind enabling bits in zone_reclaim_mode
> > +makes the most sense for topologies consisting of multiple NUMA nodes.
> >
> > Allowing zone reclaim to write out pages stops processes that are
> > writing large amounts of data from dirtying pages on other nodes. Zone
> > --
> > 2.34.1
> >
>