Re: [PATCH v3] [mm-unstable] mm: Fix memcg reclaim on memory tiered systems

From: Wei Xu
Date: Fri Dec 09 2022 - 11:42:06 EST


On Fri, Dec 9, 2022 at 12:08 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Thu 08-12-22 16:59:36, Wei Xu wrote:
> [...]
> > > What I really mean is to add demotion nodes to the nodemask along with
> > > the set of nodes you want to reclaim from. To me that sounds like a
> > > more natural interface allowing for all sorts of usecases:
> > > - free up demotion targets (only specify demotion nodes in the mask)
> > > - control where to demote (e.g. select specific demotion target(s))
> > > - do not demote at all (skip demotion nodes from the node mask)
> >
> > For clarification, do you mean to add another argument (e.g.
> > demotion_nodes) in addition to the "nodes" argument?
>
> No, nodes=mask argument should control the domain where the memory
> reclaim should happen. That includes both aging and the reclaim. If the
> mask doesn't contain any lower tier node then no demotion will happen.
> If only a subset of lower tiers are specified then only those could be
> used for the demotion process. Or put it otherwise, the nodemask is not
> only used to filter out zonelists during reclaim it also restricts
> migration targets.
>
> Is this more clear now?

In that case, how can we request demotion only from toptier nodes
(without counting any reclaimed bytes from other nodes), which is our
memory tiering use case?

Besides, when both toptier and demotion nodes are specified, demoted
pages should be counted only as aging, not toward the bytes requested
from try_to_free_mem_cgroup_pages(). That is what this patch tries to
address.
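
A minimal userspace sketch of the accounting described above, assuming
the nodemask semantics proposed in the quoted text (demotion happens
only when the target node is also in the mask). This is not kernel
code: model_reclaim(), struct node_model and struct reclaim_stats are
hypothetical names invented for illustration, and the model ignores
everything except the two counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MAX_NODES 8

struct node_model {
	int demotion_target;	/* node id to demote to, or -1 if none */
	size_t cold_pages;	/* pages eligible for demotion or reclaim */
};

struct reclaim_stats {
	size_t nr_reclaimed;	/* pages freed; counts toward the request */
	size_t nr_demoted;	/* pages moved down a tier; aging only */
};

static bool node_in_mask(uint32_t mask, int nid)
{
	return nid >= 0 && nid < MAX_NODES && ((mask >> nid) & 1u) != 0;
}

/*
 * Walk the nodes in the mask. A node whose demotion target is also in
 * the mask demotes its cold pages; any other node frees them. Only
 * freed pages make progress toward nr_to_reclaim.
 */
static struct reclaim_stats model_reclaim(struct node_model *nodes,
					  int nr_nodes, uint32_t mask,
					  size_t nr_to_reclaim)
{
	struct reclaim_stats st = { 0, 0 };

	assert(nr_nodes <= MAX_NODES);

	for (int nid = 0; nid < nr_nodes && st.nr_reclaimed < nr_to_reclaim; nid++) {
		struct node_model *n = &nodes[nid];
		int target = n->demotion_target;

		if (!node_in_mask(mask, nid))
			continue;

		if (target < nr_nodes && node_in_mask(mask, target)) {
			/* Demotion: pages move, nothing is freed. */
			nodes[target].cold_pages += n->cold_pages;
			st.nr_demoted += n->cold_pages;
			n->cold_pages = 0;
		} else {
			/* Real reclaim: stop once the request is met. */
			size_t want = nr_to_reclaim - st.nr_reclaimed;
			size_t freed = n->cold_pages < want ? n->cold_pages : want;

			n->cold_pages -= freed;
			st.nr_reclaimed += freed;
		}
	}
	return st;
}
```

In this model, a mask containing both a toptier node and its demotion
target reports the demoted pages separately, and the request is met
only by pages freed from the lower tier. It also shows the limitation
raised above: with such a mask there is no way to ask for demotion
alone, since reclaim on the lower-tier node still runs.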

> --
> Michal Hocko
> SUSE Labs