Re: [PATCH] zswap: improve memory.zswap.writeback inheritance

From: Nhat Pham
Date: Thu Sep 26 2024 - 22:28:33 EST


On Thu, Sep 26, 2024 at 3:55 PM Ivan Shapovalov <intelfx@xxxxxxxxxxxx> wrote:
>
> Improve the inheritance behavior of the `memory.zswap.writeback` cgroup
> attribute introduced during the 6.11 cycle. Specifically, in 6.11 we
> walk the parent cgroups until we find a _disabled_ writeback, which does
> not allow the user to selectively enable zswap writeback while having it
> disabled by default.
>
> Instead, introduce a third value for the `memory.zswap.writeback` cgroup
> attribute meaning "follow the parent", and use it as the default value
> for all cgroups. Additionally, introduce a `zswap.writeback_disable`
> module parameter which is used as the "parent" of the root cgroup,
> thus making it the global default.
>
> Reads from `memory.zswap.writeback` cgroup attribute will be coerced to
> 0 or 1 to maintain backwards compatilibity. However, it is possible to
> write -1 to that attribute to make the cgroup follow the parent again.
>
> Fixes: e39925734909 ("mm/memcontrol: respect zswap.writeback setting from parent cg too")
> Fixes: 501a06fe8e4c ("zswap: memcontrol: implement zswap writeback disabling")
> Signed-off-by: Ivan Shapovalov <intelfx@xxxxxxxxxxxx>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 17 +++++++++++------
> Documentation/admin-guide/mm/zswap.rst | 9 ++++++++-
> include/linux/memcontrol.h | 3 ++-
> include/linux/zswap.h | 6 ++++++
> mm/memcontrol.c | 24 +++++++++++++++++-------
> mm/zswap.c | 9 +++++++++
> 6 files changed, 53 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 95c18bc17083..eea580490679 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1717,10 +1717,12 @@ The following nested keys are defined.
> entries fault back in or are written out to disk.
>
> memory.zswap.writeback
> - A read-write single value file. The default value is "1".
> - Note that this setting is hierarchical, i.e. the writeback would be
> - implicitly disabled for child cgroups if the upper hierarchy
> - does so.
> + A read-write single value file. The default is to follow the parent
> + cgroup configuration, and the root cgroup follows the global
> + ``zswap.writeback_enabled`` module parameter (which is 1 by default).
> + Thus, this setting is hierarchical, i.e. the writeback setting for
> + a child cgroup can be implicitly controlled at runtime by changing
> + any parent value or the global module parameter.
>
> When this is set to 0, all swapping attempts to swapping devices
> are disabled. This included both zswap writebacks, and swapping due
> @@ -1729,8 +1731,11 @@ The following nested keys are defined.
> reclaim inefficiency after disabling writeback (because the same
> pages might be rejected again and again).
>
> - Note that this is subtly different from setting memory.swap.max to
> - 0, as it still allows for pages to be written to the zswap pool.
> + Note that this is different from setting memory.swap.max to 0,
> + as it still allows for pages to be written to the zswap pool.
> +
> + This can also be set to -1, which would make the cgroup (and its
> + future children) follow the parent/global value again.
>

API-design-wise, this seems a bit confusing... Using the value -1 to
indicate the cgroup should follow ancestor is not quite semantically
meaningful. What does "follow the parent" here mean? Reading your
code, it seems to be "find the first non negative from this memcg to
root and just use it", but it's neither intuitive, nor deducable from
the documentation.

There are also ill-defined/poorly-defined behavior. What if we set -1
to all cgroups? Should writeback be enabled or disabled?

The implementation also contradicts the "writeback setting for a child
cgroup can be implicitly controlled at runtime by chaging any parent
value or the global module parameter" part. What happens if we have
the following sequence of zswap.writeback values, from root to current
memcg:

root.memcg.zswap.writeback = 0, 1, -1, -1, -1 = cur_memcg.zswap.writeback

Should we disable or enable cur_memcg's zswap.writeback? It's disabled
at root, but the first non-negative ancestral value is 1, so this
cgroup will follow it (even though that ancestor itself has zswap
writeback disabled!!!)

I think we should separate the values itself from the decision to
check ancestors. The former should be indicated per-memcg, and the
latter decision should only come into play when the memcg itself does
prevent zswap.writeback. It might be less confusing to make the latter
decision globally - perhaps through a mount option, analogous to
memory_recursiveprot?