Re: [PATCH 2/2] mm: memcontrol: deprecate swapaccounting=0 mode

From: Johannes Weiner
Date: Mon Mar 22 2021 - 11:54:22 EST


On Fri, Mar 19, 2021 at 05:37:05PM -0700, Hugh Dickins wrote:
> On Fri, 19 Mar 2021, Johannes Weiner wrote:
>
> > The swapaccounting= commandline option already does very little
> > today. To close a trivial containment failure case, the swap ownership
> > tracking part of the swap controller has recently become mandatory
> > (see commit 2d1c498072de ("mm: memcontrol: make swap tracking an
> > integral part of memory control") for details), which makes up the
> > majority of the work during swapout, swapin, and the swap slot map.
> >
> > The only thing left under this flag is the page_counter operations and
> > the visibility of the swap control files in the first place, which are
> > rather meager savings. There also aren't many scenarios, if any, where
> > controlling the memory of a cgroup while allowing it unlimited access
> > to a global swap space is a workable resource isolation stragegy.
> >
> > On the other hand, there have been several bugs and confusion around
> > the many possible swap controller states (cgroup1 vs cgroup2 behavior,
> > memory accounting without swap accounting, memcg runtime disabled).
> >
> > This puts the maintenance overhead of retaining the toggle above its
> > practical benefits. Deprecate it.
> >
> > Suggested-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
> > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>
> This crashes, and needs a fix: see below (plus some nits).
>
> But it's a very welcome cleanup: just getting rid of all those
> !cgroup_memory_noswap double negatives is a relief in itself.
>
> It does suggest eliminating CONFIG_MEMCG_SWAP altogether (just
> using #ifdef CONFIG_SWAP instead, in those parts of CONFIG_MEMCG code);
> but you're right that's a separate cleanup, and not nearly so worthwhile
> as this one (I notice CONFIG_MEMCG_SWAP in some of the arch defconfigs,
> and don't know whether whoever removes CONFIG_MEMCG_SWAP would be
> obligated to remove those too).

2d1c498072de ("mm: memcontrol: make swap tracking an integral part of
memory control") made it invisible to the user already, I only kept
the symbol for convenience in the Makefile:

obj-$(CONFIG_MEMCG_SWAP) += swap_cgroup.o

But I guess we could replace it with

ifdef CONFIG_SWAP
obj-$(CONFIG_MEMCG) += swap_cgroup.c
endif

and I agree, everywhere else it's currently used would be nicer
without it. I'll send a separate patch.

> > @@ -99,7 +92,11 @@ static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
> > /* Whether legacy memory+swap accounting is active */
> > static bool do_memsw_account(void)
> > {
> > - return !cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_noswap;
> > + /* cgroup2 doesn't do mem+swap accounting */
> > + if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > + return false;
> > +
> > + return true;
>
> Nit: I'm not fond of the "if (boolean()) return true; else return false;"
> codestyle, and would prefer the straightforward
>
> return !cgroup_subsys_on_dfl(memory_cgrp_subsys);
>
> but you've chosen otherwise, so, okay.

I prefer a series of branches if a single expression becomes unwieldy,
and individual conditions deserve their own comments.

But it's indeed pretty silly in this case, I'll make it a single line.

> > @@ -7124,7 +7121,7 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
> > {
> > long nr_swap_pages = get_nr_swap_pages();
> >
> > - if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>
> That needs to check mem_cgroup_disabled() where it used to check
> cgroup_memory_noswap. The convolutions are confusing (which is
> precisely why this is such a welcome cleanup), but without a
> mem_cgroup_disabled() (or NULL memcg) check there, the
> cgroup_disable=memory case oopses on NULLish pointer dereference
> when mem_cgroup_get_nr_swap_pages() is called from get_scan_count().
>
> (This little function has been repeatedly troublesome that way.)

Sigh, yes, this will hopefully be the last instance of this bug...

Thanks for catching that, I'll fix it up.

> > @@ -7141,7 +7138,7 @@ bool mem_cgroup_swap_full(struct page *page)
> >
> > if (vm_swap_full())
> > return true;
> > - if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>
> Would it now be better to say "if (do_memsw_account())" there?

Yes, I'll change that.

> Or would it be better to eliminate do_memsw_account() altogether,
> and just use !cgroup_subsys_on_dfl(memory_cgrp_subsys) throughout?

I have found do_memsw_account() useful in the past to find those
related pieces. The details elude me know but I remember searching for
this string often during the recent work in this area.

> (Though I don't find "cgroup_subsys_on_dfl" very informative.)

This routinely bothers me as well, but I have not been able to come up
with a good replacement.

memcg_v1()? memcg_legacy_mode()? memory_cgroup_in_bytes()?

> > @@ -7291,27 +7287,13 @@ static struct cftype memsw_files[] = {
> > { }, /* terminate */
> > };
> >
> > -/*
> > - * If mem_cgroup_swap_init() is implemented as a subsys_initcall()
> > - * instead of a core_initcall(), this could mean cgroup_memory_noswap still
> > - * remains set to false even when memcg is disabled via "cgroup_disable=memory"
> > - * boot parameter. This may result in premature OOPS inside
> > - * mem_cgroup_get_nr_swap_pages() function in corner cases.
> > - */
> > static int __init mem_cgroup_swap_init(void)
> > {
> > - /* No memory control -> no swap control */
> > - if (mem_cgroup_disabled())
> > - cgroup_memory_noswap = true;
> > -
> > - if (cgroup_memory_noswap)
> > - return 0;
> > -
>
> Shakeel suggested "if (mem_cgroup_disabled()) return 0;" here,
> and that was the first thing I tried when I got the crash.

It really isn't obviously safe. I'll put it back in.