Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller
From: Nhat Pham
Date: Tue Oct 03 2023 - 19:26:32 EST
On Tue, Oct 3, 2023 at 3:42 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 10/03/23 15:09, Nhat Pham wrote:
> > On Tue, Oct 3, 2023 at 11:39 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > > On Tue, Oct 03, 2023 at 11:01:24AM -0700, Nhat Pham wrote:
> > > > On Tue, Oct 3, 2023 at 10:13 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> > > > > On 10/02/23 17:18, Nhat Pham wrote:
> > > > >
> > > > > IIUC, huge page usage is charged in alloc_hugetlb_folio and uncharged in
> > > > > free_huge_folio. During migration, huge pages are allocated via
> > > > > alloc_migrate_hugetlb_folio, not alloc_hugetlb_folio. So, there is no
> > > > > charging for the migration target page and we uncharge the source page.
> > > > > It looks like there will be no charge for the huge page after migration?
> > > > >
> > > >
> > > > Ah I see! This is a bit subtle indeed.
> > > >
> > > > For the hugetlb controller, it looks like they update the cgroup info
> > > > inside move_hugetlb_state(), which calls hugetlb_cgroup_migrate()
> > > > to transfer the hugetlb cgroup info to the destination folio.
> > > >
> > > > Perhaps we can do something analogous here.
> > > >
> > > > > If my analysis above is correct, then we may need to be careful about
> > > > > this accounting. We may not want both source and target pages to be
> > > > > charged at the same time.
> > > >
> > > > We can create a variant of mem_cgroup_migrate that does not double
> > > > charge, but instead just copy the mem_cgroup information to the new
> > > > folio, and then clear that info from the old folio. That way the memory
> > > > usage counters are untouched.
> > > >
> > > > Somebody with more expertise on migration should fact check me
> > > > of course :)
> > >
> > > The only reason mem_cgroup_migrate() double charges right now is
> > > because it's used by replace_page_cache_folio(). In that context, the
> > > isolation of the old page isn't quite as thorough as with migration,
> > > so it cannot transfer and uncharge directly. This goes back a long
> > > time: 0a31bc97c80c3fa87b32c091d9a930ac19cd0c40
> > >
> > > If you rename the current implementation to mem_cgroup_replace_page()
> > > for that one caller, you can add a mem_cgroup_migrate() variant which
> > > is charge neutral and clears old->memcg_data. This can be used for
> > > regular and hugetlb page migration. Something like this (totally
> > > untested):
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index a4d3282493b6..17ec45bf3653 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -7226,29 +7226,14 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
> > > if (mem_cgroup_disabled())
> > > return;
> > >
> > > - /* Page cache replacement: new folio already charged? */
> > > - if (folio_memcg(new))
> > > - return;
> > > -
> > > memcg = folio_memcg(old);
> > > VM_WARN_ON_ONCE_FOLIO(!memcg, old);
> > > if (!memcg)
> > > return;
> > >
> > > - /* Force-charge the new page. The old one will be freed soon */
> > > - if (!mem_cgroup_is_root(memcg)) {
> > > - page_counter_charge(&memcg->memory, nr_pages);
> > > - if (do_memsw_account())
> > > - page_counter_charge(&memcg->memsw, nr_pages);
> > > - }
> > > -
> > > - css_get(&memcg->css);
> > > + /* Transfer the charge and the css ref */
> > > commit_charge(new, memcg);
> > > -
> > > - local_irq_save(flags);
> > > - mem_cgroup_charge_statistics(memcg, nr_pages);
> > > - memcg_check_events(memcg, folio_nid(new));
> > > - local_irq_restore(flags);
> > > + old->memcg_data = 0;
> > > }
> > >
> > > DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
> > >
> >
> > Ah, I like this. Will send a fixlet based on this :)
> > I was scratching my head trying to figure out why we were
> > doing the double charging in the first place. Thanks for the context,
> > Johannes!
>
> Be sure to check for code similar to this in folio_migrate_flags:
>
> void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
> {
> ...
> if (!folio_test_hugetlb(folio))
> mem_cgroup_migrate(folio, newfolio);
> }
>
> There are many places where hugetlb is special cased.
Yeah makes sense. I'm actually gonna take advantage of this,
and remove the test hugetlb check here, so that it will also
migrate the memcg metadata in this case too. See the new patch
I just sent out.
> --
> Mike Kravetz