Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
From: Johannes Weiner
Date: Tue May 12 2020 - 17:58:35 EST
On Tue, May 12, 2020 at 10:38:54AM -0400, Qian Cai wrote:
> > On May 8, 2020, at 2:30 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> >
> > With the page->mapping requirement gone from memcg, we can charge anon
> > and file-thp pages in one single step, right after they're allocated.
> >
> > This removes two out of three API calls - especially the tricky commit
> > step that needed to happen at just the right time between when the
> > page is "set up" and when it's "published" - somewhat vague and fluid
> > concepts that varied by page type. All we need is a freshly allocated
> > page and a memcg context to charge.
> >
> > v2: prevent double charges on pre-allocated hugepages in khugepaged
> >
> > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> > ---
> > include/linux/mm.h | 4 +---
> > kernel/events/uprobes.c | 11 +++--------
> > mm/filemap.c | 2 +-
> > mm/huge_memory.c | 9 +++------
> > mm/khugepaged.c | 35 ++++++++++-------------------------
> > mm/memory.c | 36 ++++++++++--------------------------
> > mm/migrate.c | 5 +----
> > mm/swapfile.c | 6 +-----
> > mm/userfaultfd.c | 5 +----
> > 9 files changed, 31 insertions(+), 82 deletions(-)
> []
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >
> > @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
> > out_up_write:
> > up_write(&mm->mmap_sem);
> > out_nolock:
> > + if (*hpage)
> > + mem_cgroup_uncharge(*hpage);
> > trace_mm_collapse_huge_page(mm, isolated, result);
> > return;
> > out:
> > - mem_cgroup_cancel_charge(new_page, memcg);
> > goto out_up_write;
> > }
> []
>
> Some memory pressure will crash this new code; it looks somewhat racy.
>
> if (!page->mem_cgroup)
>
> where page == NULL in mem_cgroup_uncharge().
Thanks for the report, and sorry about the inconvenience.

Hm, the page is exclusive at this point; nobody else should be touching
it. After all, khugepaged might reuse the preallocated page for another
pmd if this one fails to collapse.
Looking at the code, I think it's the page pointer itself that is
garbage, not page->mem_cgroup changing underneath us: with CONFIG_NUMA,
when the hugepage allocation fails, *hpage can hold an ERR_PTR instead
of NULL. The plain `if (*hpage)` check passes for that error cookie,
and mem_cgroup_uncharge() then dereferences it.
I think we need the following fixlet:
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f2e0a5e5cfbb..f6161e17da26 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1193,7 +1193,7 @@ static void collapse_huge_page(struct mm_struct *mm,
out_up_write:
up_write(&mm->mmap_sem);
out_nolock:
- if (*hpage)
+ if (!IS_ERR_OR_NULL(*hpage))
mem_cgroup_uncharge(*hpage);
trace_mm_collapse_huge_page(mm, isolated, result);
return;
@@ -1928,7 +1928,7 @@ static void collapse_file(struct mm_struct *mm,
unlock_page(new_page);
out:
VM_BUG_ON(!list_empty(&pagelist));
- if (*hpage)
+ if (!IS_ERR_OR_NULL(*hpage))
mem_cgroup_uncharge(*hpage);
/* TODO: tracepoints */
}