Re: [PATCH v2 1/3] mm: memcontrol: fix swap counter leak on swapout from offline cgroup

From: Michal Hocko
Date: Tue Aug 02 2016 - 16:47:21 EST


On Tue 02-08-16 13:33:37, Johannes Weiner wrote:
> On Tue, Aug 02, 2016 at 06:00:26PM +0200, Michal Hocko wrote:
> > On Tue 02-08-16 18:00:48, Vladimir Davydov wrote:
> > > @@ -5767,15 +5785,20 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> > > if (!memcg)
> > > return;
> > >
> > > - mem_cgroup_id_get(memcg);
> > > - oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg));
> > > + swap_memcg = mem_cgroup_id_get_active(memcg);
> > > + oldid = swap_cgroup_record(entry, mem_cgroup_id(swap_memcg));
> > > VM_BUG_ON_PAGE(oldid, page);
> > > - mem_cgroup_swap_statistics(memcg, true);
> > > + mem_cgroup_swap_statistics(swap_memcg, true);
> > >
> > > page->mem_cgroup = NULL;
> > >
> > > if (!mem_cgroup_is_root(memcg))
> > > page_counter_uncharge(&memcg->memory, 1);
> > > + if (memcg != swap_memcg) {
> > > + if (!mem_cgroup_is_root(swap_memcg))
> > > + page_counter_charge(&swap_memcg->memsw, 1);
> > > + page_counter_uncharge(&memcg->memsw, 1);
> > > + }
> > >
> > > /*
> > > * Interrupts should be disabled here because the caller holds the
> >
> > The resulting code is a weird mixture of memcg and swap_memcg usage
> > which is really confusing and error prone. Do we really have to do
> > uncharge on an already offline memcg?
>
> The charge is recursive and includes swap_memcg, i.e. live groups, so
> the uncharge is necessary.

Hmm, the charge is recursive, all right, but then I have only little
sympathy for
	if (!mem_cgroup_is_root(swap_memcg))
		page_counter_charge(&swap_memcg->memsw, 1);
	page_counter_uncharge(&memcg->memsw, 1);

We first charge up the hierarchy from swap_memcg just to uncharge the
same amount from memcg, which sits lower in the hierarchy. So the end
result for swap_memcg and everything above it should be the same, right?
The only reason to do this would be that we uncharge the lower (offline)
levels as well. I do not remember the details, but I do not think we
check that the counters are 0 on exit.
But it is quite late and my brain is quite burnt, so I might easily be
missing something. Small style issues aside, I think the patch is
correct, so feel free to add

Acked-by: Michal Hocko <mhocko@xxxxxxxx>

I just think we can make this easier and more straightforward. See the
diff below (not even compile tested - just for an illustration).
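
To make the charge/uncharge balance argument above concrete, here is a
small userspace sketch. It is not kernel code: struct toy_counter,
toy_charge(), toy_uncharge() and toy_transfer_demo() are hypothetical
stand-ins for a hierarchical page counter, and the three-level chain
(root <- online parent <- offline child, one page charged at each level)
is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a hierarchical page counter (hypothetical, simplified). */
struct toy_counter {
	long count;
	struct toy_counter *parent;	/* NULL at the top of the hierarchy */
};

/* Charge this level and every ancestor above it. */
static void toy_charge(struct toy_counter *c, long nr)
{
	for (; c; c = c->parent)
		c->count += nr;
}

/* Uncharge this level and every ancestor above it. */
static void toy_uncharge(struct toy_counter *c, long nr)
{
	for (; c; c = c->parent) {
		assert(c->count >= nr);	/* a counter must never go negative */
		c->count -= nr;
	}
}

struct toy_result {
	long root;
	long online;
	long offline;
};

/*
 * Replay the two memsw operations discussed above on the chain
 * root <- online <- offline, each level starting with one page charged.
 */
static struct toy_result toy_transfer_demo(void)
{
	struct toy_counter root = { 1, NULL };
	struct toy_counter online = { 1, &root };
	struct toy_counter offline = { 1, &online };
	struct toy_result res;

	toy_charge(&online, 1);		/* charge starting at swap_memcg */
	toy_uncharge(&offline, 1);	/* uncharge starting at memcg */

	res.root = root.count;
	res.online = online.count;
	res.offline = offline.count;
	return res;
}
```

In this toy model the online parent and the root end up with the same
count they started with, and only the offline child drops to zero. So
the pair of calls changes nothing for the live groups; its only visible
effect is on the offline levels below swap_memcg.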

> I don't think the code is too bad, though?
> swap_memcg is the target that is being charged for swap, memcg is the
> origin group from which we swap out. Seems pretty straightforward...?
>
> But maybe a comment above the memcg != swap_memcg check would be nice:
>
> /*
> * In case the memcg owning these pages has been offlined and doesn't
> * have an ID allocated to it anymore, charge the closest online
> * ancestor for the swap instead and transfer the memory+swap charge.
> */

A comment would definitely be helpful.

> Thinking about it, mem_cgroup_id_get_active() is a little strange; the
> term we use throughout the cgroup code is "online". It might be good
> to rename this mem_cgroup_id_get_online().

Yes, that would be better, IMHO.
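
For readers who have not looked at the helper: the idea of "take a
reference on the closest online ancestor" can be sketched in userspace
as below. This is only an illustration. struct toy_memcg, its fields and
toy_id_get_online() are hypothetical names, the real helper is not shown
in this mail and may track liveness differently (for example through the
ID reference count rather than a flag), and the sketch assumes the root
is always usable as a fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup (hypothetical, simplified). */
struct toy_memcg {
	bool online;			/* assumption: liveness is a plain flag */
	int id_refs;			/* references held on this group's ID */
	struct toy_memcg *parent;	/* NULL for the root group */
};

/*
 * Walk up from memcg until an online group is found, take an ID
 * reference on it and return it. Falls back to the topmost group
 * (assumed to be the root) if no online group is found on the way.
 */
static struct toy_memcg *toy_id_get_online(struct toy_memcg *memcg)
{
	while (!memcg->online && memcg->parent)
		memcg = memcg->parent;
	memcg->id_refs++;
	return memcg;
}
```

The caller then records the returned group as the swap owner. When that
group differs from the one the page was charged to, the page's original
group was already offline.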

---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b6ac01d2b908..66868b2a4c8c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5819,6 +5819,14 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);

+ /*
+ * Interrupts should be disabled here because the caller holds the
+ * mapping->tree_lock lock which is taken with interrupts-off. It is
+ * important here to have the interrupts disabled because it is the
+ * only synchronisation we have for updating the per-CPU variables.
+ */
+ VM_BUG_ON(!irqs_disabled());
+
if (!do_memsw_account())
return;

@@ -5828,6 +5836,12 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
if (!memcg)
return;

+ /*
+ * In case the memcg owning these pages has been offlined and doesn't
+ * have an ID allocated to it anymore, charge the closest online
+ * ancestor for the swap instead. Hierarchical charges will be preserved
+ * and the offlined one will not cry with some discrepancies in statistics.
+ */
swap_memcg = mem_cgroup_id_get_active(memcg);
oldid = swap_cgroup_record(entry, mem_cgroup_id(swap_memcg));
VM_BUG_ON_PAGE(oldid, page);
@@ -5837,21 +5851,11 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)

if (!mem_cgroup_is_root(memcg))
page_counter_uncharge(&memcg->memory, 1);
- if (memcg != swap_memcg) {
- if (!mem_cgroup_is_root(swap_memcg))
- page_counter_charge(&swap_memcg->memsw, 1);
- page_counter_uncharge(&memcg->memsw, 1);
- }

- /*
- * Interrupts should be disabled here because the caller holds the
- * mapping->tree_lock lock which is taken with interrupts-off. It is
- * important here to have the interrupts disabled because it is the
- * only synchronisation we have for udpating the per-CPU variables.
- */
- VM_BUG_ON(!irqs_disabled());
- mem_cgroup_charge_statistics(memcg, page, false, -1);
- memcg_check_events(memcg, page);
+ if (memcg == swap_memcg) {
+ mem_cgroup_charge_statistics(memcg, page, false, -1);
+ memcg_check_events(memcg, page);
+ }

if (!mem_cgroup_is_root(memcg))
css_put(&memcg->css);
--
Michal Hocko
SUSE Labs