Re: [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration

From: Michal Hocko
Date: Wed Jul 16 2014 - 04:35:15 EST


[Sorry, I missed this thread]

On Tue 15-07-14 10:45:39, Johannes Weiner wrote:
[...]
> From 274b94ad83b38fe7dc1707a8eb4015b3ab1673c5 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@xxxxxxxxxxx>
> Date: Thu, 10 Jul 2014 01:02:11 +0000
> Subject: [patch] mm: memcontrol: rewrite uncharge API fix - double migration
>
> Hugh reports:
>
> VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> mm/memcontrol.c:6680!
> page had count 1 mapcount 0 mapping anon index 0x196
> flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> handle_mm_fault < __do_page_fault
>
> mem_cgroup_migrate() assumes that a page is only migrated once and
> then freed immediately after.
>
> However, putting the page back on the LRU list and dropping the
> isolation refcount is not done atomically. This allows a PFN-based
> migrator like compaction to isolate the page, see the expected
> anonymous page refcount of 1, and migrate the page once more.
>
> Furthermore, once the charges are transferred to the new page, the old
> page no longer has a pin on the memcg, which might get released before
> the page itself now. pc->mem_cgroup is invalid at this point, but
> PCG_USED suggests otherwise, provoking use-after-free.

The same applies to the new page because we are transferring only
statistics. The old page, still marked PCG_USED, would uncharge the
res_counter, so the new page is no longer backed by any charge and the
memcg can go away. This sounds like the more probable scenario to me
because the old page should go away quite early after a successful
migration.
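
To make the scenario concrete, here is a toy userspace model. It is not
kernel code: every toy_* name is made up for illustration, a plain
counter stands in for the res_counter and a bool stands in for PCG_USED.
It only shows that if migration leaves the old page marked as used,
freeing the old page drops the charge that the new page relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
struct toy_memcg {
	long charged_pages;	/* stands in for the res_counter usage */
};

struct toy_page {
	struct toy_memcg *memcg;
	bool used;		/* stands in for PCG_USED */
};

/* Charge one page to the group and mark the page as charged. */
static void toy_charge(struct toy_page *page, struct toy_memcg *memcg)
{
	assert(!page->used);
	memcg->charged_pages++;
	page->memcg = memcg;
	page->used = true;
}

/* Uncharge on free: only a page still marked used gives back a charge. */
static void toy_uncharge(struct toy_page *page)
{
	if (!page->used)
		return;
	page->memcg->charged_pages--;
	page->used = false;
}

/* Hands the group over to the new page but forgets to clear the old one. */
static void toy_migrate_buggy(struct toy_page *oldpage, struct toy_page *newpage)
{
	newpage->memcg = oldpage->memcg;
	newpage->used = true;
}

/* Clears the old page's marker before the new page takes over. */
static void toy_migrate_fixed(struct toy_page *oldpage, struct toy_page *newpage)
{
	oldpage->used = false;
	newpage->memcg = oldpage->memcg;
	newpage->used = true;
}

/* Charge, migrate, free the old page; return what is left on the counter. */
static long toy_counter_after_old_page_freed(bool fixed)
{
	struct toy_memcg memcg = { 0 };
	struct toy_page oldpage = { NULL, false };
	struct toy_page newpage = { NULL, false };

	toy_charge(&oldpage, &memcg);
	if (fixed)
		toy_migrate_fixed(&oldpage, &newpage);
	else
		toy_migrate_buggy(&oldpage, &newpage);
	toy_uncharge(&oldpage);

	return memcg.charged_pages;
}
```

In this model the buggy variant leaves the counter at zero while the new
page is still in use, which is the window in which the group could be
released; the fixed variant keeps one page charged.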

> Properly uncharge the page after it's been migrated, including the
> clearing of PCG_USED, so that a subsequent charge migration attempt
> will be able to detect it and bail out.
>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Reported-by: Hugh Dickins <hughd@xxxxxxxxxx>
> ---
> mm/memcontrol.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1e3b27f8dc2f..1439537fe7c9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6655,7 +6655,6 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
>
>  	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage);
>  	VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage);
> -	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
>
>  	if (PageTransHuge(oldpage)) {
>  		nr_pages <<= compound_order(oldpage);
> @@ -6663,6 +6662,13 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
>  		VM_BUG_ON_PAGE(!PageTransHuge(newpage), newpage);
>  	}
>
> +	pc->flags = 0;
> +
> +	local_irq_disable();
> +	mem_cgroup_charge_statistics(pc->mem_cgroup, oldpage, -nr_pages);
> +	memcg_check_events(pc->mem_cgroup, oldpage);
> +	local_irq_enable();
> +
>  	commit_charge(newpage, pc->mem_cgroup, nr_pages, lrucare);
>  }

Looks good to me. I am just wondering whether we should really
fiddle with stats and events when nothing actually changed during
the transition. I would simply extract the core of commit_charge into
__commit_charge, which would be called from here.

The impact is minimal because events are rate limited and stats are
per-cpu, so it is not a big deal; it just looks ugly to me.
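
A rough sketch of the split I have in mind, again as a userspace toy
rather than a patch. All sketch_* names are hypothetical and the bodies
are assumptions about what "core" would mean here (binding the page to
its group and setting the flags); the real commit_charge does more than
this. The point is only that migration would go through the core helper
and leave stats and events alone.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PCG_USED 0x1UL	/* stands in for the page_cgroup flags */

/* Toy stand-ins, not the kernel's structures. */
struct sketch_memcg {
	long nr_pages_stat;	/* stands in for the per-cpu statistics */
	long nr_events;		/* stands in for the event counter */
};

struct sketch_page_cgroup {
	struct sketch_memcg *memcg;
	unsigned long flags;
};

/* Core part (the proposed __commit_charge): bind the page to the group. */
static void sketch_commit_charge_core(struct sketch_page_cgroup *pc,
				      struct sketch_memcg *memcg)
{
	pc->memcg = memcg;
	pc->flags = SKETCH_PCG_USED;
}

/* Full commit for a fresh charge: core part plus stats and events. */
static void sketch_commit_charge(struct sketch_page_cgroup *pc,
				 struct sketch_memcg *memcg, long nr_pages)
{
	sketch_commit_charge_core(pc, memcg);
	memcg->nr_pages_stat += nr_pages;
	memcg->nr_events++;
}

/* Migration: the charge moves between pages, the accounting is untouched. */
static void sketch_migrate(struct sketch_page_cgroup *oldpc,
			   struct sketch_page_cgroup *newpc)
{
	struct sketch_memcg *memcg = oldpc->memcg;

	assert(oldpc->flags & SKETCH_PCG_USED);
	oldpc->flags = 0;
	sketch_commit_charge_core(newpc, memcg);
}

/* Charge one page, migrate it once, and return the resulting accounting. */
static struct sketch_memcg sketch_charge_then_migrate(void)
{
	struct sketch_memcg memcg = { 0, 0 };
	struct sketch_page_cgroup oldpc = { NULL, 0 };
	struct sketch_page_cgroup newpc = { NULL, 0 };

	sketch_commit_charge(&oldpc, &memcg, 1);
	sketch_migrate(&oldpc, &newpc);

	return memcg;
}
```

With this shape there is no need to subtract nr_pages from the stats on
the old page only to add them back for the new one.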
--
Michal Hocko
SUSE Labs