Re: [PATCH 02/18] mm: memcontrol: fix theoretical race in charge moving

From: Shakeel Butt
Date: Wed Apr 22 2020 - 12:51:35 EST


On Mon, Apr 20, 2020 at 3:11 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> The move_lock is a per-memcg lock, but the VM accounting code that
> needs to acquire it comes from the page and follows page->mem_cgroup
> under RCU protection. That means that the page becomes unlocked not
> when we drop the move_lock, but when we update page->mem_cgroup. And
> that assignment doesn't imply any memory ordering. If that pointer
> write gets reordered against the reads of the page state -
> page_mapped, PageDirty etc. the state may change while we rely on it
> being stable and we can end up corrupting the counters.
>
> Place an SMP memory barrier to make sure we're done with all page
> state by the time the new page->mem_cgroup becomes visible.
>
> Also replace the open-coded move_lock with a lock_page_memcg() to make
> it more obvious what we're serializing against.
>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
> mm/memcontrol.c | 26 ++++++++++++++------------
> 1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5beea03dd58a..41f5ed79272e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5372,7 +5372,6 @@ static int mem_cgroup_move_account(struct page *page,
> {
> struct lruvec *from_vec, *to_vec;
> struct pglist_data *pgdat;
> - unsigned long flags;
> unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
> int ret;
> bool anon;
> @@ -5399,18 +5398,13 @@ static int mem_cgroup_move_account(struct page *page,
> from_vec = mem_cgroup_lruvec(from, pgdat);
> to_vec = mem_cgroup_lruvec(to, pgdat);
>
> - spin_lock_irqsave(&from->move_lock, flags);
> + lock_page_memcg(page);
>
> if (!anon && page_mapped(page)) {
> __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
> __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
> }
>
> - /*
> - * move_lock grabbed above and caller set from->moving_account, so
> - * mod_memcg_page_state will serialize updates to PageDirty.
> - * So mapping should be stable for dirty pages.
> - */
> if (!anon && PageDirty(page)) {
> struct address_space *mapping = page_mapping(page);
>
> @@ -5426,15 +5420,23 @@ static int mem_cgroup_move_account(struct page *page,
> }
>
> /*
> + * All state has been migrated, let's switch to the new memcg.
> + *
> * It is safe to change page->mem_cgroup here because the page
> - * is referenced, charged, and isolated - we can't race with
> - * uncharging, charging, migration, or LRU putback.
> + * is referenced, charged, isolated, and locked: we can't race
> + * with (un)charging, migration, LRU putback, or anything else
> + * that would rely on a stable page->mem_cgroup.
> + *
> + * Note that lock_page_memcg is a memcg lock, not a page lock,
> + * to save space. As soon as we switch page->mem_cgroup to a
> + * new memcg that isn't locked, the above state can change
> + * concurrently again. Make sure we're truly done with it.
> */
> + smp_mb();

You said theoretical race in the subject but the above comment
convinced me that smp_mb() is required. So, why is the race still
theoretical?

>
> - /* caller should have done css_get */
> - page->mem_cgroup = to;
> + page->mem_cgroup = to; /* caller should have done css_get */
>
> - spin_unlock_irqrestore(&from->move_lock, flags);
> + __unlock_page_memcg(from);
>
> ret = 0;
>
> --
> 2.26.0
>