[PATCH 5.8 248/255] mm/page_counter: fix various data races at memsw

From: Greg Kroah-Hartman
Date: Tue Sep 01 2020 - 11:50:21 EST


From: Qian Cai <cai@xxxxxx>

commit 6e4bd50f3888fa8fea8bc66a0ad4ad5f1c862961 upstream.

Since commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters"),
memcg->memsw->watermark and memcg->memsw->failcnt can be accessed
concurrently, as reported by KCSAN:

BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge

read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
try_charge+0x131/0xd50 mm/memcontrol.c:2405
__memcg_kmem_charge_memcg+0x58/0x140
__memcg_kmem_charge+0xcc/0x280
__alloc_pages_nodemask+0x1e1/0x450
alloc_pages_current+0xa6/0x120
pte_alloc_one+0x17/0xd0
__pte_alloc+0x3a/0x1f0
copy_p4d_range+0xc36/0x1990
copy_page_range+0x21d/0x360
dup_mmap+0x5f5/0x7a0
dup_mm+0xa2/0x240
copy_process+0x1b3f/0x3460
_do_fork+0xaa/0xa20
__x64_sys_clone+0x13b/0x170
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe

write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
try_charge+0x131/0xd50 mm/memcontrol.c:2405
mem_cgroup_try_charge+0x159/0x460
mem_cgroup_try_charge_delay+0x3d/0xa0
wp_page_copy+0x14d/0x930
do_wp_page+0x107/0x7b0
__handle_mm_fault+0xce6/0xd40
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40

BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge

write to 0xffff88809bbf2158 of 8 bytes by task 11782 on cpu 0:
page_counter_try_charge+0x100/0x170 mm/page_counter.c:129
try_charge+0x185/0xbf0 mm/memcontrol.c:2405
__memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
__memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
__alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780

read to 0xffff88809bbf2158 of 8 bytes by task 11814 on cpu 1:
page_counter_try_charge+0xef/0x170 mm/page_counter.c:129
try_charge+0x185/0xbf0 mm/memcontrol.c:2405
__memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
__memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
__alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780

Since the watermark could be compared against or set to garbage due to a
data race, which would change the code logic, fix it by adding a
READ_ONCE()/WRITE_ONCE() pair in those places.

The "failcnt" counter tolerates some degree of inaccuracy and is only used
to report stats, so a data race on it is not harmful. Mark it as an
intentional data race using the data_race() macro.
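
As an illustration only, the marking pattern described above can be sketched
in userspace C. This is not the kernel implementation: READ_ONCE() and
WRITE_ONCE() are approximated here with volatile accesses, data_race() is
reduced to a plain expression because KCSAN is not present, and struct
demo_counter and demo_try_charge() are hypothetical names. The real code
keeps usage in an atomic_long_t; a plain field is assumed here to keep the
sketch short. __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace approximations (assumptions, not the kernel definitions). */
#define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))
#define data_race(expr)   (expr)   /* the kernel macro also silences KCSAN */

/* Hypothetical, simplified stand-in for struct page_counter. */
struct demo_counter {
	unsigned long usage;
	unsigned long max;
	unsigned long watermark;
	unsigned long failcnt;
};

/* Charge nr_pages; on failure, bump failcnt and leave usage unchanged. */
static bool demo_try_charge(struct demo_counter *c, unsigned long nr_pages)
{
	unsigned long new;

	assert(c != NULL);
	new = c->usage + nr_pages;

	if (new > c->max) {
		/* Stats only: an occasional lost increment is acceptable. */
		data_race(c->failcnt++);
		return false;
	}
	c->usage = new;

	/* Marked accesses: one load for the compare, one store for the update. */
	if (new > READ_ONCE(c->watermark))
		WRITE_ONCE(c->watermark, new);
	return true;
}
```

With max set to 10, charging 4 pages succeeds and raises the watermark to 4;
a further charge of 8 pages fails, increments failcnt, and leaves usage at 4.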

Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
Reported-by: syzbot+f36cfe60b1006a94f9dc@xxxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Qian Cai <cai@xxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Acked-by: Michal Hocko <mhocko@xxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Marco Elver <elver@xxxxxxxxxx>
Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Link: http://lkml.kernel.org/r/1581519682-23594-1-git-send-email-cai@xxxxxx
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

---
mm/page_counter.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -77,8 +77,8 @@ void page_counter_charge(struct page_cou
 		 * This is indeed racy, but we can live with some
 		 * inaccuracy in the watermark.
 		 */
-		if (new > c->watermark)
-			c->watermark = new;
+		if (new > READ_ONCE(c->watermark))
+			WRITE_ONCE(c->watermark, new);
 	}
 }

@@ -119,9 +119,10 @@ bool page_counter_try_charge(struct page
 			propagate_protected_usage(c, new);
 			/*
 			 * This is racy, but we can live with some
-			 * inaccuracy in the failcnt.
+			 * inaccuracy in the failcnt which is only used
+			 * to report stats.
 			 */
-			c->failcnt++;
+			data_race(c->failcnt++);
 			*fail = c;
 			goto failed;
 		}
@@ -130,8 +131,8 @@ bool page_counter_try_charge(struct page
 		 * Just like with failcnt, we can live with some
 		 * inaccuracy in the watermark.
 		 */
-		if (new > c->watermark)
-			c->watermark = new;
+		if (new > READ_ONCE(c->watermark))
+			WRITE_ONCE(c->watermark, new);
 	}
 	return true;