Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy

From: Ding Hui
Date: Thu May 06 2021 - 21:46:54 EST

Next message: kernel test robot: "[tip:sched/urgent] BUILD SUCCESS 0258bdfaff5bd13c4d2383150b7097aecd6b6d82"
Previous message: Yunsheng Lin: "Re: [PATCH bpf] bpf: check for data_len before upgrading mss when 6 to 4"
In reply to: HORIGUCHI NAOYA(堀口　直也): "Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy"

On 2021/5/6 15:30, HORIGUCHI NAOYA(堀口直也) wrote:

On Thu, May 06, 2021 at 12:01:34PM +0800, Ding Hui wrote:

On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口直也) wrote:

On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote:

On 21.04.21 04:04, Ding Hui wrote:

Recently we found that a lot of MemFree is still reported in /proc/meminfo
after soft-offlining a large number of pages.

I think this is incorrect, since NR_FREE_PAGES should not include HWPoison pages.
After take_page_off_buddy(), the page no longer belongs to the buddy
allocator and will not be used any more, but we seem to miss the
NR_FREE_PAGES accounting in this situation.

Signed-off-by: Ding Hui <dinghui@xxxxxxxxxxxxxx>
---
mm/page_alloc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cfc72873961d..8d65b62784d8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page)
 			del_page_from_free_list(page_head, zone, page_order);
 			break_down_buddy_pages(zone, page_head, page, 0,
 						page_order, migratetype);
+			__mod_zone_page_state(zone, NR_FREE_PAGES, -1);
 			ret = true;
 			break;
 		}

Should this use __mod_zone_freepage_state() instead?

Yes, __mod_zone_freepage_state() looks better to me.
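
The difference between the two helpers can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct zone_model, the MODEL_* enums and the model_* functions are made-up stand-ins. The one behaviour it tries to capture is that __mod_zone_freepage_state() adjusts NR_FREE_PAGES and, for a CMA migratetype, NR_FREE_CMA_PAGES as well, whereas a bare __mod_zone_page_state(zone, NR_FREE_PAGES, ...) touches only the first counter.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel's zone counters and migratetypes. */
enum zone_stat_model {
	MODEL_NR_FREE_PAGES,
	MODEL_NR_FREE_CMA_PAGES,
	MODEL_NR_STATS
};

enum migratetype_model {
	MODEL_MIGRATE_MOVABLE,
	MODEL_MIGRATE_CMA
};

struct zone_model {
	long vm_stat[MODEL_NR_STATS];
};

/* Models __mod_zone_page_state(): adjust exactly one counter. */
static void model_mod_zone_page_state(struct zone_model *zone,
				      enum zone_stat_model item, long delta)
{
	zone->vm_stat[item] += delta;
}

/*
 * Models __mod_zone_freepage_state(): always adjust the free-page counter,
 * and also the free-CMA counter when the page block is CMA.
 */
static void model_mod_zone_freepage_state(struct zone_model *zone,
					  int nr_pages,
					  enum migratetype_model migratetype)
{
	model_mod_zone_page_state(zone, MODEL_NR_FREE_PAGES, nr_pages);
	if (migratetype == MODEL_MIGRATE_CMA)
		model_mod_zone_page_state(zone, MODEL_NR_FREE_CMA_PAGES,
					  nr_pages);
}
```

In this model, taking a page out of a CMA block with the single-counter helper would leave the free-CMA count one too high, which is presumably why the migratetype-aware helper was suggested.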

And I think that an additional __mod_zone_freepage_state() call in
unpoison_memory() may be necessary to cancel the decrement. I thought of the
following, but it doesn't build because get_pfnblock_migratetype() is
available only in mm/page_alloc.c, so you might want to add a small exported
routine in mm/page_alloc.c and have it called from unpoison_memory().

@@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn)
 	}
 	if (!get_hwpoison_page(p, flags, 0)) {
-		if (TestClearPageHWPoison(p))
+		if (TestClearPageHWPoison(p)) {
+			int migratetype = get_pfnblock_migratetype(p, pfn);
+
 			num_poisoned_pages_dec();
+			__mod_zone_freepage_state(page_zone(p), 1, migratetype);
+		}
 		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
 				 pfn, &unpoison_rs);
 		return 0;

I think there is another problem:
In the normal case, we keep the last refcount of the hwpoison page, so
get_hwpoison_page() should return 1. NR_FREE_PAGES will then be adjusted
when put_page() is called.

I think that take_page_off_buddy() should not be called in this case
(the error page has a remaining refcount). So it seems there is no need to
update NR_FREE_PAGES?

Yes, take_page_off_buddy() is only used for free pages, but we call page_ref_inc() after that. On the other hand, for in-use pages we have increased the refcount via get_any_page(), so in both cases the hwpoisoned pages have a refcount greater than zero.

I think there is no need to update NR_FREE_PAGES explicitly in unpoison_memory(); put_page() will update NR_FREE_PAGES for us and put the page back into the buddy allocator.
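
The argument above can be sketched as a user-space model. It assumes, as the discussion suggests, that a hwpoisoned page keeps one reference, that get_hwpoison_page() takes a second one, and that dropping the last reference returns the page to buddy and bumps NR_FREE_PAGES. All type and function names below are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct page_model {
	int refcount;
	bool hwpoison;
	bool in_buddy;
};

struct counters_model {
	long nr_free_pages;
	long num_poisoned_pages;
};

/* Models put_page(): dropping the last reference frees the page to buddy. */
static void model_put_page(struct page_model *p, struct counters_model *c)
{
	if (--p->refcount == 0) {
		p->in_buddy = true;
		c->nr_free_pages += 1;	/* accounting done by the free path */
	}
}

/* Models the refcount > 0 path of unpoison_memory() as discussed above. */
static int model_unpoison_held_page(struct page_model *p,
				    struct counters_model *c)
{
	bool freeit = false;

	if (p->refcount == 0)
		return -1;	/* zero-refcount case is not modeled here */
	p->refcount++;		/* get_hwpoison_page() takes a reference */

	if (p->hwpoison) {	/* TestClearPageHWPoison() */
		p->hwpoison = false;
		c->num_poisoned_pages -= 1;
		freeit = true;
	}

	model_put_page(p, c);		/* drop the reference taken above */
	if (freeit)
		model_put_page(p, c);	/* drop the reference held since poisoning */
	return 0;
}
```

In this model no explicit NR_FREE_PAGES adjustment is needed in the unpoison path, because the final model_put_page() performs it.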

In a race condition, we may leak the page because we do not put it back to
buddy in unpoison_memory(), even though the HWPoison flag, num_poisoned_pages
and NR_FREE_PAGES are adjusted correctly.

CPU0                                CPU1

soft_offline_page
  soft_offline_free_page
    page_handle_poison
      take_page_off_buddy
      SetPageHWPoison
                                    unpoison_memory
                                      if (!get_hwpoison_page(p))
                                        TestClearPageHWPoison
                                        num_poisoned_pages_dec
                                        __mod_zone_freepage_state
                                        return 0
                                        /* miss put the page back to buddy */
      page_ref_inc
      num_poisoned_pages_inc
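
The interleaving above can be replayed deterministically in user space. The sketch below is a simplified model under stated assumptions: struct race_state and the cpu0_*/cpu1_* functions are invented for illustration, the NR_FREE_PAGES decrement is the one proposed by the patch, and the increment on CPU1 is the one proposed for unpoison_memory(). It only shows that the counters net back to their starting values while the page itself never returns to the free list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page plus the two global counters. */
struct race_state {
	int refcount;
	bool hwpoison;
	bool in_buddy;
	long nr_free_pages;
	long num_poisoned_pages;
};

/* CPU0: first half of page_handle_poison() for a free page. */
static void cpu0_take_off_and_poison(struct race_state *s)
{
	s->in_buddy = false;	/* take_page_off_buddy() */
	s->nr_free_pages -= 1;	/* decrement proposed by the patch */
	s->hwpoison = true;	/* SetPageHWPoison() */
}

/* CPU1: zero-refcount branch of unpoison_memory() with the proposed increment. */
static int cpu1_unpoison(struct race_state *s)
{
	if (s->refcount == 0) {			/* !get_hwpoison_page(p) */
		if (s->hwpoison) {		/* TestClearPageHWPoison() */
			s->hwpoison = false;
			s->num_poisoned_pages -= 1;
			s->nr_free_pages += 1;	/* __mod_zone_freepage_state() */
		}
		return 0;	/* the page is not put back to buddy */
	}
	return -1;		/* refcount > 0 path is outside this sketch */
}

/* CPU0: second half of page_handle_poison(). */
static void cpu0_finish_poison(struct race_state *s)
{
	s->refcount += 1;		/* page_ref_inc() */
	s->num_poisoned_pages += 1;	/* num_poisoned_pages_inc() */
}

/* Replays the interleaving; returns true if the page ends up leaked. */
static bool replay_race(struct race_state *s)
{
	cpu0_take_off_and_poison(s);
	cpu1_unpoison(s);
	cpu0_finish_poison(s);
	/* Leaked: not poisoned, not free, yet still holding a reference. */
	return !s->hwpoison && !s->in_buddy && s->refcount > 0;
}
```

Starting from a free, unpoisoned page, replay_race() reports a leak in this model: nothing marks the page as poisoned any more, but nothing will ever drop the extra reference either.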

Thanks for checking this; unpoison_memory() is racy. We recently proposed
introducing mf_mutex in [1]. Although that patch is not merged into mainline
yet, it could be used to prevent the above race too.

[1] https://lore.kernel.org/linux-mm/20210427062953.2080293-2-nao.horiguchi@xxxxxxxxx/

I'll look forward to it, thanks.

How about doing nothing and returning -EBUSY (so the caller can retry) when
unpoisoning a zero-refcount page, or returning 0 as 230ac719c500 ("mm/hwpoison:
don't try to unpoison containment-failed pages") does?

@@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn)
 	}

 	if (!get_hwpoison_page(p, flags, 0)) {
-		if (TestClearPageHWPoison(p))
-			num_poisoned_pages_dec();
-		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
+		unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		return -EBUSY;
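
How a caller might use the -EBUSY result can be sketched in user space. This is a minimal model, assuming the zero-refcount branch changes nothing and simply reports -EBUSY. struct busy_page_model, the retry helper and the callback that stands in for the other CPU finishing page_ref_inc() are all hypothetical, and the real put_page() calls are collapsed into a single assignment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: just a refcount and the poison flag. */
struct busy_page_model {
	int refcount;
	bool hwpoison;
};

/* Models the proposed behaviour: refuse to touch a zero-refcount page. */
static int model_unpoison_ebusy(struct busy_page_model *p)
{
	if (p->refcount == 0)
		return -EBUSY;	/* flag and counters are left untouched */
	p->hwpoison = false;	/* TestClearPageHWPoison() */
	p->refcount = 0;	/* stands in for the put_page() calls */
	return 0;
}

/* Stands in for CPU0 completing page_ref_inc() between two attempts. */
static void model_other_cpu_finishes(struct busy_page_model *p)
{
	p->refcount = 1;
}

/* Hypothetical caller-side helper: retry a bounded number of times. */
static int model_unpoison_retry(struct busy_page_model *p, int max_tries,
				void (*between_tries)(struct busy_page_model *))
{
	int ret = -EBUSY;

	for (int i = 0; i < max_tries && ret == -EBUSY; i++) {
		ret = model_unpoison_ebusy(p);
		if (ret == -EBUSY && between_tries != NULL)
			between_tries(p);
	}
	return ret;
}
```

If the page never gains a reference, the helper gives up with -EBUSY after max_tries attempts and the page stays marked as poisoned, so no state is lost by retrying.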

Currently unpoison_memory() does not work as the reverse operation of
take_page_off_buddy() (it's simply broken), so fixing it all at once would be better.
I'll take time to fix unpoison_memory().

Thanks for your work.
Actually, I'm not sure about the exact meaning of "broken"; the basic function of unpoison_memory() seems to be OK if the race conditions are not considered.

--
Thanks,
- Ding Hui
