Re: [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page to workaround the nasty free_huge_page
From: Wei Yang
Date: Tue Aug 11 2020 - 19:19:39 EST
On Tue, Aug 11, 2020 at 02:43:28PM -0700, Mike Kravetz wrote:
>On 8/10/20 11:54 PM, Michal Hocko wrote:
>>
>> I have managed to forgot all the juicy details since I have made that
>> change. All that remains is that the surplus pages accounting was quite
>> tricky and back then I didn't figure out a simpler method that would
>> achieve the consistent look at those counters. As mentioned above I
>> suspect this could lead to pre-mature allocation failures while the
>> migration is ongoing.
>
>It is likely lost in the e-mail thread, but the suggested change was to
>alloc_surplus_huge_page(). The code which allocates the migration target
>(alloc_migrate_huge_page) will not be changed. So, this should not be
>an issue.
>
>> Sure quite unlikely to happen and the race window
>> is likely very small. Maybe this is even acceptable but I would strongly
>> recommend to have all this thinking documented in the changelog.
>
>I wrote down a description of what happens in the two different approaches
>"temporary page" vs "surplus page". It is at the very end of this e-mail.
>When looking at the details, I came up with what may be an even better
>approach. Why not just call the low level routine to free the page instead
>of going through put_page/free_huge_page? At the very least, it saves a
>lock roundtrip and there is no need to worry about the counters/accounting.
>
>Here is a patch to do that. However, we are optimizing a return path in
>a race condition that we are unlikely to ever hit. I 'tested' it by allocating
>an 'extra' page and freeing it via this method in alloc_surplus_huge_page.
>
>>From 864c5f8ef4900c95ca3f6f2363a85f3cb25e793e Mon Sep 17 00:00:00 2001
>From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>Date: Tue, 11 Aug 2020 12:45:41 -0700
>Subject: [PATCH] hugetlb: optimize race error return in
> alloc_surplus_huge_page
>
>The routine alloc_surplus_huge_page() could race with with a pool
>size change. If this happens, the allocated page may not be needed.
>To free the page, the current code will 'Abuse temporary page to
>workaround the nasty free_huge_page codeflow'. Instead, directly
>call the low level routine that free_huge_page uses. This works
>out well because the page is new, we hold the only reference and
>already hold the hugetlb_lock.
>
>Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>---
> mm/hugetlb.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
>diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>index 590111ea6975..ac89b91fba86 100644
>--- a/mm/hugetlb.c
>+++ b/mm/hugetlb.c
>@@ -1923,14 +1923,17 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
> /*
> * We could have raced with the pool size change.
> * Double check that and simply deallocate the new page
>- * if we would end up overcommiting the surpluses. Abuse
>- * temporary page to workaround the nasty free_huge_page
>- * codeflow
>+ * if we would end up overcommiting the surpluses.
> */
> if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
>- SetPageHugeTemporary(page);
>+ /*
>+ * Since this page is new, we hold the only reference, and
>+ * we already hold the hugetlb_lock call the low level free
>+ * page routine. This saves at least a lock roundtrip.
The change looks good to me, while I may not understand the "lock roundtrip".
You mean we don't need to release the hugetlb_lock?
>+ */
>+ (void)put_page_testzero(page); /* don't call destructor */
>+ update_and_free_page(h, page);
> spin_unlock(&hugetlb_lock);
>- put_page(page);
> return NULL;
> } else {
> h->surplus_huge_pages++;
>--
>2.25.4
>
>
>Here is a description of the difference in "Temporary Page" vs "Surplus
>Page" approach.
>
>Both only allocate a fresh huge page if surplus_huge_pages is less than
>nr_overcommit_huge_pages. Of course, the lock protecting those counts
>must be dropped to perform the allocation. After reacquiring the lock
>is where we have the proposed difference in behavior.
>
>temporary page behavior
>-----------------------
>if surplus_huge_pages >= h->nr_overcommit_huge_pages
> SetPageHugeTemporary(page)
> spin_unlock(&hugetlb_lock);
> put_page(page);
>
>At this time we know surplus_huge_pages is 'at least' nr_overcommit_huge_pages.
>As a result, any new allocation will fail.
>Only user visible result is that number of huge pages will be one greater than
>that specified by user and overcommit values. This is only visible for the
>short time until the page is actully freed as a result of put_page().
>
>free_huge_page()
> number of huge pages will be decremented
>
>suprlus page behavior
>---------------------
>surplus_huge_pages++
>surplus_huge_pages_node[page_to_nid(page)]++
>if surplus_huge_pages > nr_overcommit_huge_pages
> spin_unlock(&hugetlb_lock);
> put_page(page);
>
>At this time we know surplus_huge_pages is greater than
>nr_overcommit_huge_pages. As a result, any new allocation will fail.
>User visible result is an increase in surplus pages as well as number of
>huge pages. In addition, surplus pages will exceed overcommit. This is
>only visible for the short time until the page is actully freed as a
>result of put_page().
>
>free_huge_page()
> number of huge pages will be decremented
> h->surplus_huge_pages--;
> h->surplus_huge_pages_node[nid]--;
--
Wei Yang
Help you, Help me