Re: [PATCH RFC 2/2] mm, hugetlb: do not rely on overcommit limit during migration

From: Michal Hocko
Date: Thu Nov 30 2017 - 02:57:48 EST


On Wed 29-11-17 11:52:53, Mike Kravetz wrote:
> On 11/29/2017 01:22 AM, Michal Hocko wrote:
> > What about this on top. I haven't tested this yet though.
>
> Yes, this would work.
>
> However, I think a simple modification to your previous free_huge_page
> changes would make this unnecessary. I was confused in your previous
> patch because you decremented the per-node surplus page count, but not
> the global count. I think it would have been correct (and made this
> patch unnecessary) if you decremented the global counter there as well.

We cannot really increment the global counter because the over number of
surplus pages during migration doesn't increase.

> Of course, this patch makes the surplus accounting more explicit.
>
> If we move forward with this patch, one issue below.
>
> > ---
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 1b6d7783c717..f5fcd4e355dc 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -119,6 +119,7 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
> > long freed);
> > bool isolate_huge_page(struct page *page, struct list_head *list);
> > void putback_active_hugepage(struct page *page);
> > +void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
> > void free_huge_page(struct page *page);
> > void hugetlb_fix_reserve_counts(struct inode *inode);
> > extern struct mutex *hugetlb_fault_mutex_table;
> > @@ -232,6 +233,7 @@ static inline bool isolate_huge_page(struct page *page, struct list_head *list)
> > return false;
> > }
> > #define putback_active_hugepage(p) do {} while (0)
> > +#define move_hugetlb_state(old, new, reason) do {} while (0)
> >
> > static inline unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> > unsigned long address, unsigned long end, pgprot_t newprot)
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 037bf0f89463..30601c1c62f3 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -34,6 +34,7 @@
> > #include <linux/hugetlb_cgroup.h>
> > #include <linux/node.h>
> > #include <linux/userfaultfd_k.h>
> > +#include <linux/page_owner.h>
> > #include "internal.h"
> >
> > int hugetlb_max_hstate __read_mostly;
> > @@ -4830,3 +4831,34 @@ void putback_active_hugepage(struct page *page)
> > spin_unlock(&hugetlb_lock);
> > put_page(page);
> > }
> > +
> > +void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason)
> > +{
> > + struct hstate *h = page_hstate(oldpage);
> > +
> > + hugetlb_cgroup_migrate(oldpage, newpage);
> > + set_page_owner_migrate_reason(newpage, reason);
> > +
> > + /*
> > + * transfer temporary state of the new huge page. This is
> > + * reverse to other transitions because the newpage is going to
> > + * be final while the old one will be freed so it takes over
> > + * the temporary status.
> > + *
> > + * Also note that we have to transfer the per-node surplus state
> > + * here as well otherwise the global surplus count will not match
> > + * the per-node's.
> > + */
> > + if (PageHugeTemporary(newpage)) {
> > + int old_nid = page_to_nid(oldpage);
> > + int new_nid = page_to_nid(newpage);
> > +
> > + SetPageHugeTemporary(oldpage);
> > + ClearPageHugeTemporary(newpage);
> > +
> > + if (h->surplus_huge_pages_node[old_nid]) {
> > + h->surplus_huge_pages_node[old_nid]--;
> > + h->surplus_huge_pages_node[new_nid]++;
> > + }
>
> You need to take hugetlb_lock before adjusting the surplus counts.

You are right. Actually moving the code to hugetlb.c was exactly because
I didn't want to take the lock outside of the hugetlb proper. I just
forgot to add it here. Thanks for spotting.
--
Michal Hocko
SUSE Labs