Re: [PATCH 1/4] mm/hugetlb: Enable PUD level huge page migration
From: Michal Hocko
Date: Wed Oct 03 2018 - 06:59:34 EST
On Wed 03-10-18 15:28:23, Anshuman Khandual wrote:
>
>
> On 10/03/2018 12:28 PM, Michal Hocko wrote:
> > On Wed 03-10-18 07:46:27, Anshuman Khandual wrote:
> >>
> >>
> >> On 10/02/2018 06:09 PM, Michal Hocko wrote:
> >>> On Tue 02-10-18 17:45:28, Anshuman Khandual wrote:
> > >>>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
> > >>>> (a 1GB huge page is PUD based with the ARM64_4K_PAGES base page size) that
> > >>>> can be enabled for migration. This can be achieved by also checking for
> > >>>> PUD_SHIFT order HugeTLB pages during migration.
> >>>
> > >>> Well a long-term problem with hugepage_migration_supported is that it is
> > >>> used in two different contexts: 1) to bail out from the migration early
> > >>> because the arch doesn't support migration at all and 2) to use the movable
> > >>> zone for hugetlb page allocation. I am especially concerned about the
> > >>> latter because the mere support for migration is not really good enough.
> > >>> Are you really able to find a different giga page at runtime to
> > >>> move an existing giga page out of the movable zone?
> >>
> > >> I pre-allocate them before trying to initiate the migration (soft offline
> > >> in my experiments). Hence it should come from the pre-allocated HugeTLB
> > >> pool instead of from the buddy. I might be missing something here, but do we
> > >> ever allocate HugeTLB pages on the go when trying to migrate? IIUC they always
> > >> come from the pool (unless it's something related to overcommit/surplus).
> > >> Could you please explain how migration target HugeTLB pages are
> > >> allocated on the fly from the movable zone?
> >
> > Hotplug comes to mind. You usually do not pre-allocate to cover a full
> > node going offline. And people would like to do that. Another example is
> > CMA. You would really like to move pages out of the way.
>
> You are right.
>
> Hotplug migration:
>
> __offline_pages
>   do_migrate_range
>     migrate_pages(...new_node_page...)
>
> new_node_page
>   new_page_nodemask
>     alloc_huge_page_nodemask
>       dequeue_huge_page_nodemask (Getting from pool)
>       or
>       alloc_migrate_huge_page (Getting from buddy - non-gigantic)
>         alloc_fresh_huge_page
>           alloc_buddy_huge_page
>             __alloc_pages_nodemask ----> goes into buddy
>
> CMA allocation:
>
> cma_alloc
>   alloc_contig_range
>     __alloc_contig_migrate_range
>       migrate_pages(...alloc_migrate_target...)
>
> alloc_migrate_target
>   new_page_nodemask -> __alloc_pages_nodemask ---> goes into buddy
>
> But this does not apply to gigantic pages, for which it backs off well
> before reaching the buddy allocator.

This is an implementation detail, mostly a missing or incomplete
hugetlb overcommit implementation IIRC. The primary point remains the
same: being able to migrate in principle and being feasible enough to
migrate to be placed in zone movable are two distinct things.
[...]
> > >> But even if there are some chances of runtime allocation failure from
> > >> the movable zone (as in point 2), that should not block the very initiation
> > >> of migration itself. IIUC that's not the semantics for either THP or
> > >> normal pages. Why should it be different here? If the allocation fails
> > >> we should report and abort as always. It's the caller of migration taking
> > >> the chance, so why should we prevent that?
> >
> > Yes I agree, hence the distinction between the arch support for
> > migratability and the criterion on the movable zone placement.
> Movable zone placement sounds very tricky here. How can the platform
> (through the huge_movable hook) say beforehand whether the destination
> page could be allocated from ZONE_MOVABLE without looking into the
> state of the buddy allocator at migration time (any sort of attempt to do
> this is going to be expensive)? Or does it merely indicate the desire to
> live with the possible consequences (being unable to hot unplug/CMA going
> forward) of a migration which might end up in an unmovable area?
I do not follow. The whole point of zone_movable is to provide a
physical memory range which is more or less movable. That means that
pages allocated from this zone can be migrated away should there be a
reason for that.
> > >>> So I guess we want to split this into two functions
> > >>> arch_hugepage_migration_supported and hugepage_movable. The latter would
> >>
> > >> So the set difference between arch_hugepage_migration_supported and
> > >> hugepage_movable still remains unmigratable? Then what is the purpose
> > >> of the arch_hugepage_migration_supported page size set in the first place?
> > >> Does it mean we allow the migration at the beginning and then abort later
> > >> when the page size does not fall within the subset for hugepage_movable?
> > >> Could you please explain this in more detail?
> >
> > The purpose of arch_hugepage_migration_supported is to tell whether it
> > makes any sense to even try to migrate. The allocation placement is
>
> Which kind of matches what we have right now and what this series
> continues with.
Except you only go halfway there, because you still consider "able to
migrate" and "feasible to migrate" to be the same thing.
>
> > completely independent of this choice. The latter just says whether it is
> > feasible to place a hugepage in the zone movable. Sure regular 2MB pages
>
> What exactly do you mean by feasible? Won't it depend on the state of the
> buddy allocator (ZONE_MOVABLE in particular) and its ability to accommodate
> a given huge page? How can the platform decide on it?
It is not the platform that decides. That is the whole point of the
distinction. It is up to us to say what is feasible and what we want to
support. Do we want to support giga pages in zone_movable? Under which
conditions? See my point?
> Or, as I mentioned before, is it the platform's willingness to live with
> unmovable huge pages (of certain sizes) as a consequence of migration?
No, the platform has no say in that. The platform only says that it
supports migrating those pages in principle.
--
Michal Hocko
SUSE Labs