Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages

From: David Hildenbrand
Date: Tue Nov 20 2018 - 09:18:47 EST


On 20.11.18 14:43, Michal Hocko wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
>
> do_migrate_range has been limiting the number of pages to migrate to 256
> for some reason which is not documented. Even if the limit made some
> sense back then when it was introduced it doesn't really serve a good
> purpose these days. If the range contains huge pages then
> we break out of the loop too early and go through LRU and pcp
> caches draining and scan_movable_pages is quite suboptimal.
>
> The only reason to limit the number of pages I can think of is to reduce
> the potential time to react on the fatal signal. But even then the
> number of pages is a questionable metric because even a single page
> might migration block in a non-killable state (e.g. __unmap_and_move).
>
> Remove the limit and offline the full requested range (this is one
> membblock worth of pages with the current code). Should we ever get a
> report that offlining takes too long to react on fatal signal then we
> should rather fix the core migration to use killable waits and bailout
> on a signal.
>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> ---
> mm/memory_hotplug.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c82193db4be6..6263c8cd4491 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
> return new_page_nodemask(page, nid, &nmask);
> }
>
> -#define NR_OFFLINE_AT_ONCE_PAGES (256)
> static int
> do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> {
> unsigned long pfn;
> struct page *page;
> - int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
> int not_managed = 0;
> int ret = 0;
> LIST_HEAD(source);
>
> - for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
> + for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> if (!pfn_valid(pfn))
> continue;
> page = pfn_to_page(pfn);
> @@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> ret = -EBUSY;
> break;
> }
> - if (isolate_huge_page(page, &source))
> - move_pages -= 1 << compound_order(head);
> + isolate_huge_page(page, &source);
> continue;
> } else if (PageTransHuge(page))
> pfn = page_to_pfn(compound_head(page))
> @@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> if (!ret) { /* Success */
> put_page(page);
> list_add_tail(&page->lru, &source);
> - move_pages--;
> if (!__PageMovable(page))
> inc_node_page_state(page, NR_ISOLATED_ANON +
> page_is_file_cache(page));
>

Yes, there is basically no statement why it was done that way. If it is
important, there should be one.

(we could also check for pending signals inside that function if really
required)

Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>

--

Thanks,

David / dhildenb