Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

From: Zi Yan
Date: Wed Sep 20 2023 - 13:23:32 EST


On 20 Sep 2023, at 12:04, Johannes Weiner wrote:

> On Wed, Sep 20, 2023 at 09:48:12AM -0400, Johannes Weiner wrote:
>> On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote:
>>> On 9/20/23 03:38, Zi Yan wrote:
>>>> On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
>>>>
>>>>> On 09/19/23 16:57, Zi Yan wrote:
>>>>>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
>>>>>>
>>>>>>> --- a/mm/page_alloc.c
>>>>>>> +++ b/mm/page_alloc.c
>>>>>>> @@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
>>>>>>> end = pageblock_end_pfn(pfn) - 1;
>>>>>>>
>>>>>>> /* Do not cross zone boundaries */
>>>>>>> +#if 0
>>>>>>> if (!zone_spans_pfn(zone, start))
>>>>>>> start = zone->zone_start_pfn;
>>>>>>> +#else
>>>>>>> + if (!zone_spans_pfn(zone, start))
>>>>>>> + start = pfn;
>>>>>>> +#endif
>>>>>>> if (!zone_spans_pfn(zone, end))
>>>>>>> return false;
>>>>>>> I can still trigger warnings.
>>>>>>
>>>>>> OK. One thing to note is that the page type in the warning changed from
>>>>>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
>>>>>>
>>>>>
>>>>> Just to be really clear,
>>>>> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
>>>>> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
>>>>> path WITHOUT your change.
>>>>>
>>>>> I am guessing the difference here has more to do with the allocation path?
>>>>>
>>>>> I went back and reran focusing on the specific migrate type.
>>>>> Without your patch, and coming from the alloc_contig_range call path,
>>>>> I got two warnings of 'page type is 0, passed migratetype is 1' as above.
>>>>> With your patch I got one 'page type is 0, passed migratetype is 1'
>>>>> warning and one 'page type is 1, passed migratetype is 0' warning.
>>>>>
>>>>> I could be wrong, but I do not think your patch changes things.
>>>>
>>>> Got it. Thanks for the clarification.
>>>>>
>>>>>>>
>>>>>>> One idea about recreating the issue is that it may have to do with size
>>>>>>> of my VM (16G) and the requested allocation sizes 4G. However, I tried
>>>>>>> to really stress the allocations by increasing the number of hugetlb
>>>>>>> pages requested and that did not help. I also noticed that I only seem
>>>>>>> to get two warnings and then they stop, even if I continue to run the
>>>>>>> script.
>>>>>>>
>>>>>>> Zi asked about my config, so it is attached.
>>>>>>
>>>>>> With your config, I still have no luck reproducing the issue. I will keep
>>>>>> trying. Thanks.
>>>>>>
>>>>>
>>>>> Perhaps try running both scripts in parallel?
>>>>
>>>> Yes. It seems to do the trick.
>>>>
>>>>> Adjust the number of hugetlb pages allocated to equal 25% of memory?
>>>>
>>>> I am able to reproduce it with the script below:
>>>>
>>>> while true; do
>>>> echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
>>>> echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
>>>> wait
>>>> echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>>> echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>>> done
>>>>
>>>> I will look into the issue.
>>
>> Nice!
>>
>> I managed to reproduce it ONCE, triggering it not even a second after
>> starting the script. But I can't seem to do it twice, even after
>> several reboots and letting it run for minutes.
>
> I managed to reproduce it reliably by cutting the nr_hugepages
> parameters respectively in half.
>
> The one that triggers for me is always MIGRATE_ISOLATE. With some
> printk-tracing, the scenario seems to be this:
>
> #0 #1
> start_isolate_page_range()
> isolate_single_pageblock()
> set_migratetype_isolate(tail)
> lock zone->lock
> move_freepages_block(tail) // nop
> set_pageblock_migratetype(tail)
> unlock zone->lock
> del_page_from_freelist(head)
> expand(head, head_mt)
> WARN(head_mt != tail_mt)
> start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES)
> for (pfn = start_pfn, pfn < end_pfn)
> if (PageBuddy())
> split_free_page(head)
>
> IOW, we update a pageblock that isn't MAX_ORDER aligned, then drop the
> lock. The move_freepages_block() does nothing because the PageBuddy()
> is set on the pageblock to the left. Once we drop the lock, the buddy
> gets allocated and the expand() puts things on the wrong list. The
> splitting code that handles MAX_ORDER blocks runs *after* the tail
> type is set and the lock has been dropped, so it's too late.

Yes, this is the issue I can confirm as well. But it is intentional to enable
allocating a contiguous range at pageblock granularity instead of MAX_ORDER
granularity. With your changes below, it no longer works, because if there
is an unmovable page in
[ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES), pageblock_start_pfn(start_pfn)),
the allocation fails but it would succeed in current implementation.

I think a proper fix would be to make move_freepages_block() split the
MAX_ORDER page and put the split pages in the right migratetype free lists.

I am working on that.

>
> I think this would work fine if we always set MIGRATE_ISOLATE in a
> linear fashion, with start and end aligned to MAX_ORDER. Then we also
> wouldn't have to split things.
>
> There are two reasons this doesn't happen today:
>
> 1. The isolation range is rounded to pageblocks, not MAX_ORDER. In
> this test case they always seem aligned, but it's not
> guaranteed. However,
>
> 2. start_isolate_page_range() explicitly breaks ordering by doing the
> last block in the range before the center. It's that last block
> that triggers the race with __rmqueue_smallest -> expand() for me.
>
> With the below patch I can no longer reproduce the issue:
>
> ---
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index b5c7a9d21257..b7c8730bf0e2 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -538,8 +538,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> unsigned long pfn;
> struct page *page;
> /* isolation is done at page block granularity */
> - unsigned long isolate_start = pageblock_start_pfn(start_pfn);
> - unsigned long isolate_end = pageblock_align(end_pfn);
> + unsigned long isolate_start = ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES);
> + unsigned long isolate_end = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);
> int ret;
> bool skip_isolation = false;
>
> @@ -549,17 +549,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> if (ret)
> return ret;
>
> - if (isolate_start == isolate_end - pageblock_nr_pages)
> - skip_isolation = true;
> -
> - /* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
> - ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
> - skip_isolation, migratetype);
> - if (ret) {
> - unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
> - return ret;
> - }
> -
> /* skip isolated pageblocks at the beginning and end */
> for (pfn = isolate_start + pageblock_nr_pages;
> pfn < isolate_end - pageblock_nr_pages;
> @@ -568,12 +557,21 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> if (page && set_migratetype_isolate(page, migratetype, flags,
> start_pfn, end_pfn)) {
> undo_isolate_page_range(isolate_start, pfn, migratetype);
> - unset_migratetype_isolate(
> - pfn_to_page(isolate_end - pageblock_nr_pages),
> - migratetype);
> return -EBUSY;
> }
> }
> +
> + if (isolate_start == isolate_end - pageblock_nr_pages)
> + skip_isolation = true;
> +
> + /* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
> + ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
> + skip_isolation, migratetype);
> + if (ret) {
> + undo_isolate_page_range(isolate_start, pfn, migratetype);
> + return ret;
> + }
> +
> return 0;
> }
>
> @@ -591,8 +589,8 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> {
> unsigned long pfn;
> struct page *page;
> - unsigned long isolate_start = pageblock_start_pfn(start_pfn);
> - unsigned long isolate_end = pageblock_align(end_pfn);
> + unsigned long isolate_start = ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES);
> + unsigned long isolate_end = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);
>
> for (pfn = isolate_start;
> pfn < isolate_end;


--
Best Regards,
Yan, Zi

Attachment: signature.asc
Description: OpenPGP digital signature