On 20 Sep 2023, at 13:23, Zi Yan wrote:
On 20 Sep 2023, at 12:04, Johannes Weiner wrote:
On Wed, Sep 20, 2023 at 09:48:12AM -0400, Johannes Weiner wrote:
On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote:
On 9/20/23 03:38, Zi Yan wrote:
On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
On 09/19/23 16:57, Zi Yan wrote:
On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
end = pageblock_end_pfn(pfn) - 1;
/* Do not cross zone boundaries */
+#if 0
if (!zone_spans_pfn(zone, start))
start = zone->zone_start_pfn;
+#else
+ if (!zone_spans_pfn(zone, start))
+ start = pfn;
+#endif
if (!zone_spans_pfn(zone, end))
return false;
I can still trigger warnings.
OK. One thing to note is that the page type in the warning changed from
5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
Just to be really clear,
- the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
- the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
path WITHOUT your change.
I am guessing the difference here has more to do with the allocation path?
I went back and reran focusing on the specific migrate type.
Without your patch, and coming from the alloc_contig_range call path,
I got two warnings of 'page type is 0, passed migratetype is 1' as above.
With your patch I got one 'page type is 0, passed migratetype is 1'
warning and one 'page type is 1, passed migratetype is 0' warning.
I could be wrong, but I do not think your patch changes things.
Got it. Thanks for the clarification.
One idea about recreating the issue is that it may have to do with size
of my VM (16G) and the requested allocation sizes 4G. However, I tried
to really stress the allocations by increasing the number of hugetlb
pages requested and that did not help. I also noticed that I only seem
to get two warnings and then they stop, even if I continue to run the
script.
Zi asked about my config, so it is attached.
With your config, I still have no luck reproducing the issue. I will keep
trying. Thanks.
Perhaps try running both scripts in parallel?
Yes. It seems to do the trick.
Adjust the number of hugetlb pages allocated to equal 25% of memory?
I am able to reproduce it with the script below:
while true; do
echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
wait
echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
done
I will look into the issue.
Nice!
I managed to reproduce it ONCE, triggering it not even a second after
starting the script. But I can't seem to do it twice, even after
several reboots and letting it run for minutes.
I managed to reproduce it reliably by cutting the nr_hugepages
parameters respectively in half.
The one that triggers for me is always MIGRATE_ISOLATE. With some
printk-tracing, the scenario seems to be this:
#0 #1
start_isolate_page_range()
isolate_single_pageblock()
set_migratetype_isolate(tail)
lock zone->lock
move_freepages_block(tail) // nop
set_pageblock_migratetype(tail)
unlock zone->lock
del_page_from_freelist(head)
expand(head, head_mt)
WARN(head_mt != tail_mt)
start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES)
for (pfn = start_pfn, pfn < end_pfn)
if (PageBuddy())
split_free_page(head)
IOW, we update a pageblock that isn't MAX_ORDER aligned, then drop the
lock. The move_freepages_block() does nothing because the PageBuddy()
is set on the pageblock to the left. Once we drop the lock, the buddy
gets allocated and the expand() puts things on the wrong list. The
splitting code that handles MAX_ORDER blocks runs *after* the tail
type is set and the lock has been dropped, so it's too late.
Yes, this is the issue I can confirm as well. But it is intentional to enable
allocating a contiguous range at pageblock granularity instead of MAX_ORDER
granularity. With your changes below, it no longer works, because if there
is an unmovable page in
[ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES), pageblock_start_pfn(start_pfn)),
the allocation fails but it would succeed in current implementation.
I think a proper fix would be to make move_freepages_block() split the
MAX_ORDER page and put the split pages in the right migratetype free lists.
I am working on that.
After spending half a day on this, I think it is much harder than I thought
to get alloc_contig_range() working with the freelist migratetype hygiene
patchset. Because alloc_contig_range() relies on racy migratetype changes:
1. pageblocks in the range are first marked as MIGRATE_ISOLATE to prevent
another parallel isolation, but they are not moved to the MIGRATE_ISOLATE
free list yet.
2. later in the process, isolate_freepages_range() is used to actually grab
the free pages.
3. there was no problem when alloc_contig_range() works on MAX_ORDER aligned
ranges, since MIGRATE_ISOLATE cannot be set in the middle of free pages or
in-use pages. But it is not the case when alloc_contig_range() work on
pageblock aligned ranges. Now during isolation phase, free or in-use pages
will need to be split to get their subpages into the right free lists.
4. the hardest case is when a in-use page sits across two pageblocks, currently,
the code just isolate one pageblock, migrate the page, and let split_free_page()
to correct the free list later. But to strictly enforce freelist migratetype
hygiene, extra work is needed at free page path to split the free page into
the right freelists.
I need more time to think about how to get alloc_contig_range() properly.
Help is needed for the bullet point 4.