Re: [PATCH] mm: page_alloc: fix cma pageblock was stolen in rmqueue fallback

From: Johannes Weiner
Date: Mon Sep 11 2023 - 17:05:47 EST


On Mon, Sep 11, 2023 at 06:13:59PM +0200, Vlastimil Babka wrote:
> On 9/11/23 17:57, Johannes Weiner wrote:
> > On Tue, Sep 05, 2023 at 10:09:22AM +0100, Mel Gorman wrote:
> >> mm: page_alloc: Free pages to correct buddy list after PCP lock contention
> >>
> >> Commit 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock")
> >> returns pages to the buddy list on PCP lock contention. However, for
> >> migratetypes that are not MIGRATE_PCPTYPES, the migratetype may have
> >> been clobbered already for pages that are not being isolated. In
> >> practice, this means that CMA pages may be returned to the wrong
> >> buddy list. While this might be harmless in some cases as it is
> >> MIGRATE_MOVABLE, the pageblock could be reassigned in rmqueue_fallback
> >> and prevent a future CMA allocation. Look up the PCP migratetype
> >> again unconditionally if the PCP lock is contended.
> >>
> >> [lecopzer.chen@xxxxxxxxxxxx: CMA-specific fix]
> >> Fixes: 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock")
> >> Reported-by: Joe Liu <joe.liu@xxxxxxxxxxxx>
> >> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> >> ---
> >> mm/page_alloc.c | 8 +++++++-
> >> 1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 452459836b71..4053c377fee8 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -2428,7 +2428,13 @@ void free_unref_page(struct page *page, unsigned int order)
> >>  		free_unref_page_commit(zone, pcp, page, migratetype, order);
> >>  		pcp_spin_unlock(pcp);
> >>  	} else {
> >> -		free_one_page(zone, page, pfn, order, migratetype, FPI_NONE);
> >> +		/*
> >> +		 * The page migratetype may have been clobbered for types
> >> +		 * (type >= MIGRATE_PCPTYPES && !is_migrate_isolate) so
> >> +		 * must be rechecked.
> >> +		 */
> >> +		free_one_page(zone, page, pfn, order,
> >> +			      get_pcppage_migratetype(page), FPI_NONE);
> >>  	}
> >>  	pcp_trylock_finish(UP_flags);
> >>  }
> >>
> >
> > I had sent a (similar) fix for this here:
> >
> > https://lore.kernel.org/lkml/20230821183733.106619-4-hannes@xxxxxxxxxxx/
> >
> > The context wasn't CMA, but HIGHATOMIC pages going to the movable
> > freelist. But the class of bug is the same: the migratetype tweaking
> > really only applies to the pcplist, not the buddy slowpath; I added a
> > local pcpmigratetype to make it more clear, and hopefully prevent bugs
> > of this nature down the line.
>
> Seems to be the cleanest solution to me, indeed.
>
> > I'm just preparing v2 of the above series. Do you want me to break
> > this change out and send it separately?
>
> Works for me, if you combine it with the information about which commit
> it fixes, the CMA implications reported, and Cc stable.
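
To spell out the class of bug in isolation, here is a minimal userspace
sketch. It is a toy model, not the kernel code: every toy_* name and the
pcp_trylock_ok flag are made up for illustration, and the enum only
mirrors the ordering of the kernel's migratetypes. The buggy variant
rewrites the one migratetype variable for pcplist purposes and then
reuses it on the buddy fallback; the fixed variant keeps a separate
pcpmigratetype so the fallback still sees the real type.

```c
#include <stdbool.h>

/* Toy stand-ins; only types below TOY_PCPTYPES have their own pcplist. */
enum toy_migratetype {
	TOY_UNMOVABLE,
	TOY_MOVABLE,
	TOY_RECLAIMABLE,
	TOY_PCPTYPES,			/* number of types with a pcplist */
	TOY_HIGHATOMIC = TOY_PCPTYPES,
	TOY_CMA,
	TOY_ISOLATE,
};

/* Where the freed page ended up in this model. */
struct toy_free_result {
	bool went_to_pcp;		/* true: pcplist, false: buddy list */
	enum toy_migratetype list;	/* which list of that kind */
};

/* Buggy shape: one variable serves both the pcplist and the buddy path. */
static struct toy_free_result
toy_free_buggy(enum toy_migratetype page_mt, bool pcp_trylock_ok)
{
	enum toy_migratetype migratetype = page_mt;

	if (migratetype >= TOY_PCPTYPES) {
		/* Isolated pages bypass the pcplist entirely. */
		if (migratetype == TOY_ISOLATE)
			return (struct toy_free_result){ false, migratetype };
		/* Meant for pcplist selection only, but clobbers the type. */
		migratetype = TOY_MOVABLE;
	}

	if (pcp_trylock_ok)
		return (struct toy_free_result){ true, migratetype };

	/* Lock contended: buddy fallback now sees the clobbered type. */
	return (struct toy_free_result){ false, migratetype };
}

/* Fixed shape: the pcplist override lives in its own local variable. */
static struct toy_free_result
toy_free_fixed(enum toy_migratetype page_mt, bool pcp_trylock_ok)
{
	enum toy_migratetype migratetype = page_mt;
	enum toy_migratetype pcpmigratetype = page_mt;

	if (migratetype >= TOY_PCPTYPES) {
		if (migratetype == TOY_ISOLATE)
			return (struct toy_free_result){ false, migratetype };
		pcpmigratetype = TOY_MOVABLE;
	}

	if (pcp_trylock_ok)
		return (struct toy_free_result){ true, pcpmigratetype };

	/* Lock contended: buddy fallback keeps the page's real type. */
	return (struct toy_free_result){ false, migratetype };
}
```

In this model, a TOY_CMA page freed while the trylock fails lands on the
TOY_MOVABLE buddy list with toy_free_buggy() and on the TOY_CMA list
with toy_free_fixed(); the pcplist path behaves the same in both.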

How about this? Based on v6.6-rc1.

---