Re: [PATCH] mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations

From: Roman Gushchin
Date: Thu Apr 02 2020 - 15:43:04 EST


On Thu, Apr 02, 2020 at 02:43:49PM +0900, Joonsoo Kim wrote:
> On Thu, Apr 2, 2020, Roman Gushchin <guro@xxxxxx> wrote:
> >
> > On Wed, Apr 01, 2020 at 07:13:22PM -0700, Andrew Morton wrote:
> > > On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@xxxxxxxxx> wrote:
> > >
> > > > Hello, Roman.
> > > >
> > > > On Thu, Mar 12, 2020, Roman Gushchin <guro@xxxxxx> wrote:
> > > > >
> > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > > > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > > > > create a v2 if needed, while scratching my head over the next piece of
> > > > > > > this puzzle :)
> > > > > > >
> > > > > > > ---8<---
> > > > > > >
> > > > > > > From: Roman Gushchin <guro@xxxxxx>
> > > > > > >
> > > > > > > Currently a cma area is barely used by the page allocator because
> > > > > > > it's used only as a fallback from movable; moreover, kswapd tries
> > > > > > > hard to make sure that the fallback path isn't used.
> > > > > >
> > > > > > A few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > > > > unresolved bugs. Perhaps the idea could be resurrected now?
> > > > >
> > > > > Hi Vlastimil!
> > > > >
> > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > > > > about the state of this patch(set). As I understand it, Joonsoo plans to resubmit
> > > > > it later this year.
> > > > >
> > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > > > > possible that Joonsoo's solution is preferable long-term.
> > > > >
> > > > > So if the proposed patch looks OK for now, I'd suggest going with it and returning
> > > > > to this question once we have a new version of the ZONE_MOVABLE solution.
> > > >
> > > > Hmm... utilization is not the only thing that matters to CMA users. More
> > > > important is the success guarantee of cma_alloc(), and this patch would
> > > > have a bad impact on it.
> > > >
> > > > A few years ago, I tested this kind of approach and found that increasing
> > > > utilization increases cma_alloc() failures. The reason is that pages
> > > > allocated with __GFP_MOVABLE, especially by sb_bread(), are sometimes
> > > > pinned by someone.
> > > >
> > > > Until now, CMA memory hasn't been used much, so this problem doesn't
> > > > occur easily. With this patch, however, it would.
> > >
> > > So I guess we keep Roman's patch on hold pending clarification of this
> > > risk?
> >
> > The problem here is that we can't really find problems if we don't use CMA as
> > intended and just leave it free. Rik and I are actively looking for page
> > migration problems in our production environment, and we've found and fixed
> > some of them. Our setup is likely different from that of the embedded folks,
> > who as I understand it are the most active CMA users, so even if we don't
> > see any issues I can't guarantee it for everybody.
> >
> > So given Joonsoo's ack down in the thread (btw, I'm sorry I missed a good
> > optimization he suggested; I'll send a patch for that), I'd go with this patch at least until
>
> Looks like you mean Minchan's ack. Anyway, I don't object to this one.

Right, I'm really sorry.

>
> In fact, I've tested this patch together with your fixes for the migration
> problems, and found that there is still a migration problem and that the
> failure rate is increased by this patch.

Do you mind sharing any details? What kind of pages are those?

I'm using the following patch to dump failed pages:

@@ -1455,6 +1455,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 					private, page, pass > 2, mode,
 					reason);
 
+		if (rc && reason == MR_CONTIG_RANGE)
+			dump_page(page, "unmap_and_move");
+
 		switch(rc) {
 		case -ENOMEM:
 			/*


> However, given that there has been no progress in this area for a long time,
> I think that applying the change aggressively is required to break the
> current situation.

I totally agree!

Btw, I've found that cma_release() grabs the cma->lock mutex,
so it can't be called from atomic context (I got a lockdep warning).

Of course, I could change the calling side, but I think it's better to change
the cma code so that cma_release() can be called from atomic context.
What do you think about the following patch?

Thank you!

--