Re: [PATCH RFC] mm: compaction: avoid migrating non-cma pages to a cma area

From: Roman Gushchin
Date: Fri Apr 17 2020 - 15:21:44 EST


On Fri, Apr 17, 2020 at 10:37:14AM +0200, Vlastimil Babka wrote:
> On 4/14/20 5:42 PM, Roman Gushchin wrote:
> > On Tue, Apr 14, 2020 at 01:49:45PM +0200, Vlastimil Babka wrote:
> >
> > Hello, Vlastimil!
> >
> > Thank you for looking into it.
> >
> >> Hm I think I'd rather make such pages really unmovable (by a pin?) than deny the
> >> whole CMA area to compaction. Would it be feasible?
> >
> > Well, it's an option too, however I'm not sure it's the best one.
> > First, because these pages can be moved quite often, making
> > them completely unmovable will make the compaction less efficient.
> > Second, because it's not only about these pages, but in general
> > about migrating pages into a cma area without a clear need.
> >
> > As I wrote in the commit log, if a cma area is located close to the end
> > of a node (which seems to be the default at least on x86 without a movable
> > zone), compaction can fill it quite aggressively. And it might bring
> > some hot pages (e.g. executable pagecache pages), which will cause
> > cma allocation failures. I've seen something like this in our prod.
>
> Hmm, I see.
>
> >>
> >> > Compaction moves them to the hugetlb_cma area, and then sometimes
> >> > the cma allocator fails to move them back from the cma area. It
> >> > results in failures of gigantic hugepages allocations.
> >> >
> >> > Also in general cma areas are reserved close to the end of a zone,
> >> > and it's where compaction tries to migrate pages. It means
> >> > compaction will aggressively fill cma areas, which doesn't make much
> >> > sense.
> >>
> >> So now the free page scanner will have to skip those areas, which is not very
> >> effective. But I suspect a worse problem in __compaction_suitable() which will
> >> now falsely report that there are enough free pages, so compaction will start
> >> but fail to do anything. Minimally the __zone_watermark_ok() check there would
> >> have to lose ALLOC_CMA, but there might be other similar checks that would need
> >> adjusting.
> >
> > A really good point! I've looked around for any other checks, but haven't found
> > anything. Please, find an updated version of the patch below.
>
> Technically there's also __isolate_free_page() using ALLOC_CMA for watermark
> check, but it's shared by compaction and alloc_contig_range(), so we can't just
> remove ALLOC_CMA from there. It's probably not worth complicating it though. If
> we pass compaction_suitable() without ALLOC_CMA and then reach
> __isolate_free_page() and meanwhile watermarks changed so we wouldn't pass
> without ALLOC_CMA anymore, it should be rare hopefully and not cause us to deplete
> non-CMA free pages too badly.
>
> But I've only now also realized how dynamic setting cc->cma is. So in case a
> zone consists mostly of CMA blocks, removing ALLOC_CMA in
> __compaction_suitable() would be actually wrong and prevent compaction from
> doing any work? Sigh. Any idea about that?

Hm, I don't know, is that a realistic setup? It looks like it depends
significantly on the exact use case.
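
To make the concern concrete, here is a toy userspace model of a watermark
check with and without the CMA free pages counted. It is only a sketch, not
the kernel code: struct zone_model and model_watermark_ok() are made-up names,
and the real __zone_watermark_ok() also handles allocation order, reserves and
more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-zone counters. */
struct zone_model {
	unsigned long free_pages;     /* all free pages, CMA included */
	unsigned long free_cma_pages; /* free pages inside CMA pageblocks */
	unsigned long watermark;      /* pages that must stay free */
};

/*
 * Toy watermark check: when use_cma is false (modelling a check done
 * without ALLOC_CMA), free CMA pages are not counted as usable.
 */
static bool model_watermark_ok(const struct zone_model *z, bool use_cma)
{
	unsigned long usable;

	assert(z->free_cma_pages <= z->free_pages);

	usable = z->free_pages;
	if (!use_cma)
		usable -= z->free_cma_pages;

	return usable > z->watermark;
}
```

With free_pages = 1000, free_cma_pages = 900 and watermark = 200, the model
passes when CMA pages are counted and fails when they are not, which is the
"zone consists mostly of CMA blocks" case above. With free_cma_pages = 100 it
passes either way.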

Another option is to move the cma area closer to the beginning of a zone.

>
> >>
> >> And long-term what happens if the "CMA using ZONE_MOVABLE" approach is merged
> >> and there are no more CMA migratetypes to test? Might this change actually also
> >> avoid your issue, as said pages without __GFP_MOVABLE won't end up in a
> >> ZONE_MOVABLE?
> >
> > Yeah, this is what I was thinking about. Basically I want to mimic this behavior
> > right now. Once this approach is implemented and merged, it will happen
> > automatically: obviously, compaction won't move pages between different zones.

On second thought, it's not so obvious: CMA would need to migrate pages
(data) between zones, right? It might bring some other complications.

>
> That will be much better. Can't wait, then :)

Yeah, if it happens soon-ish, we can just wait. I just don't know
how hard it is and how many edge cases there are to figure out first.

Do you think it's better to wait and not merge this patch upstream?

Thanks!