RE: [RFC] mm: Proactive compaction
From: Nitin Gupta
Date: Fri Nov 22 2019 - 17:31:29 EST
> -----Original Message-----
> From: owner-linux-mm@xxxxxxxxx <owner-linux-mm@xxxxxxxxx> On Behalf
> Of David Rientjes
> Sent: Monday, September 16, 2019 1:17 PM
> To: Nitin Gupta <nigupta@xxxxxxxxxx>
> Cc: akpm@xxxxxxxxxxxxxxxxxxxx; vbabka@xxxxxxx;
> mgorman@xxxxxxxxxxxxxxxxxxx; mhocko@xxxxxxxx;
> dan.j.williams@xxxxxxxxx; Yu Zhao <yuzhao@xxxxxxxxxx>; Matthew Wilcox
> <willy@xxxxxxxxxxxxx>; Qian Cai <cai@xxxxxx>; Andrey Ryabinin
> <aryabinin@xxxxxxxxxxxxx>; Roman Gushchin <guro@xxxxxx>; Greg Kroah-
> Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>; Kees Cook
> <keescook@xxxxxxxxxxxx>; Jann Horn <jannh@xxxxxxxxxx>; Johannes
> Weiner <hannes@xxxxxxxxxxx>; Arun KS <arunks@xxxxxxxxxxxxxx>; Janne
> Huttunen <janne.huttunen@xxxxxxxxx>; Konstantin Khlebnikov
> <khlebnikov@xxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; linux-
> mm@xxxxxxxxx
> Subject: Re: [RFC] mm: Proactive compaction
>
> On Fri, 16 Aug 2019, Nitin Gupta wrote:
>
> > For some applications we need to allocate almost all memory as
> > hugepages. However, on a running system, higher order allocations can
> > fail if the memory is fragmented. The Linux kernel currently does
> > on-demand compaction as we request more hugepages, but this style of
> > compaction incurs very high latency. Experiments with one-time full
> > memory compaction (followed by hugepage allocations) show that the
> > kernel is able to restore a highly fragmented memory state to a fairly
> > compacted memory state within <1 sec for a 32G system. Such data
> > suggests that a more proactive compaction can help us allocate a large
> > fraction of memory as hugepages while keeping allocation latencies low.
> >
> > For a more proactive compaction, the approach taken here is to define
> > per page-order external fragmentation thresholds and let kcompactd
> > threads act on these thresholds.
> >
> > The low and high thresholds are defined per page-order and exposed
> > through sysfs:
> >
> > /sys/kernel/mm/compaction/order-[1..MAX_ORDER]/extfrag_{low,high}
> >
> > The per-node kcompactd thread is woken up every few seconds to check
> > if any zone on its node has extfrag above the extfrag_high threshold
> > for any order, in which case the thread starts compaction in the
> > background till all zones are below the extfrag_low level for all
> > orders. By default both these thresholds are set to 100 for all orders,
> > which essentially disables kcompactd.
> >
> > To avoid wasting CPU cycles when compaction cannot help, such as when
> > memory is full, we check both extfrag > extfrag_high and
> > compaction_suitable(zone). This allows the kcompactd thread to stay
> > inactive even when the extfrag thresholds are exceeded.
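
The start/stop rule described above can be sketched in plain C. This is a
minimal userspace illustration, not code from the patch: the struct, the
function name should_compact(), the 0..100 extfrag scale, and the boolean
standing in for compaction_suitable(zone) are all assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-order thresholds, mirroring extfrag_{low,high}. */
struct extfrag_thresholds {
	unsigned int low;	/* stop once extfrag drops below this */
	unsigned int high;	/* start once extfrag rises above this */
};

/*
 * Decide whether background compaction should be active for one zone
 * and one order.
 *
 * extfrag:  current external fragmentation (assumed scale 0..100)
 * suitable: stand-in for the result of compaction_suitable(zone)
 * running:  whether compaction is already in progress
 */
static bool should_compact(struct extfrag_thresholds t, unsigned int extfrag,
			   bool suitable, bool running)
{
	assert(t.low <= t.high);

	if (!suitable)
		return false;		/* e.g. memory is full: stay idle */
	if (running)
		return extfrag >= t.low;	/* continue until below low */
	return extfrag > t.high;	/* start only above high */
}
```

With both thresholds at 100, extfrag can never exceed extfrag_high on this
scale, so the function never asks for compaction to start, which matches
the "disabled by default" behaviour described above.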
> >
> > This patch is largely based on ideas from Michal Hocko posted here:
> > https://lore.kernel.org/linux-mm/20161230131412.GI13301@xxxxxxxxxxxxxx/
> >
> > Testing done (on x86):
> > - Set /sys/kernel/mm/compaction/order-9/extfrag_{low,high} = {25, 30}
> >   respectively.
> > - Use a test program to fragment memory: the program allocates all
> >   memory and then, for each 2M-aligned section, frees 3/4 of base pages
> >   using munmap.
> > - kcompactd0 detects fragmentation for order-9 > extfrag_high and
> >   starts compaction till extfrag < extfrag_low for order-9.
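
A fragmenting test program along the lines described above might look like
the following. It is only a sketch, not the actual test program: the
function name fragment_memory(), the choice to take the number of 2M
sections as a parameter rather than consuming all memory, and the choice
to free the trailing 3/4 of each section are assumptions.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SECTION_SIZE (2UL << 20)	/* 2M section */

/*
 * Map `sections` 2M-aligned sections of anonymous memory, fault them
 * in, then unmap the trailing 3/4 of every section. The remaining
 * quarter stays mapped on purpose so the holes persist.
 * Returns the number of base pages freed, or 0 if mmap fails.
 */
static size_t fragment_memory(size_t sections)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t keep = SECTION_SIZE / 4;		/* bytes kept per section */
	size_t len = (sections + 1) * SECTION_SIZE; /* slack for alignment */
	size_t freed = 0;

	assert(page > 0 && keep % (size_t)page == 0);

	char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return 0;

	/* Round up to the next 2M boundary inside the mapping. */
	char *base = (char *)(((uintptr_t)raw + SECTION_SIZE - 1) &
			      ~(uintptr_t)(SECTION_SIZE - 1));

	/* Touch every page so the memory is actually allocated. */
	memset(base, 1, sections * SECTION_SIZE);

	for (size_t i = 0; i < sections; i++) {
		char *hole = base + i * SECTION_SIZE + keep;

		if (munmap(hole, SECTION_SIZE - keep) == 0)
			freed += (SECTION_SIZE - keep) / (size_t)page;
	}
	return freed;
}
```

A caller would pass a section count large enough to cover most of free
memory and then keep the process alive while watching kcompactd0.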
> >
> > The patch has plenty of rough edges, but I am posting it early to see
> > if I'm going in the right direction and to get some early feedback.
> >
>
> Is there an update to this proposal or non-RFC patch that has been posted
> for proactive compaction?
>

I recently posted a non-RFC patch for proactive compaction:
https://lkml.org/lkml/2019/11/15/1099

Please let me know if you try it out or if you have any feedback.

Thanks,
Nitin

> We've had good success with periodically compacting memory on a regular
> cadence on systems with hugepages enabled. The cadence itself is defined
> by the admin but it causes khugepaged[*] to periodically wake up and invoke
> compaction in an attempt to keep zones as defragmented as possible
> (perhaps more "proactive" than what is proposed here in an attempt to keep
> all memory as unfragmented as possible regardless of extfrag thresholds).
> It also avoids corner-cases where kcompactd could become more expensive
> than what is anticipated because it is unsuccessful at compacting memory yet
> the extfrag threshold is still exceeded.
>
> [*] We use khugepaged instead of kcompactd only because this is enabled
> only on systems where transparent hugepages are enabled. It would
> probably be better off in kcompactd, to avoid duplicating work between
> two kthreads if there is already a need for background compaction.
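
For comparison, a fixed-cadence compaction can be approximated from
userspace by writing to /proc/sys/vm/compact_memory, which asks the kernel
to compact all zones. This is a different mechanism from the in-kernel
khugepaged approach described above, shown only to illustrate the idea of
an admin-defined cadence. The function names, the interval and rounds
parameters, and passing the path as an argument are assumptions made for
this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write "1" to the given file. With path set to
 * /proc/sys/vm/compact_memory (root required) this requests a full
 * compaction. Returns 0 on success, -1 on failure.
 */
static int trigger_compaction(const char *path)
{
	assert(path != NULL);

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	int ok = fputs("1\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Trigger compaction `rounds` times, sleeping `interval_sec` seconds
 * after each attempt. Returns the number of successful triggers.
 */
static unsigned int compact_periodically(const char *path,
					 unsigned int interval_sec,
					 unsigned int rounds)
{
	unsigned int done = 0;

	for (unsigned int i = 0; i < rounds; i++) {
		if (trigger_compaction(path) == 0)
			done++;
		sleep(interval_sec);
	}
	return done;
}
```

Unlike a threshold-driven kcompactd, this runs regardless of the current
fragmentation level, so the interval alone bounds the cost.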