Re: [PATCH 3/5] mm: page_alloc: defrag_mode

From: Johannes Weiner
Date: Sat Mar 22 2025 - 23:47:15 EST


On Sat, Mar 22, 2025 at 09:34:09PM -0400, Johannes Weiner wrote:
> On Sat, Mar 22, 2025 at 08:58:27PM -0400, Johannes Weiner wrote:
> > On Sat, Mar 22, 2025 at 04:05:52PM +0100, Brendan Jackman wrote:
> > > On Thu Mar 13, 2025 at 10:05 PM CET, Johannes Weiner wrote:
> > > > + /* Reclaim/compaction failed to prevent the fallback */
> > > > + if (defrag_mode) {
> > > > + alloc_flags &= ALLOC_NOFRAGMENT;
> > > > + goto retry;
> > > > + }
> > >
> > > I can't see where ALLOC_NOFRAGMENT gets cleared, is it supposed to be
> > > here (i.e. should this be ~ALLOC_NOFRAGMENT)?
>
> Please ignore my previous email, this is actually a much more severe
> issue than I thought at first. The screwed up clearing is bad, but
> this will also not check the flag before retrying, which means the
> thread will retry reclaim/compaction and never reach OOM.
>
> This code has weeks of load testing, with workloads fine-tuned to
> *avoid* OOM. A blatant OOM test shows this problem immediately.
>
> A simple fix, but I'll put it through the wringer before sending it.

Ok, here is the patch. I verified this with intentional OOMing 100
times in a loop; this would previously lock up on first try in
defrag_mode, but kills and recovers reliably with this applied.

I also re-ran the full THP benchmarks, to verify that erroneous
looping here did not accidentally contribute to fragmentation
avoidance and thus THP success & latency rates. They were in fact not;
the improvements claimed for defrag_mode are unchanged with this fix:

VANILLA defrag_mode=1-OOMFIX
Hugealloc Time mean 52739.45 ( +0.00%) 27342.44 ( -48.15%)
Hugealloc Time stddev 56541.26 ( +0.00%) 33227.16 ( -41.23%)
Kbuild Real time 197.47 ( +0.00%) 196.32 ( -0.58%)
Kbuild User time 1240.49 ( +0.00%) 1231.89 ( -0.69%)
Kbuild System time 70.08 ( +0.00%) 58.75 ( -15.95%)
THP fault alloc 46727.07 ( +0.00%) 62669.93 ( +34.12%)
THP fault fallback 21910.60 ( +0.00%) 5966.40 ( -72.77%)
Direct compact fail 195.80 ( +0.00%) 50.53 ( -73.81%)
Direct compact success 7.93 ( +0.00%) 4.07 ( -43.28%)
Compact daemon scanned migrate 3369601.27 ( +0.00%) 1588238.93 ( -52.87%)
Compact daemon scanned free 5075474.47 ( +0.00%) 1441944.27 ( -71.59%)
Compact direct scanned migrate 161787.27 ( +0.00%) 64838.53 ( -59.92%)
Compact direct scanned free 163467.53 ( +0.00%) 37243.00 ( -77.22%)
Compact total migrate scanned 3531388.53 ( +0.00%) 1653077.47 ( -53.19%)
Compact total free scanned 5238942.00 ( +0.00%) 1479187.27 ( -71.77%)
Alloc stall 2371.07 ( +0.00%) 553.00 ( -76.64%)
Pages kswapd scanned 2160926.73 ( +0.00%) 4052539.93 ( +87.54%)
Pages kswapd reclaimed 533191.07 ( +0.00%) 765447.47 ( +43.56%)
Pages direct scanned 400450.33 ( +0.00%) 358933.93 ( -10.37%)
Pages direct reclaimed 94441.73 ( +0.00%) 26991.60 ( -71.42%)
Pages total scanned 2561377.07 ( +0.00%) 4411473.87 ( +72.23%)
Pages total reclaimed 627632.80 ( +0.00%) 792439.07 ( +26.26%)
Swap out 47959.53 ( +0.00%) 128511.80 ( +167.96%)
Swap in 7276.00 ( +0.00%) 27736.20 ( +281.16%)
File refaults 138043.00 ( +0.00%) 206198.40 ( +49.37%)

Many thanks for your careful review, Brendan.

---