Re: [PATCH] mm/lruvec: preemptively free dead folios during lru_add drain

From: JP Kobryn (Meta)

Date: Thu Apr 23 2026 - 21:47:49 EST

On 4/23/26 4:53 PM, Barry Song wrote:

On Fri, Apr 24, 2026 at 7:46 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:

On Fri, Apr 24, 2026 at 07:22:30AM +0800, Barry Song wrote:

On Fri, Apr 24, 2026 at 12:43 AM JP Kobryn (Meta) <jp.kobryn@xxxxxxxxx> wrote:

Of all observable lruvec lock contention in our fleet, we find that ~24%
occurs when dead folios are present in lru_add batches at drain time. This
is wasteful in the sense that the folio is added to the LRU just to be
immediately removed via folios_put_refs(), incurring two unnecessary lock
acquisitions.

Eliminate this overhead by preemptively cleaning up dead folios before they
make it into the LRU. Use folio_ref_freeze() to filter folios whose only
remaining refcount is the batch ref. When dead folios are found, move them
off the add batch and onto a temporary batch to be freed.
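
A minimal userspace sketch of the filtering step described above, not the patch itself. It assumes folio_ref_freeze(folio, 1) behaves like an atomic compare-and-swap of the refcount from 1 to 0; struct model_folio, model_ref_freeze() and model_split_dead() are hypothetical names that model only the refcount:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio; only the refcount is modeled. */
struct model_folio {
    atomic_int refcount;
};

/*
 * Assumed semantics of folio_ref_freeze(): atomically swap the refcount
 * from `expected` to 0. Succeeds only when no other reference exists.
 */
static bool model_ref_freeze(struct model_folio *f, int expected)
{
    return atomic_compare_exchange_strong(&f->refcount, &expected, 0);
}

/*
 * Split `batch` in place. Folios whose only reference is the batch ref
 * (refcount == 1) are frozen and moved to `dead`; the rest are compacted
 * to the front of `batch`. Returns the number of live folios left.
 */
static size_t model_split_dead(struct model_folio **batch, size_t nr,
                               struct model_folio **dead, size_t *nr_dead)
{
    size_t live = 0;

    *nr_dead = 0;
    for (size_t i = 0; i < nr; i++) {
        if (model_ref_freeze(batch[i], 1))
            dead[(*nr_dead)++] = batch[i];  /* free without touching the LRU */
        else
            batch[live++] = batch[i];       /* still referenced: add to LRU */
    }
    return live;
}
```

In this model only the folios left in batch would need the lruvec lock; the dead ones skip both the add and the later removal.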

During A/B testing on one of our production Instagram workloads (high-frequency
short-lived requests), the patch intercepted almost all dead folios before
they entered the LRU. Data collected using the mm_lru_insertion tracepoint
shows the effectiveness of the patch:

Per-host LRU add averages at 95% CPU load
(60 hosts each side, 3 x 60s intervals)

             dead folios/min   total folios/min   dead %
unpatched:         1,297,785         19,341,986   6.7097%
patched:                  14         19,039,996   0.0001%

Within this workload, we save ~2.6M lock acquisitions per minute per host
as a result.

System-wide memory stats also improved on the patched side at 95% CPU load:
- direct reclaim scanning reduced 7%
- allocation stalls reduced 5.2%
- compaction stalls reduced 12.3%
- page frees reduced 4.9%

No regressions were observed in requests served per second or request tail
latency (p99). Both metrics showed directional improvement at higher CPU
utilization (comparing 85% to 95%).

Signed-off-by: JP Kobryn (Meta) <jp.kobryn@xxxxxxxxx>

Hi JP,
I’m seeing a large number of "BAD page" bugs.
Not sure if it’s related, but reverting this patch
seems to fix the issue.

It seems this was missed since classic LRU was used in testing.

[ 2869.365978] BUG: Bad page state in process uname pfn:3a5417
[ 2869.365981] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x724884c20 pfn:0x3a5417
[ 2869.365983] flags: 0x17ffffc0020908(uptodate|active|owner_2|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)

Hi Barry, are you using MGLRU? It seems like MGLRU sets the active flag in
folio_add_lru().

Yes. If you are referring to this set_active, I think it is
incorrect, so I have fixed it here and am waiting for review:

https://lore.kernel.org/linux-mm/20260418120233.7162-1-baohua@xxxxxxxxxx/

JP, we need to clear the active flag, but let's check what else can be set
before folio_add_lru().

Looks like only active is the problem. If we start manually clearing
flags it starts to feel messy. I get that some fix is needed though. I
don't see this patch in mm-new yet so maybe we can hold off on merging
there to avoid the MGLRU case. But if Barry's patch is accepted, could
we re-apply?

Let me know if you're thinking there are any implications beyond the
active flag.