Re: [PATCH] mm/page_alloc: Occasionally relinquish zone lock in batch freeing

From: Andrew Morton
Date: Wed Aug 20 2025 - 01:42:15 EST


On Mon, 18 Aug 2025 11:58:03 -0700 Joshua Hahn <joshua.hahnjy@xxxxxxxxx> wrote:

> While testing workloads with high sustained memory pressure on large machines
> (1TB memory, 316 CPUs), we saw an unexpectedly high number of softlockups.
> Further investigation showed that the lock in free_pcppages_bulk was being held
> for a long time, even being held while 2k+ pages were being freed.

It would be interesting to see some of those softlockup traces.

We have this CONFIG_PCP_BATCH_SCALE_MAX which appears to exist to
address precisely this issue. But only about half of the
free_pcppages_bulk() callers actually honor it.

So perhaps the fix is to fix the callers which forgot to implement this?

- decay_pcp_high() tried to implement CONFIG_PCP_BATCH_SCALE_MAX, but
  that code hurts my brain.

- drain_pages_zone() implements it but, regrettably, doesn't use it
  to periodically release pcp->lock.  Room for improvement there.
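
To make "periodically release the lock" concrete, here is a minimal
userspace sketch of the pattern.  It is not kernel code: toy_zone,
toy_zone_init() and toy_free_bulk() are hypothetical names, a pthread
mutex stands in for the spinlock, and max_batch plays the role I'd
expect something like pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX to play
(that mapping is my assumption, not a description of any existing
caller).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone: a lock plus a free-page counter. */
struct toy_zone {
	pthread_mutex_t lock;
	size_t nr_free;            /* pages returned to the zone so far */
	size_t lock_acquisitions;  /* instrumentation: times the lock was taken */
	size_t max_hold;           /* instrumentation: most pages freed in one hold */
};

static void toy_zone_init(struct toy_zone *z)
{
	pthread_mutex_init(&z->lock, NULL);
	z->nr_free = 0;
	z->lock_acquisitions = 0;
	z->max_hold = 0;
}

/*
 * Free 'count' pages, but never more than 'max_batch' of them under a
 * single lock hold.  Dropping the lock between chunks gives waiters a
 * chance to get in, which bounds the worst-case hold time.
 */
static void toy_free_bulk(struct toy_zone *z, size_t count, size_t max_batch)
{
	assert(max_batch > 0);

	while (count > 0) {
		size_t todo = count < max_batch ? count : max_batch;

		pthread_mutex_lock(&z->lock);
		z->lock_acquisitions++;
		z->nr_free += todo;  /* stands in for the per-page freeing work */
		if (todo > z->max_hold)
			z->max_hold = todo;
		pthread_mutex_unlock(&z->lock);

		count -= todo;
	}
}
```

With count = 2048 and max_batch = 64 this takes the lock 32 times and
never frees more than 64 pages per hold.  The cost is the extra
lock/unlock traffic, so the cap is a latency versus throughput tradeoff.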