Re: [PATCH] mm/page_alloc: Occasionally relinquish zone lock in batch freeing

From: Joshua Hahn
Date: Wed Aug 20 2025 - 11:28:15 EST


On Wed, 20 Aug 2025 09:29:00 +0800 Hillf Danton <hdanton@xxxxxxxx> wrote:

Hello Hillf, thank you for your review!

> On Mon, 18 Aug 2025 11:58:03 -0700 Joshua Hahn wrote:
> >
> > While testing workloads with high sustained memory pressure on large machines
> > (1TB memory, 316 CPUs), we saw an unexpectedly high number of softlockups.
> > Further investigation showed that the lock in free_pcppages_bulk was being held
> > for a long time, even being held while 2k+ pages were being freed.
> >
> > Instead of holding the lock for the entirety of the freeing, check to see if
> > the zone lock is contended every pcp->batch pages. If there is contention,
> > relinquish the lock so that other processors have a chance to grab the lock
> > and perform critical work.
> >
> Instead of the unlock/lock game, simply return and leave the rest to a workqueue
> in case of lock contention. But a workqueue is still unable to kill the soft lockups
> if the number of contending CPUs is large enough.
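For reference, the approach in the patch (free pages under the zone lock, but after every pcp->batch pages check for contention and briefly drop the lock) can be sketched as a userspace C analogy. Everything here is hypothetical: struct fake_zone, zone_lock_contended() and free_pages_bulk() are illustrative names, not kernel code, and the real zone->lock is a spinlock whose contention would be checked with a kernel helper (for example spin_is_contended()) rather than the waiter counter used below.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a zone. The kernel's zone->lock is a
 * spinlock; here a pthread mutex plus a waiter count lets the holder
 * ask "is anyone else waiting?". */
struct fake_zone {
    pthread_mutex_t lock;
    atomic_int waiters;   /* bumped by threads about to block on lock */
    long free_pages;      /* pages handed back to the "buddy" pool */
};

static bool zone_lock_contended(struct fake_zone *z)
{
    return atomic_load(&z->waiters) > 0;
}

/* Free `count` pages under the lock, but after every `batch` pages
 * drop and retake the lock if someone is waiting on it.
 * Returns how many times the lock was relinquished. */
static int free_pages_bulk(struct fake_zone *z, int count, int batch)
{
    int relocks = 0;

    assert(batch > 0);
    pthread_mutex_lock(&z->lock);
    for (int i = 1; i <= count; i++) {
        z->free_pages++;  /* stand-in for freeing one page */
        if (i % batch == 0 && i < count && zone_lock_contended(z)) {
            /* Give a waiter a window to take the lock. */
            pthread_mutex_unlock(&z->lock);
            pthread_mutex_lock(&z->lock);
            relocks++;
        }
    }
    pthread_mutex_unlock(&z->lock);
    return relocks;
}
```

With no waiters, freeing 2048 pages in batches of 64 never drops the lock; with a waiter present, the lock is relinquished 31 times (once per batch boundary, except after the final batch). Note that pthread mutexes make no fairness guarantee, so in this sketch a waiter is not guaranteed to win the window; how well the real patch works depends on the kernel lock's handoff behavior.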

Thank you for the idea. One concern that I have is that sometimes, we do expect
free_pcppages_bulk to actually free all of the pages that it was asked to
free. One example is when it is called from drain_zone_pages. Of course, we could
have a while loop that calls free_pcppages_bulk until it returns 0, but
I think that would just reduce to unlocking and relocking over and over again.
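To make the concern concrete, here is a userspace C sketch of the early-return alternative and of a caller that must free everything. All names are hypothetical (struct fake_zone, free_pages_bulk_bail(), drain_all()), and the return-the-leftover convention is an assumption for illustration; the kernel's free_pcppages_bulk does not currently return a count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a zone: a mutex plus a waiter count. */
struct fake_zone {
    pthread_mutex_t lock;
    atomic_int waiters;   /* nonzero means someone wants the lock */
    long free_pages;      /* pages handed back to the "buddy" pool */
};

/* Early-return variant: free up to `count` pages, but bail out at a
 * batch boundary if someone is waiting. Returns pages NOT freed. */
static int free_pages_bulk_bail(struct fake_zone *z, int count, int batch)
{
    int i = 0;

    assert(batch > 0);
    pthread_mutex_lock(&z->lock);
    while (i < count) {
        z->free_pages++;  /* stand-in for freeing one page */
        i++;
        if (i % batch == 0 && i < count && atomic_load(&z->waiters) > 0)
            break;        /* leave the rest for a later call */
    }
    pthread_mutex_unlock(&z->lock);
    return count - i;
}

/* A caller that must free everything (like a drain) has to loop,
 * taking the lock again after every early return.
 * Returns how many times the lock was acquired. */
static int drain_all(struct fake_zone *z, int count, int batch)
{
    int acquisitions = 0;

    while (count > 0) {
        count = free_pages_bulk_bail(z, count, batch);
        acquisitions++;
    }
    return acquisitions;
}
```

Under contention, draining 2048 pages in batches of 64 takes the lock 32 times, once per batch. That is the same number of acquisitions as dropping and retaking the lock inside a single call, so the loop ends up doing the same unlock/lock dance, just spread across function calls.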

As for the number of contending CPUs -- I'm not sure what that number
looks like. In my testing, I have only done some spot checks to confirm that the
zone lock is indeed contended, but I'm not sure how heavily it is
contended. I can run some tests before sending out the next version to see if
it is higher or lower than expected.

Thank you, I hope you have a great day!
Joshua