Re: [PATCH v2] mm/vmscan: skip increasing kswapd_failures when reclaim was boosted

From: Jiayuan Chen

Date: Thu Nov 13 2025 - 21:23:56 EST


On 2025/11/14 03:28, "Shakeel Butt" <shakeel.butt@xxxxxxxxx> wrote:


>
> On Thu, Nov 13, 2025 at 11:02:41AM +0100, Michal Hocko wrote:
>
> >
> > In general I think not incrementing the failure counter for a boosted
> > kswapd iteration is right. If this issue (high protection causing kswapd
> > failures) happens in the non-boosted case, I am not sure what the right
> > behavior should be, i.e. allocators doing direct reclaim potentially
> > below low protection, or allowing kswapd to reclaim below low. For min,
> > it is very clear that direct reclaimers have to reclaim, as they may
> > have to trigger the oom-killer. For low protection, I am not sure.
> >
> > Our current documentation gives us some room for interpretation. I am
> > wondering whether we need to change the existing implementation though.
> > If kswapd is not able to make progress then we surely have direct
> > reclaim happening. So I would only change this if we had examples of
> > properly/sensibly configured systems where a kswapd low-limit breach
> > could help to reduce stalls (improve performance) while the end result
> > in terms of the amount of reclaimed memory would be the same or very
> > similar.
> >
> Yes, I think any change here will need much more brainstorming and
> experimentation. There are definitely corner cases for which the right
> solution might not be in the kernel. One such case I was thinking about is
> an unbalanced (memory) NUMA node, where I don't think the kswapd of that
> node should do anything, because of the disconnect between NUMA memory
> usage and memcg limits. In such cases either NUMA balancing or the
> promotion/demotion systems under discussion would be more appropriate.
> Anyway, this is orthogonal.

Could you share a link or some keywords I can use to search the mailing list for
the NUMA imbalance you mentioned?

I'm not sure if it's similar to a problem I encountered before. We have a system
with 2 nodes and swap disabled. After running for a while, we found that anonymous
pages occupied over 99% of one node. When kswapd on that node runs, it continuously
tries to reclaim the remaining 1% of file pages. However, those file pages are mostly
hot code pages, so evicting them triggers constant refaults, which eventually causes
sustained high read I/O load on the disk.