Re: [PATCH v2] mm: don't be stuck to rmap lock on reclaim path

From: Minchan Kim
Date: Tue May 10 2022 - 12:53:04 EST


On Mon, May 09, 2022 at 11:54:49AM -0700, Andrew Morton wrote:
> On Mon, 9 May 2022 08:47:10 -0700 Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> > The rmap locks (i_mmap_rwsem and anon_vma->root->rwsem) can be
> > contended under memory pressure if processes keep working on
> > their vmas (e.g., fork, mmap, munmap). That makes the reclaim path
> > get stuck. In our real workload traces, we see kswapd waiting on the
> > lock for 300ms+ (a second in the worst case), which makes other
> > processes enter direct reclaim, where they also get stuck on the lock.
> >
> > This patch makes the LRU aging path use try_lock mode, like
> > shrink_page_list, so the reclaim context keeps working on the next
> > LRU pages without getting stuck.
> >
> > Since this patch introduces a new "contended" field as an out-param
> > along with the try_lock in-param in rmap_walk_control, the struct is
> > no longer immutable when try_lock is set, so remove the const
> > keywords on rmap-related functions. Since rmap walking is already an
> > expensive operation, I doubt the const provides a sizable benefit
> > (and we didn't have it until 5.17).
>
> Some quantitative testing results would be helpful. Demonstrate
> the benefits of the patch?

In a heavy app workload on Android, the trace shows the following statistics.
The patch removes almost all of the lock contention on those rmap locks.

Before:

max_dur(ms)  min_dur(ms)  max-min_dur(ms)  avg_dur(ms)  sum_dur(ms)  count  blocked_function
       1632            0             1631   151.542173        31672    209  page_lock_anon_vma_read
        601            0              601   145.544681        28817    198  rmap_walk_file

After:

max_dur(ms)  min_dur(ms)  max-min_dur(ms)  avg_dur(ms)  sum_dur(ms)  count  blocked_function
        NaN          NaN              NaN          NaN          NaN    0.0  NaN
          0            0                0     0.127645            1     12  rmap_walk_file


I will include this data in the description.
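
For anyone following along, here is a rough user-space sketch of the
try_lock/contended idea from the description. It is not the kernel code:
toy_lock, toy_page, walk_control, walk_page and age_pages are hypothetical
names made up for illustration, and an atomic flag stands in for the real
rwsem. It only shows the control flow: try the lock, report contention
through an out-param, and move on to the next page instead of sleeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an rmap lock (hypothetical, not the kernel rwsem). */
struct toy_lock {
	atomic_flag held;
};

/*
 * Mirrors the idea in the description: try_lock is an in-param and
 * contended is an out-param, so the control struct cannot be const.
 */
struct walk_control {
	bool try_lock;
	bool contended;
};

/* Hypothetical page: the lock protecting its mappings plus an aging bit. */
struct toy_page {
	struct toy_lock *lock;
	int referenced;
};

static bool toy_trylock(struct toy_lock *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_lock_blocking(struct toy_lock *l)
{
	while (atomic_flag_test_and_set(&l->held))
		; /* spin; the real lock would sleep here */
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/* Returns true if the page was walked, false if it was skipped. */
static bool walk_page(struct toy_page *page, struct walk_control *rwc)
{
	assert(page->lock != NULL);

	if (rwc->try_lock) {
		if (!toy_trylock(page->lock)) {
			rwc->contended = true; /* tell the caller why we bailed */
			return false;
		}
	} else {
		toy_lock_blocking(page->lock);
	}

	page->referenced = 0; /* stand-in for the real per-page work */
	toy_unlock(page->lock);
	return true;
}

/* Age every page in try_lock mode; returns how many were skipped. */
static size_t age_pages(struct toy_page *pages, size_t n)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++) {
		struct walk_control rwc = { .try_lock = true, .contended = false };

		if (!walk_page(&pages[i], &rwc) && rwc.contended)
			skipped++; /* keep going with the next page */
	}
	return skipped;
}
```

In this sketch a page whose lock is held by someone else is simply left
untouched and counted as skipped, so the loop never waits on a contended
lock. The non-try_lock branch keeps the old blocking behaviour for callers
that still want it.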