[RFC] mm: stress-ng --mremap triggers severe lruvec lock contention in populate/unmap paths

From: Joseph Salisbury

Date: Tue Apr 07 2026 - 16:10:09 EST


Hello,

I would like to ask for feedback on an MM performance issue triggered by stress-ng's mremap stressor:

stress-ng --mremap 8192 --mremap-bytes 4K --timeout 30 --metrics-brief

This was first investigated as a possible regression from 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap"), but the current evidence suggests that commit mostly exposes an older problem for this workload rather than directly causing it.


Observed behavior:

The metrics below are in this format:
    stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
                             (secs)    (secs)    (secs)   (real time) (usr+sys time)

On a 5.15-based kernel, the workload behaves much worse when swapping is disabled:

    swap enabled:
      mremap 1660980 31.08 64.78 84.63 53437.09 11116.73

    swap disabled:
      mremap 40786258 27.94 15.41 15354.79 1459749.43 2653.59

On a 6.12-based kernel with swap enabled, the same high-system-time behavior is also observed:

    mremap 77087729 21.50 29.95 30558.08 3584738.22 2520.19

A recent 7.0-rc5-based mainline build still behaves similarly:

    mremap 39208813 28.12 12.34 15318.39 1394408.50 2557.53

So this does not appear to have been fixed upstream.



The current theory is that 0ca0c24e3211 merely exposes the issue for this zero-page-heavy workload.  Before that change, swap-enabled runs actually swapped pages.  After it, zero pages are recorded in the swap bitmap instead of being written out, so the workload behaves much more like the swap-disabled case.

Perf data supports the theory that the expensive behavior is global lruvec (LRU) lock contention caused by short-lived populate/unmap churn.

The dominant stacks on the bad cases include:

    vm_mmap_pgoff
      __mm_populate
        populate_vma_page_range
          lru_add_drain
            folio_batch_move_lru
              folio_lruvec_lock_irqsave
                native_queued_spin_lock_slowpath

and:

    __x64_sys_munmap
      __vm_munmap
        ...
          release_pages
            folios_put_refs
              __page_cache_release
                folio_lruvec_relock_irqsave
                  native_queued_spin_lock_slowpath



It was also found that adding '--mremap-numa' changes the behavior substantially:

stress-ng --mremap 8192 --mremap-bytes 4K --timeout 30 --mremap-numa --metrics-brief

mremap 2570798 29.39 8.06 106.23 87466.50 22494.74

So it is possible that either actual swapping or the mbind(..., MPOL_MF_MOVE) path used by '--mremap-numa' removes most of the excessive system time.

Does this look like a known MM scalability issue around short-lived MAP_POPULATE / munmap churn?




REPRODUCER:
The issue is reproducible with stress-ng's mremap stressor:

stress-ng --mremap 8192 --mremap-bytes 4K --timeout 30 --metrics-brief

On older kernels, the bad behavior is easiest to expose by disabling swap first:

swapoff -a
stress-ng --mremap 8192 --mremap-bytes 4K --timeout 30 --metrics-brief

On kernels with 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap") or newer, the same bad behavior can be seen even with swap enabled, because this zero-page-heavy workload no longer actually swaps pages and behaves much like the swap-disabled case.

Typical bad-case behavior:
 - Very large aggregate sys time during a 30s run (for example, ~15000s or higher)
 - Poor bogo ops/s measured against usr+sys time (~2500 range in our tests)
 - Perf shows time dominated by:
      vm_mmap_pgoff -> __mm_populate -> populate_vma_page_range -> lru_add_drain
    and
      munmap -> release_pages -> __page_cache_release
   with heavy time in folio_lruvec_lock_irqsave/native_queued_spin_lock_slowpath

Diagnostic variant:
stress-ng --mremap 8192 --mremap-bytes 4K --timeout 30 --mremap-numa --metrics-brief

That variant greatly reduces the excessive system time, which is one of the clues that the overhead depends on which MM path the workload takes.


Thanks in advance!

Joe