Re: Re: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing

From: Yibin Liu

Date: Thu Apr 23 2026 - 21:09:51 EST

> On Wed, Apr 22, 2026 at 12:51:06PM +0000, Yibin Liu wrote:
> > First of all, I am truly sorry for not using RFC.
> > Secondly, I omitted many maintainers because I wanted to “not disturb too many people”,
> > and I apologize deeply for that. I will fully follow these two rules from now on.
> >
> > As for this patch, indeed, as Matthew said, the truncate part is not feasible.
> > My original intention was to apply this to frequently used library files like libc and ld.
> > Contention on the i_mmap_rwsem lock (which eventually turns into osq_lock) caused by
> > these two files alone reaches up to 70% in the “256-core execl” case, as observed from
> > flame graphs. Besides, no one performs truncate operations on libc and ld anyway.
>
> Interesting, would be good to see these? And more details on the scenario?
>
> What workloads are contending that exactly?
>

Sure, here are the details.
On an Intel Emerald Rapids server (112 cores), run the execl benchmark from
UnixBench with the command: ./Run -c 220 execl
Then perf top shows:

91.53% [kernel] [k] osq_lock
0.50% [kernel] [k] rwsem_spin_on_owner
0.45% perf [.] queue_event
0.42% [kernel] [k] vma_interval_tree_insert
0.36% [kernel] [k] next_uptodate_folio
0.25% [kernel] [k] __zap_vma_range

All of the osq_lock overhead here comes from rwsem_optimistic_spin, which is
reached from many call paths.
The breakdown is roughly as follows:

6.13% _dl_main-->mprotect-->...-->__split_vma-->vma_prepare-->down_write(&mapping->i_mmap_rwsem)
6.15% bprm_execve-->...-->exit_mmap-->...-->unlink_file_vma_batch_process-->down_write(&mapping->i_mmap_rwsem)
24.71% vma_link_file-->...-->down_write(&mapping->i_mmap_rwsem)
24.82% mmap_region-->...-->free_pgtables-->unlink_file_vma_batch_process-->down_write(&mapping->i_mmap_rwsem)
18.5% mmap_region-->...-->__split_vma-->vma_prepare-->down_write(&mapping->i_mmap_rwsem)
12.44% _dl_map_object-->mprotect-->...-->__split_vma-->vma_prepare-->down_write(&mapping->i_mmap_rwsem)

An AMD Zen 5 (EPYC 9755) system behaves in much the same way (tested with ./Run -c 250 execl).
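
As a rough cross-check, the breakdown above can be tallied by the operation that ends up taking i_mmap_rwsem for write. This is only an illustrative sketch: the percentages are copied from the list, while the group labels are shorthand chosen for this example and are not perf output.

```python
# Tally the rwsem_optimistic_spin breakdown by operation.
# Percentages are copied from the list above; the group labels are
# illustrative shorthand (an assumption), not something perf reports.
from collections import defaultdict

breakdown = [
    ("mprotect", 6.13),        # dynamic loader startup, __split_vma
    ("exit_mmap", 6.15),       # bprm_execve, unlink_file_vma_batch_process
    ("vma_link_file", 24.71),  # linking a new file-backed VMA
    ("mmap_region", 24.82),    # page table teardown, unlink_file_vma_batch_process
    ("mmap_region", 18.5),     # __split_vma
    ("mprotect", 12.44),       # dynamic loader object mapping, __split_vma
]

# Sum the percentages per group.
totals = defaultdict(float)
for group, percent in breakdown:
    totals[group] += percent

# Print the groups from largest to smallest share, then the overall sum.
for group, percent in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{group:15s} {percent:6.2f}%")

total = sum(totals.values())
print(f"{'total':15s} {total:6.2f}%")
```

Grouped this way, the two mmap_region paths together account for roughly 43% and the two mprotect paths for roughly 19%.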

> >
> > So I wanted to try skipping rmap for them. Since they are small, even if they cannot
> > be reclaimed or migrated, I assumed it would not cause much trouble. Of course,
> > this idea was totally wrong, and I will definitely mark such insane proposals with RFC in the future.
> >
> > These ideas are inspired by Mateusz’s work and thoughts
> > (https://lore.kernel.org/linux-mm/CAGudoHEfiOPJ2VGEV3fDT9cDsuoHB-wk8jg-k-EK6JhWgiHkWw@xxxxxxxxxxxxxx/),
> > so I specifically CC’d him to seek more opinions and insights.
>
> I think the best thing in general going forwards is to bring up these issues in
> advance, we're more than happy to look into things and very interested in issues
> with lock contention, latency, etc.
>
> And that way you can discuss ideas you might have to tackle up front and we can
> give you early feedback, which should save time all round and help get us to a
> good solution :)
>
> Just send with a [DISCUSSION] preface and cc- people you feel are relevant (use
> MAINTAINERS to figure out e.g. maintainers of relevant things, like rmap, mmap,
> etc.)
>
> >
> > Lastly, I sincerely apologize for the trouble I have caused the community.
> > I will strictly follow community conventions when sending patches in the future.
>
> It's no problem, better to be direct about this - it's more useful to discuss
> rather than to jump to a solution without community involvement, which might not
> work out/conflict with other stuff etc.
>
> Thanks, Lorenzo

Thanks for the kind advice.

I will start a discussion first with a [DISCUSSION] tag and involve relevant
maintainers for similar ideas in the future.

Thanks, Yibin