Re: [PATCH 00/13] mm/munlock: rework of mlock+munlock page handling

From: Michal Hocko
Date: Wed Feb 09 2022 - 10:36:31 EST


On Sun 06-02-22 13:27:41, Hugh Dickins wrote:
> I wondered whether to post this munlocking rework in
> https://lore.kernel.org/linux-mm/35c340a6-96f-28a0-2b7b-2f9fbddc01f@xxxxxxxxxx/
>
> There the discussion was OOM reaping, but the main reason for the rework
> has been catastrophic contention on i_mmap_rwsem when exiting from
> multiply mlocked files; and frustration with how contorted munlocking is.

yeah, I do not think that too many care about oom_reaper stumbling over
vast portions of the mlocked memory problem. So far I have seen only LTP
oom02 hitting this and that one does mlockall and intentionally hits
OOM. So far I haven't seen any actual workload hiting something similar.
But hey, if we get a better and easier to maintain mlock code then the
oom_reaper will surely add to a huge thanks.

> Here it is based on 5.17-rc2, applies also to -rc3, almost cleanly to
> mmotm 2022-02-03-21-58 (needs two easy fixups in mm/huge_memory.c); but
> likely to conflict (I hope not fundamentally) with several concurrent
> large patchsets.
>
> tl;dr
> mm/mlock.c | 634 +++++++++++++++-----------------------
> 23 files changed, 510 insertions(+), 779 deletions(-)

This is really impressive!

> In my own opinion, this is ready to go in: that whatever its defects,
> it's a big practical improvement over what's presently there. Others
> may differ; and it may be too much to arrive at a quick opinion on.
> My hope is that it will strike a chord with some who have laboured in
> this area in the past: I'll be short of time to do much more with it,
> so maybe someone else could take over if necessary.
>
> At present there's no update to Documentation/vm/unevictable-lru.rst:
> that always needs a different mindset, can follow later, please refer
> to commit messages for now.

Yes, that would help for sure.

So far I have only managed to read through the series and trying to put
all the pieces together (so far I have given up on the THP part) and my
undestanding is far from complete. But I have to say I like the general
approach and overall simplification.

The only thing that is not entirely clear to me at the moment is why you
have chosen to ignore already mapped LOCKONFAULT pages. They will
eventually get sorted out during the reclaim/migration but this can
backfire if too many pages have been pre-faulted before LOCKONFAULT
call. Maybe not an interesting case in the first place but I am still
wondering why you have chosen that way.

I will be off next couple of days and plan to revisit this afterwards
(should time allow). Anyway thanks a lot Hugh!
--
Michal Hocko
SUSE Labs