Re: [RFC PATCH 0/3] support large folio for mlock

From: Matthew Wilcox
Date: Sat Jul 08 2023 - 00:02:24 EST


On Sat, Jul 08, 2023 at 11:52:23AM +0800, Yin, Fengwei wrote:
> > Oh, I agree, there are always going to be circumstances where we realise
> > we've made a bad decision and can't (easily) undo it. Unless we have a
> > per-page pincount, and I Would Rather Not Do That. But we should _try_
> > to do that because it's the right model -- that's what I meant by "Tell
> > me why I'm wrong"; what scenarios do we have where a user temporarily
> > mlocks (or mprotects or ...) a range of memory, but wants that memory
> > to be aged in the LRU exactly the same way as the adjacent memory that
> > wasn't mprotected?
> From the manpage of mlock():
> mlock(), mlock2(), and mlockall() lock part or all of the calling process's virtual address space into RAM, preventing that memory
> from being paged to the swap area.
>
> So my understanding is that it's OK for mlocked memory to be aged along
> with adjacent memory that is not mlocked, as long as it is never paged
> out to swap.

Right, it doesn't break anything; it's just a problem similar to
internal fragmentation: the pages of the folio which aren't mlocked
will also be kept locked in RAM and never paged out.
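
To make that concrete, here is a minimal userspace sketch; the 2MB size
and the MADV_HUGEPAGE hint are my assumptions, purely to encourage one
large folio to back the mapping.  Only the first page is mlocked, but
every page of the folio behind it stays resident:

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 2UL << 20;	/* 2MB: assumed large folio size */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/* Ask for a large folio to back the range (best effort). */
		madvise(p, len, MADV_HUGEPAGE);
		memset(p, 1, len);	/* fault the whole range in */

		/* Lock only the first 4KB page... */
		if (mlock(p, 4096)) {
			perror("mlock");
			return 1;
		}

		/*
		 * ...but if one large folio backs the whole range, the
		 * other 511 pages of that folio are also kept in RAM:
		 * the internal fragmentation described above.
		 */
		return 0;
	}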

> One question about an implementation detail:
> If a large folio crosses a VMA boundary and cannot be split, how do we
> deal with that case? Retry in the syscall until the split succeeds, or
> return an error (and which errno should we choose) to user space?

I would be tempted to allocate memory & copy to the new mlocked VMA.
The old folio will go on the deferred_list and be split later, or its
valid parts will be written to swap and then it can be freed.
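
A rough sketch of that copy fallback, in kernel-flavoured pseudocode.
Every helper below (alloc_mlocked_folio(), copy_folio_range(),
remap_new_folio()) is hypothetical, named only to illustrate the idea,
not an existing mm/ interface:

	/*
	 * Hypothetical sketch only: if a large folio straddles the
	 * boundary of the VMA being mlocked and the split fails, copy
	 * the affected range into a fresh folio mapped into the mlocked
	 * VMA instead of erroring out or retrying.
	 */
	static int mlock_copy_unsplittable(struct vm_area_struct *vma,
					   struct folio *old,
					   unsigned long addr,
					   unsigned long len)
	{
		struct folio *new;

		new = alloc_mlocked_folio(vma, addr, len); /* hypothetical */
		if (!new)
			return -ENOMEM;

		copy_folio_range(new, old, addr, len);	/* hypothetical */
		remap_new_folio(vma, addr, new);	/* hypothetical */

		/*
		 * Drop our reference to the old folio.  With this
		 * mapping gone it ends up on the deferred split list:
		 * it is either split later, or its still-valid parts
		 * are written to swap and the folio is freed.
		 */
		folio_put(old);
		return 0;
	}

Whether the copy happens eagerly at mlock() time or lazily at the next
fault is a separate design question; the sketch only shows the eager
version.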