Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT

From: Eric B Munson
Date: Tue Aug 25 2015 - 15:03:10 EST

On Tue, 25 Aug 2015, Michal Hocko wrote:

> On Tue 25-08-15 10:29:02, Eric B Munson wrote:
> > On Tue, 25 Aug 2015, Michal Hocko wrote:
> [...]
> > > Considering the current behavior I do not think it would be a terrible
> > > thing to do what Konstantin was suggesting and populate only the full
> > > ranges in a best effort mode (it is done so anyway) and document the
> > > behavior properly.
> > > "
> > > If the memory segment specified by old_address and old_size is
> > > locked (using mlock(2) or similar), then this lock is maintained
> > > when the segment is resized and/or relocated. As a consequence,
> > > the amount of memory locked by the process may change.
> > >
> > > If the range is already fully populated and the range is
> > > enlarged the new range is attempted to be fully populated
> > > as well to preserve the full mlock semantic but there is no
> > > guarantee this will succeed. Partially populated (e.g. created by
> > > mlock(MLOCK_ONFAULT)) ranges do not have the full mlock semantic
> > > so they are not populated on resize.
> > > "
> >
> > You are proposing that mremap would scan the PTEs as Vlastimil has
> > suggested?
> As Vlastimil pointed out this would be unnecessarily too costly. But I
> am wondering whether we should populate at all during mremap considering
> the full mlock semantic is not guaranteed anyway. Man page mentions only
> that the lock is maintained which will be true without population as
> well.
> If somebody really depends on the current (and broken) implementation we
> can offer MREMAP_POPULATE which would do a best effort population. This
> would be independent of the locked state and would be usable for other
> mappings as well (the use case would be to save page fault overhead by
> batching them).
> If this would be seen as an unacceptable user visible change of behavior
> then we can go with the VMA flag but I would still prefer to not export
> it to the userspace so that we have a way to change this in future.

Would you drop your objections to the VMA flag if I drop the portions of
the patch that expose it to userspace?

The rework to not use the VMA flag is pretty sizeable and is much
uglier IMO. I know that you are not wild about using bit 30 of 32 for
this, but perhaps we can settle on not exporting it to userspace so we
can reclaim it if we really need it in the future? I can teach the
folks here to check for size vs RSS of the locked mappings for stats on
lock on fault usage so from my point of view, the proc changes are not

> --
> Michal Hocko
> SUSE Labs
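
For reference, the "size vs RSS of the locked mappings" check mentioned
above could be done from userspace roughly as follows. This is only a
sketch: it assumes a kernel whose /proc/<pid>/smaps carries a VmFlags
line (where "lo" marks a locked mapping), and the function names
parse_smaps() and lock_on_fault_candidates() are made up for this
illustration. The result is a heuristic, since a plain mlock() mapping
whose population did not fully succeed would look the same.

```python
import os
import re

# Header line of an smaps entry, e.g.
# "7f1c2a000000-7f1c2a021000 rw-p 00000000 00:00 0    [heap]"
_HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+\S+\s")


def parse_smaps(text):
    """Parse smaps-formatted text into one dict per mapping."""
    entries = []
    for line in text.splitlines():
        m = _HEADER.match(line)
        if m:
            # A new mapping starts; fields that follow belong to it.
            entries.append({
                "range": f"{m.group(1)}-{m.group(2)}",
                "size_kb": 0,
                "rss_kb": 0,
                "flags": [],
            })
            continue
        if not entries:
            continue
        key, _, value = line.partition(":")
        fields = value.split()
        if key == "Size" and fields:
            entries[-1]["size_kb"] = int(fields[0])
        elif key == "Rss" and fields:
            entries[-1]["rss_kb"] = int(fields[0])
        elif key == "VmFlags":
            entries[-1]["flags"] = fields
    return entries


def lock_on_fault_candidates(text):
    """Locked mappings ("lo" in VmFlags) that are not fully resident."""
    return [
        e for e in parse_smaps(text)
        if "lo" in e["flags"] and e["rss_kb"] < e["size_kb"]
    ]


if __name__ == "__main__":
    path = "/proc/self/smaps"
    if os.path.exists(path):
        with open(path) as f:
            for e in lock_on_fault_candidates(f.read()):
                print(e["range"], e["size_kb"], "kB mapped,",
                      e["rss_kb"], "kB resident")
```

Pointing the same parser at /proc/<pid>/smaps of another process gives
per-mapping numbers without any new proc fields.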

