Re: [PATCH v8 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

From: Alistair Popple
Date: Thu May 20 2021 - 22:23:51 EST


On Friday, 21 May 2021 6:24:27 AM AEST Liam Howlett wrote:

[...]

> > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > index 977e70803ed8..f09d522725b9 100644
> > > > --- a/mm/rmap.c
> > > > +++ b/mm/rmap.c
> > > > @@ -1405,10 +1405,6 @@ static bool try_to_unmap_one(struct page *page,
> > > > struct vm_area_struct *vma,>
> > > >
> > > > struct mmu_notifier_range range;
> > > > enum ttu_flags flags = (enum ttu_flags)(long)arg;
> > > >
> > > > - /* munlock has nothing to gain from examining un-locked vmas */
> > > > - if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
> > > > - return true;
> > > > -
> > > >
> > > > if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> > > >
> > > > is_zone_device_page(page) && !is_device_private_page(page))
> > > >
> > > > return true;
> > > >
> > > > @@ -1469,8 +1465,6 @@ static bool try_to_unmap_one(struct page *page,
> > > > struct vm_area_struct *vma,>
> > > >
> > > > page_vma_mapped_walk_done(&pvmw);
> > > > break;
> > > >
> > > > }
> > > >
> > > > - if (flags & TTU_MUNLOCK)
> > > > - continue;
> > > >
> > > > }
> > > >
> > > > /* Unexpected PMD-mapped THP? */
> > > >
> > > > @@ -1784,8 +1778,39 @@ bool try_to_unmap(struct page *page, enum
> > > > ttu_flags
> > > > flags)>
> > > >
> > > > return !page_mapcount(page) ? true : false;
> > > >
> > > > }
> > >
> > > Please add a comment here, especially around locking.
>
> Did you miss this comment? I think the name confusion alone means this
> needs some documentation. It's also worth mentioning arg is unused.

Ack. Was meant to come back to that after discussing some of the locking
questions below. The other side effect of splitting this code out is it leaves
space for more specific documentation which is only a good thing. I will try
and summarise some of the discussion below into a comment here.

> > > > +static bool page_mlock_one(struct page *page, struct vm_area_struct
> > > > *vma,
> > > > + unsigned long address, void *arg)
> > > > +{
> > > > + struct page_vma_mapped_walk pvmw = {
> > > > + .page = page,
> > > > + .vma = vma,
> > > > + .address = address,
> > > > + };
> > > > +
> > > > + /* munlock has nothing to gain from examining un-locked vmas */
> > > > + if (!(vma->vm_flags & VM_LOCKED))
> > > > + return true;
> > >
> > > The logic here doesn't make sense. You called page_mlock_one() on a VMA
> > > that isn't locked and it returns true? Is this a check to see if the
> > > VMA has zero mlock'ed pages?
> >
> > I'm pretty sure the logic is correct. This is used for an rmap_walk, so we
> > return true to continue the page table scan to see if other VMAs have the
> > page locked.
>
> yes, sorry. The logic is correct but doesn't read as though it does.
> I cannot see what is going on easily and there are no comments stating
> what is happening.

Thanks for confirming. The documentation in Documentation/vm/unevictable-
lru.rst is helpful for higher level context but I will put some comments here
around the logic.

> > > > +
> > > > + while (page_vma_mapped_walk(&pvmw)) {
> > > > + /* PTE-mapped THP are never mlocked */
> > > > + if (!PageTransCompound(page)) {
> > > > + /*
> > > > + * Holding pte lock, we do *not* need
> > > > + * mmap_lock here
> > > > + */
> > >
> > > Are you sure? I think you at least need to hold the mmap lock for
> > > reading to ensure there's no race here? mlock_vma_page() eludes to such
> > > a scenario when lazy mlocking.
> >
> > Not really. I don't claim to be an mlock expert but as this is a clean-up
> > for try_to_unmap() the intent was to not change existing behaviour.
> >
> > However presenting the function in this simplified form did raise this and
> > some other questions during previous reviews - see
> > https://lore.kernel.org/
> > dri-devel/20210331115746.GA1463678@xxxxxxxxxx/ for the previous
> > discussion.
>
> From what I can see, at least the following paths have mmap_lock held
> for writing:
>
> munlock_vma_pages_range() from __do_munmap()
> munlokc_vma_pages_range() from remap_file_pages()
>
> > To answer the questions around locking though I did do some git sha1
> > mining. The best explanation seemed to be contained in
> > https://git.kernel.org/pub/scm/
> > linux/kernel/git/torvalds/linux.git/commit/?
> > id=b87537d9e2feb30f6a962f27eb32768682698d3b from Hugh (whom I've added
> > again here in case he can help answer some of these).
>
> Thanks for the pointer. That race doesn't make the lock unnecessary.
> It is the exception to the rule because the exit of the process will
> correct what has happened - and in all other cases, the mmap_lock is
> held already (or ought to be) at a higher level. I think this comment
> is just wrong and should be removed as it is confusing. I see that this
> is a copy from elsewhere, but it's at least wrong in this context.

Ha, thanks. I had just copied the comment without checking to see if mmap_lock
was always held by callers of page_mlock() or not. From what I can tell it is
which confirms what you are saying. Thanks.

> > > The mmap_lock is held for writing in the scenarios I have checked.
> > >
> > > > + mlock_vma_page(page);
> > > > + }
> > > > + page_vma_mapped_walk_done(&pvmw);
> > > > +
> > > > + /* found a mlocked page, no point continuing munlock
> > > > check
> > > > */
> > >
> > > I think you need to check_pte() to be sure it is mapped?
>
> On second thought, my comment here is wrong.

Thanks, I had meant to question that in my last reply.

> > > > + return false;
> > > > + }
> > > > +
> > > > + return true;
> > >
> > > Again, I don't get the return values. If page_mlock_one() returns true,
> > > I'd expect for my page to now be locked. This isn't the case here,
> > > page_mlock_one() returns true if there are no pages present for a locked
> > > VMA, correct?
> >
> > This is used for an rmap walk, if we were able to mlock the page we return
> > false to stop the walk as there is no need to examine other VMAs if the
> > page has already been mlocked.
>
> My confusion is about the function name. The code is not
> self-documenting as it is written and there are no comments. Maybe add
> a comment block saying what it returns and why?

Fair enough, I agree it isn't obvious from just reading the code so will
incorporate a summary from the unevictable-lru documentation (which would have
been helpful to have had referenced originally anyway).

> > > > +}
> > > > +
> > > >
> > > > /**
> > > >
> > > > - * try_to_munlock - try to munlock a page
> > > > + * page_mlock - try to munlock a page
> > >
> > > Is this an mlock or an munlock? I'm not confident it's either, but more
> > > of a check to see if there are pages mapped in a locked VMA?
> >
> > It is called (only) from the munlock code to check a page does not need to
> > be mlocked, or to mlock it if it does.
>
> I don't want to re-open the discussion on the name, but I just might
> with this comment: Do you think it is correct to have a comment for
> page_mlock that says "try to munlock a page"? Those seem like opposite
> functions to me. The @page also references the munlock'ing. This is
> ensuring a locked page that is also locked elsewhere will not be
> removed, right?

No, I don't think that is correct. Sorry for the confusion. I updated the name
with a script but obviously missed some of the comments. When reviewing this
for rebasing I noticed a few other places in the code still incorrectly
referencing this as an munlock so will go through and improve the comments
more generally.

I think a better description of what this function does and why might help
with the inevitable confusion that arises from munlock calling a function to
mlock pages.

> > > > * @page: the page to be munlocked
> > > > *
> > > > * Called from munlock code. Checks all of the VMAs mapping the page
> > > >
> > > > @@ -1793,11 +1818,10 @@ bool try_to_unmap(struct page *page, enum
> > > > ttu_flags flags)>
> > > >
> > > > * returned with PG_mlocked cleared if no other vmas have it mlocked.
> > > > */
> > > >
> > > > -void try_to_munlock(struct page *page)
> > > > +void page_mlock(struct page *page)
> > > >
> > > > {
> > > >
> > > > struct rmap_walk_control rwc = {
> > > >
> > > > - .rmap_one = try_to_unmap_one,
> > > > - .arg = (void *)TTU_MUNLOCK,
> > > > + .rmap_one = page_mlock_one,
> > > >
> > > > .done = page_not_mapped,
> > > > .anon_lock = page_lock_anon_vma_read,
> > > >
> > > > @@ -1849,7 +1873,7 @@ static struct anon_vma
> > > > *rmap_walk_anon_lock(struct
> > > > page *page,>
> > > >
> > > > * Find all the mappings of a page using the mapping pointer and the
> > > > vma
> > > > chains * contained in the anon_vma struct it points to.
> > > > *
> > > >
> > > > - * When called from try_to_munlock(), the mmap_lock of the mm
> > > > containing
> > > > the vma + * When called from page_mlock(), the mmap_lock of the mm
> > > > containing the vma>
> > > >
> > > > * where the page was found will be held for write. So, we won't
> > > > recheck
> > > > * vm_flags for that VMA. That should be OK, because that vma
> > > > shouldn't
> > > > be
> > > > * LOCKED.
> > > >
> > > > @@ -1901,7 +1925,7 @@ static void rmap_walk_anon(struct page *page,
> > > > struct
> > > > rmap_walk_control *rwc,>
> > > >
> > > > * Find all the mappings of a page using the mapping pointer and the
> > > > vma
> > > > chains * contained in the address_space struct it points to.
> > > > *
> > > >
> > > > - * When called from try_to_munlock(), the mmap_lock of the mm
> > > > containing
> > > > the vma + * When called from page_mlock(), the mmap_lock of the mm
> > > > containing the vma>
> > > >
> > > > * where the page was found will be held for write. So, we won't
> > > > recheck
> > >
> > > Above it is stated that the lock does not need to be held, but this
> > > comment says it is already held for writing - which is it?
> >
> > I will have to check.
> >
> > > > * vm_flags for that VMA. That should be OK, because that vma
> > > > shouldn't
> > > > be
> > > > * LOCKED.
> > > >
> > > > --
> > > > 2.20.1
> > >
> > > munlock_vma_pages_range() comments references try_to_{munlock|unmap}
> >
> > Thanks, I noticed that too when I was rereading it just now. Will fix that
> > up.>
> > - Alistair