Re: [PATCH v12 5/6] khugepaged: enable collapse pmd for pte-mapped THP

From: Oleg Nesterov
Date: Tue Aug 13 2019 - 12:24:56 EST


On 08/13, Kirill A. Shutemov wrote:
>
> On Tue, Aug 13, 2019 at 04:05:53PM +0200, Oleg Nesterov wrote:
> > >
> > > I thought that retract_page_tables() checks vma->anon_vma to ensure that
> > > this vma doesn't have a cow'ed PageAnon() page. And I still can't understand
> > > why can't it race with __handle_mm_fault() paths.
>
> vma->anon_vma check is a cheap way to exclude MAP_PRIVATE mappings that
> got written from userspace.

Yes, and this is how I understood it from the very beginning, but then
I was confused.

> vma->anon_vma can be set up after the check but before taking mmap_sem.
> But page lock would prevent establishing any new ptes of the page, so we
> are safe.

And this is what was not clear to me until I noticed unmap_mapping_pages()
in collapse_shmem().

Plus I was confused by the comment above down_write_trylock(mmap_sem).
To me it looks as if we _could_ do down_write(), but we do not want to
disturb the system.

But iiuc we simply can't do down_write(), exactly because handle_mm_fault()
can wait for page lock with mmap_sem held.

> > > Suppose that shmem_file was mmaped with PROT_READ|WRITE, MAP_PRIVATE.
> > > To simplify, suppose that a non-THP page was already faulted in,
> > > pte_present() == T.
> > >
> > > Userspace writes to this page.
> > >
> > > Why __handle_mm_fault()->handle_pte_fault()->do_wp_page()->wp_page_copy()
> > > can not cow this page and update pte after the vma->anon_vma chech and
> > > before down_write_trylock(mmap_sem) ?
> >
> > OK, probably this is impossible, collapse_shmem() does unmap_mapping_pages(),
> > so handle_pte_fault() will call shmem_fault() which iiuc should block in
> > find_lock_entry() because new_page is locked, and thus down_write_trylock()
> > can't succeed.
>
> You've got it right.

Great, thanks.

> > Nevermind, I am sure I missed something. Perhaps you can update the comments
> > to make this more clear.
>
> Let me see first that my explanation makes sense :P

It does ;)

Oleg.