Re: [PATCH v12 5/6] khugepaged: enable collapse pmd for pte-mapped THP

From: Kirill A. Shutemov
Date: Tue Aug 13 2019 - 11:05:44 EST


On Tue, Aug 13, 2019 at 04:05:53PM +0200, Oleg Nesterov wrote:
> On 08/13, Oleg Nesterov wrote:
> >
> > On 08/12, Kirill A. Shutemov wrote:
> > >
> > > On Mon, Aug 12, 2019 at 03:22:58PM +0200, Oleg Nesterov wrote:
> > > > On 08/12, Kirill A. Shutemov wrote:
> > > > >
> > > > > On Fri, Aug 09, 2019 at 06:01:18PM +0000, Song Liu wrote:
> > > > > > + if (pte_none(*pte) || !pte_present(*pte))
> > > > > > + continue;
> > > > >
> > > > > You don't need to check both. Present is never none.
> > > >
> > > > Agreed.
> > > >
> > > > Kirill, while you are here, shouldn't retract_page_tables() check
> > > > vma->anon_vma (and probably do mm_find_pmd) under vm_mm->mmap_sem?
> > > >
> > > > Can't it race with, say, do_cow_fault?
> > >
> > > vma->anon_vma can race, but it doesn't matter. False-negative is fine.
> > > It's attempt to avoid taking mmap_sem where it can be not productive.
> >
> > I guess I misunderstood the purpose of this check or your answer...
> >
> > Let me reword my question. Why can retract_page_tables() safely do
> > pmdp_collapse_flush(vma) without additional checks similar to what
> > collapse_pte_mapped_thp() does?
> >
> > I thought that retract_page_tables() checks vma->anon_vma to ensure that
> > this vma doesn't have a cow'ed PageAnon() page. And I still can't understand
> > why can't it race with __handle_mm_fault() paths.

vma->anon_vma check is a cheap way to exclude MAP_PRIVATE mappings that
got written from userspace. My thinking was that these VMAs are not worth
investing down_write(mmap_sem) as PMD-mapping is likely to be split later.
(It's totally made up reasoning, I don't have numbers to back it up).

vma->anon_vma can be set up after the check but before taking mmap_sem.
But page lock would prevent establishing any new ptes of the page, so we
are safe.

An alternative would be drop the check, but check that page table is clear
before calling pmdp_collapse_flush() under ptl. It has higher chance to
recover THP for the VMA, but has higher cost too.

I don't know which way is better, so I've chosen which is easier to
implement.

> >
> > Suppose that shmem_file was mmaped with PROT_READ|WRITE, MAP_PRIVATE.
> > To simplify, suppose that a non-THP page was already faulted in,
> > pte_present() == T.
> >
> > Userspace writes to this page.
> >
> > Why __handle_mm_fault()->handle_pte_fault()->do_wp_page()->wp_page_copy()
> > can not cow this page and update pte after the vma->anon_vma chech and
> > before down_write_trylock(mmap_sem) ?
>
> OK, probably this is impossible, collapse_shmem() does unmap_mapping_pages(),
> so handle_pte_fault() will call shmem_fault() which iiuc should block in
> find_lock_entry() because new_page is locked, and thus down_write_trylock()
> can't succeed.

You've got it right.

> Nevermind, I am sure I missed something. Perhaps you can update the comments
> to make this more clear.

Let me see first that my explanation makes sense :P

--
Kirill A. Shutemov