Re: [PATCHv3 03/12] mm: fix handling PTE-mapped THPs in page_referenced()
From: Kirill A. Shutemov
Date: Sat Feb 04 2017 - 05:34:08 EST
On Thu, Feb 02, 2017 at 04:26:56PM +0100, Michal Hocko wrote:
> On Sun 29-01-17 20:38:49, Kirill A. Shutemov wrote:
> > For PTE-mapped THP page_check_address_transhuge() is not adequate: it
> > cannot find all relevant PTEs, only the first one. It means we can miss
> > some references of the page and it can result in suboptimal decisions by
> > vmscan.
> >
> > Let's switch it to page_vma_mapped_walk().
> >
> > I don't think it's subject for stable@: it's not fatal. The only side
> > effect is that THP can be swapped out when it shouldn't.
>
> Please be more specific about the situation when this happens and how a
> user can recognize this is going on. In other words when should I
> consider backporting this series.
First, you need a huge PMD to get split with split_huge_pmd(). It can
happen due to munmap(), mprotect(), mremap(), etc. After split_huge_pmd()
we have the THP mapped with a bunch of PTEs instead of a single PMD.
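
A rough userspace sketch of one way to get into that state, assuming 2M
huge PMDs and an anonymous mapping: fault in a 2M-aligned range, then
mprotect() a single page inside it. The helper name and the constants
are made up for illustration, and the program cannot tell whether the
range was really backed by a THP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define THP_SIZE  (2UL << 20)   /* assumption: 2M huge PMD, as on x86-64 */
#define BASE_PAGE 4096UL        /* assumption: offsets are 4k multiples */

/*
 * Hypothetical helper: try to provoke a PMD split from userspace.
 * Returns 0 if the mprotect() on the sub-range succeeded, -1 otherwise.
 */
static int provoke_pmd_split(void)
{
	size_t len = 2 * THP_SIZE;	/* over-allocate so we can align */
	char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;

	/* Round up to the next 2M boundary inside the mapping. */
	char *thp = (char *)(((uintptr_t)map + THP_SIZE - 1) & ~(THP_SIZE - 1));
	assert(((uintptr_t)thp & (THP_SIZE - 1)) == 0);

	/* Only a hint; failure (e.g. THP not configured) is ignored here. */
	(void)madvise(thp, THP_SIZE, MADV_HUGEPAGE);

	/* Touch the whole range so the kernel may install a huge PMD. */
	memset(thp, 1, THP_SIZE);

	/* Changing protection on part of the range cannot keep one PMD. */
	int ret = mprotect(thp + 16 * BASE_PAGE, BASE_PAGE, PROT_READ);

	munmap(map, len);
	return ret;
}
```
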
The bug is that the kernel only sees pte_young() on the PTE that maps the
first 4k, but not the rest. So if your access pattern touches the THP, but
not the first 4k, the page can be reclaimed unfairly and possibly
re-faulted from swap soon after.
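
A toy model of the difference, not kernel code: struct toy_pte and both
helpers are invented for illustration and only model the accessed
("young") bit. The 512 sub-pages assume a 2M THP with 4k base pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUBPAGES_PER_THP 512	/* assumption: 2M THP / 4k base pages */

/* Toy stand-in for a PTE: only the accessed ("young") bit is modelled. */
struct toy_pte {
	bool young;
};

/* Old behaviour as described above: only the first PTE is examined. */
static int referenced_first_pte_only(const struct toy_pte *ptes)
{
	return ptes[0].young ? 1 : 0;
}

/* Behaviour after the fix: visit every PTE that maps the THP. */
static int referenced_all_ptes(const struct toy_pte *ptes, size_t nr)
{
	int referenced = 0;

	assert(nr <= SUBPAGES_PER_THP);
	for (size_t i = 0; i < nr; i++)
		if (ptes[i].young)
			referenced++;
	return referenced;
}
```

With only ptes[7].young set, the first helper reports no reference at
all, while the second one reports one.
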
I don't think it's visible to the user, except as unneeded swap-out/swap-in
on rare occasions.
> Also the interface is quite awkward imho. Why cannot we provide a
> callback into page_vma_mapped_walk and call it for each pte/pmd that
> matters to the given page? Wouldn't that be much easier than the loop
> around page_vma_mapped_walk iterator?
I don't agree that an interface with a callback would be easier. You would
also need to pass down additional context, packing and unpacking it on
both ends. I don't think it makes the interface less awkward.
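
To illustrate the context-passing point, here is a toy comparison of the
two styles. None of these names exist in the kernel; struct toy_walk and
the toy_* helpers are made up and only model an array of accessed bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_walk {
	const bool *young;	/* accessed bit per PTE */
	size_t nr;		/* number of PTEs mapping the page */
	size_t next;		/* next index to visit */
	size_t cur;		/* current index, valid after a true return */
};

/* Iterator style: returns true while there is another PTE to look at. */
static bool toy_walk_next(struct toy_walk *w)
{
	assert(w->next <= w->nr);
	if (w->next == w->nr)
		return false;
	w->cur = w->next++;
	return true;
}

/* The caller keeps its state in plain local variables. */
static int count_young_iter(const bool *young, size_t nr)
{
	struct toy_walk w = { .young = young, .nr = nr };
	int referenced = 0;

	while (toy_walk_next(&w))
		if (w.young[w.cur])
			referenced++;
	return referenced;
}

/* Callback style: the state has to be packed into a context struct... */
struct count_ctx {
	int referenced;
};

typedef void (*toy_pte_fn)(bool young, void *arg);

static void toy_walk_cb(const bool *young, size_t nr, toy_pte_fn fn, void *arg)
{
	for (size_t i = 0; i < nr; i++)
		fn(young[i], arg);
}

/* ...and unpacked again on the other end. */
static void count_one(bool young, void *arg)
{
	struct count_ctx *ctx = arg;

	if (young)
		ctx->referenced++;
}

static int count_young_cb(const bool *young, size_t nr)
{
	struct count_ctx ctx = { .referenced = 0 };

	toy_walk_cb(young, nr, count_one, &ctx);
	return ctx.referenced;
}
```

Both variants count the same thing; the callback one just needs the
extra struct and the void * round trip.
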
But it's a matter of taste.
--
Kirill A. Shutemov