Re: [PATCHv9-rebased2 01/37] mm, thp: make swapin readahead under down_read of mmap_sem

From: Kirill A. Shutemov
Date: Thu Jun 16 2016 - 06:09:04 EST


On Thu, Jun 16, 2016 at 02:52:52PM +0800, Hillf Danton wrote:
> >
> > From: Ebru Akagunduz <ebru.akagunduz@xxxxxxxxx>
> >
> > Currently khugepaged performs swapin readahead under down_write. This patch
> > makes it perform swapin readahead under down_read instead.
> >
> > The patch was tested with a test program that allocates 800MB of memory,
> > writes to it, and then sleeps. The system was then forced to swap out all
> > of it. Afterwards, the test program touches the area by writing, skipping
> > one page in every 20 pages of the area.
> >
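
For reference, a minimal user-space sketch of the kind of test program the
changelog describes. The function names, the 0xaa fill pattern, and the
choice of which page in each group of 20 is skipped are assumptions made for
illustration. The changelog does not say how swap-out was forced, so that
step is left to whatever the tester does while the program sleeps.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write one byte to every page except the last page of each group of
 * `stride` pages. Returns the number of pages written.
 * (Hypothetical helper, not taken from the original test program.)
 */
static size_t touch_skipping(char *area, size_t npages, size_t page_size,
			     size_t stride)
{
	size_t touched = 0;

	for (size_t i = 0; i < npages; i++) {
		if (stride && i % stride == stride - 1)
			continue;	/* leave this page untouched */
		area[i * page_size] = 1;
		touched++;
	}
	return touched;
}

/*
 * Allocate `bytes` of anonymous memory, write to all of it, sleep so the
 * system can be pushed into swapping it out, then touch it again while
 * skipping one page in every 20. Returns pages touched, 0 on failure.
 */
static size_t run_swapin_test(size_t bytes, unsigned int sleep_secs)
{
	long ps = sysconf(_SC_PAGESIZE);
	char *area;
	size_t touched;

	if (ps <= 0)
		return 0;

	area = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return 0;

	memset(area, 0xaa, bytes);	/* first write: populate every page */
	sleep(sleep_secs);		/* window in which swap-out is forced */
	touched = touch_skipping(area, bytes / (size_t)ps, (size_t)ps, 20);

	munmap(area, bytes);
	return touched;
}
```

Calling run_swapin_test(800UL << 20, 60) would approximate the 800MB case;
the 60 second sleep is an arbitrary value, not one from the changelog.
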
> > Link: http://lkml.kernel.org/r/1464335964-6510-4-git-send-email-ebru.akagunduz@xxxxxxxxx
> > Signed-off-by: Ebru Akagunduz <ebru.akagunduz@xxxxxxxxx>
> > Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> > Cc: Rik van Riel <riel@xxxxxxxxxx>
> > Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> > Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> > Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> > Cc: Mel Gorman <mgorman@xxxxxxx>
> > Cc: David Rientjes <rientjes@xxxxxxxxxx>
> > Cc: Vlastimil Babka <vbabka@xxxxxxx>
> > Cc: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> > Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Cc: Michal Hocko <mhocko@xxxxxxx>
> > Cc: Minchan Kim <minchan.kim@xxxxxxxxx>
> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > ---
> > mm/huge_memory.c | 92 ++++++++++++++++++++++++++++++++++++++------------------
> > 1 file changed, 63 insertions(+), 29 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index f2bc57c45d2f..96dfe3f09bf6 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2378,6 +2378,35 @@ static bool hugepage_vma_check(struct vm_area_struct *vma)
> > }
> >
> > /*
> > + * If mmap_sem temporarily dropped, revalidate vma
> > + * before taking mmap_sem.
>
> See below

> > @@ -2401,11 +2430,18 @@ static void __collapse_huge_page_swapin(struct mm_struct *mm,
> > continue;
> > swapped_in++;
> > ret = do_swap_page(mm, vma, _address, pte, pmd,
> > - FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT,
> > + FAULT_FLAG_ALLOW_RETRY,
>
> Add a description in change log for it please.

Ebru, would you address it?

> > pteval);
> > + /* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
> > + if (ret & VM_FAULT_RETRY) {
> > + down_read(&mm->mmap_sem);
> > + /* vma is no longer available, don't continue to swapin */
> > + if (hugepage_vma_revalidate(mm, vma, address))
> > + return false;
>
> Revalidate vma _after_ acquiring mmap_sem, but the above comment says _before_.

Ditto.

> > + if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) {
> > + up_read(&mm->mmap_sem);
> > + goto out;
>
> Jump out with mmap_sem released,
>
> > + result = hugepage_vma_revalidate(mm, vma, address);
> > + if (result)
> > + goto out;
>
> but jump out again with mmap_sem held.
>
> Are they cleaned up in subsequent patches?

I didn't fold the fixups for these.

--
Kirill A. Shutemov