Re: [PATCH V6 3/6] mm: Introduce VM_LOCKONFAULT
From: Kirill A. Shutemov
Date: Thu Aug 06 2015 - 11:53:40 EST
On Thu, Aug 06, 2015 at 05:33:54PM +0200, Vlastimil Babka wrote:
> On 07/29/2015 05:42 PM, Eric B Munson wrote:
> >The cost of faulting in all memory to be locked can be very high when
> >working with large mappings. If only portions of the mapping will be
> >used this can incur a high penalty for locking.
> >
> >For the example of a large file, this is the usage pattern for a large
> >statical language model (probably applies to other statical or graphical
> >models as well). For the security example, any application transacting
> >in data that cannot be swapped out (credit card data, medical records,
> >etc).
> >
> >This patch introduces the ability to request that pages are not
> >pre-faulted, but are placed on the unevictable LRU when they are finally
> >faulted in. The VM_LOCKONFAULT flag will be used together with
> >VM_LOCKED and has no effect when set without VM_LOCKED. Setting the
> >VM_LOCKONFAULT flag for a VMA will cause pages faulted into that VMA to
> >be added to the unevictable LRU when they are faulted or if they are
> >already present, but will not cause any missing pages to be faulted in.
> >
> >Exposing this new lock state means that we cannot overload the meaning
> >of the FOLL_POPULATE flag any longer. Prior to this patch it was used
> >to mean that the VMA for a fault was locked. This means we need the
> >new FOLL_MLOCK flag to communicate the locked state of a VMA.
> >FOLL_POPULATE will now only control if the VMA should be populated and
> >in the case of VM_LOCKONFAULT, it will not be set.
> >
> >Signed-off-by: Eric B Munson <emunson@xxxxxxxxxx>
> >Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> >Cc: Michal Hocko <mhocko@xxxxxxx>
> >Cc: Vlastimil Babka <vbabka@xxxxxxx>
> >Cc: Jonathan Corbet <corbet@xxxxxxx>
> >Cc: "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx>
> >Cc: linux-kernel@xxxxxxxxxxxxxxx
> >Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> >Cc: linux-mm@xxxxxxxxx
> >Cc: linux-api@xxxxxxxxxxxxxxx
> >---
> > drivers/gpu/drm/drm_vm.c | 8 +++++++-
> > fs/proc/task_mmu.c | 1 +
> > include/linux/mm.h | 2 ++
> > kernel/fork.c | 2 +-
> > mm/debug.c | 1 +
> > mm/gup.c | 10 ++++++++--
> > mm/huge_memory.c | 2 +-
> > mm/hugetlb.c | 4 ++--
> > mm/mlock.c | 2 +-
> > mm/mmap.c | 2 +-
> > mm/rmap.c | 4 ++--
> > 11 files changed, 27 insertions(+), 11 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
> >index aab49ee..103a5f6 100644
> >--- a/drivers/gpu/drm/drm_vm.c
> >+++ b/drivers/gpu/drm/drm_vm.c
> >@@ -699,9 +699,15 @@ int drm_vma_info(struct seq_file *m, void *data)
> > (void *)(unsigned long)virt_to_phys(high_memory));
> >
> > list_for_each_entry(pt, &dev->vmalist, head) {
> >+ char lock_flag = '-';
> >+
> > vma = pt->vma;
> > if (!vma)
> > continue;
> >+ if (vma->vm_flags & VM_LOCKONFAULT)
> >+ lock_flag = 'f';
> >+ else if (vma->vm_flags & VM_LOCKED)
> >+ lock_flag = 'l';
> > seq_printf(m,
> > "\n%5d 0x%pK-0x%pK %c%c%c%c%c%c 0x%08lx000",
> > pt->pid,
> >@@ -710,7 +716,7 @@ int drm_vma_info(struct seq_file *m, void *data)
> > vma->vm_flags & VM_WRITE ? 'w' : '-',
> > vma->vm_flags & VM_EXEC ? 'x' : '-',
> > vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
> >- vma->vm_flags & VM_LOCKED ? 'l' : '-',
> >+ lock_flag,
> > vma->vm_flags & VM_IO ? 'i' : '-',
> > vma->vm_pgoff);
> >
> >diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >index ca1e091..38d69fc 100644
> >--- a/fs/proc/task_mmu.c
> >+++ b/fs/proc/task_mmu.c
> >@@ -579,6 +579,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>
> This function has the following comment:
>
> Don't forget to update Documentation/ on changes.
>
> [...]
>
> >--- a/mm/gup.c
> >+++ b/mm/gup.c
> >@@ -92,7 +92,7 @@ retry:
> > */
> > mark_page_accessed(page);
> > }
> >- if ((flags & FOLL_POPULATE) && (vma->vm_flags & VM_LOCKED)) {
> >+ if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
> > /*
> > * The preliminary mapping check is mainly to avoid the
> > * pointless overhead of lock_page on the ZERO_PAGE
> >@@ -265,6 +265,9 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
> > unsigned int fault_flags = 0;
> > int ret;
> >
> >+ /* mlock all present pages, but do not fault in new pages */
> >+ if ((*flags & (FOLL_POPULATE | FOLL_MLOCK)) == FOLL_MLOCK)
> >+ return -ENOENT;
> > /* For mm_populate(), just skip the stack guard page. */
> > if ((*flags & FOLL_POPULATE) &&
> > (stack_guard_page_start(vma, address) ||
> >@@ -850,7 +853,10 @@ long populate_vma_page_range(struct vm_area_struct *vma,
> > VM_BUG_ON_VMA(end > vma->vm_end, vma);
> > VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
> >
> >- gup_flags = FOLL_TOUCH | FOLL_POPULATE;
> >+ gup_flags = FOLL_TOUCH | FOLL_MLOCK;
> >+ if ((vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == VM_LOCKED)
> >+ gup_flags |= FOLL_POPULATE;
> >+
> > /*
> > * We want to touch writable mappings with a write fault in order
> > * to break COW, except for shared mappings because these don't COW
>
> I think this might be breaking the populate part of mmap(MAP_POPULATE &
> ~MAP_LOCKED) case, if I follow the execution correctly (it's far from
> simple...)
>
> SYSCALL_DEFINE6(mmap_pgoff... with MAP_POPULATE
> vm_mmap_pgoff(..., MAP_POPULATE...)
> do_mmap_pgoff(...MAP_POPULATE... &populate) -> populate == TRUE
> mm_populate()
> __mm_populate()
> populate_vma_page_range()
>
> Previously, this path would have FOLL_POPULATE in gup_flags and continue
> with __get_user_pages() and faultin_page() (actually regardless of
> FOLL_POPULATE) which would fault in the pages.
>
> After your patch, populate_vma_page_range() will set FOLL_MLOCK, but since
> VM_LOCKED is not set, FOLL_POPULATE won't be set either.
> Then faultin_page() will return on the new check:
>
> flags & (FOLL_POPULATE | FOLL_MLOCK)) == FOLL_MLOCK
Good catch!
I guess it should be something like:
gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK;
if (vma->vm_flags & VM_LOCKONFAULT)
gup_flags &= ~FOLL_POPULATE;
--
Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/