Re: [PATCH RFC 06/24] userfaultfd: wp: support write protection for userfault vma range

From: Jerome Glisse
Date: Mon Jan 21 2019 - 09:05:58 EST


On Mon, Jan 21, 2019 at 03:57:04PM +0800, Peter Xu wrote:
> From: Shaohua Li <shli@xxxxxx>
>
> Add API to enable/disable writeprotect a vma range. Unlike mprotect,
> this doesn't split/merge vmas.

AFAICT it does not do that.
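
As a side note for readers skimming the series, a purely illustrative
sketch of how a caller would use the new helper (the wrapper below is
made up; the ioctl plumbing that would call it is not part of this
patch):

/* Illustrative only: enable_wp == true removes write permission from
 * the page-aligned range [start, start + len) of an anonymous,
 * uffd-WP-registered vma; enable_wp == false is meant to restore it. */
static int example_toggle_wp(struct mm_struct *mm, unsigned long start,
			     unsigned long len, bool enable_wp)
{
	return mwriteprotect_range(mm, start, len, enable_wp);
}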

>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Signed-off-by: Shaohua Li <shli@xxxxxx>
> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
> include/linux/userfaultfd_k.h | 2 ++
> mm/userfaultfd.c | 52 +++++++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index 38f748e7186e..e82f3156f4e9 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -37,6 +37,8 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
>  			      unsigned long dst_start,
>  			      unsigned long len,
>  			      bool *mmap_changing);
> +extern int mwriteprotect_range(struct mm_struct *dst_mm,
> +			       unsigned long start, unsigned long len, bool enable_wp);
>
>  /* mm helpers */
>  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 458acda96f20..c38903f501c7 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -615,3 +615,55 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
>  {
>  	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing);
>  }
> +
> +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> +			unsigned long len, bool enable_wp)
> +{
> +	struct vm_area_struct *dst_vma;
> +	pgprot_t newprot;
> +	int err;
> +
> +	/*
> +	 * Sanitize the command parameters:
> +	 */
> +	BUG_ON(start & ~PAGE_MASK);
> +	BUG_ON(len & ~PAGE_MASK);
> +
> +	/* Does the address range wrap, or is the span zero-sized? */
> +	BUG_ON(start + len <= start);
> +
> +	down_read(&dst_mm->mmap_sem);
> +
> +	/*
> +	 * Make sure the vma is not shared, that the dst range is
> +	 * both valid and fully within a single existing vma.
> +	 */
> +	err = -EINVAL;
> +	dst_vma = find_vma(dst_mm, start);
> +	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
> +		goto out_unlock;
> +	if (start < dst_vma->vm_start ||
> +	    start + len > dst_vma->vm_end)
> +		goto out_unlock;
> +
> +	if (!dst_vma->vm_userfaultfd_ctx.ctx)
> +		goto out_unlock;
> +	if (!userfaultfd_wp(dst_vma))
> +		goto out_unlock;
> +
> +	if (!vma_is_anonymous(dst_vma))
> +		goto out_unlock;
> +
> +	if (enable_wp)
> +		newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
> +	else
> +		newprot = vm_get_page_prot(dst_vma->vm_flags);
> +
> +	change_protection(dst_vma, start, start + len, newprot,
> +			  !enable_wp, 0);

So setting dirty_accountable brings us to this code in mprotect.c:

if (dirty_accountable && pte_dirty(ptent) &&
		(pte_soft_dirty(ptent) ||
		 !(vma->vm_flags & VM_SOFTDIRTY))) {
	ptent = pte_mkwrite(ptent);
}

My understanding is that when enable_wp is false you want to set the
write flag, and you want to set it unconditionally, right?

If so, then you should really move the change_protection() flags patch
before this one and add a flag for unconditionally setting the pte
write flag.
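
Something like the sketch below is what I have in mind (the names and
the helper are made up here purely to illustrate the idea of a
dedicated flag instead of overloading dirty_accountable; this is not
the actual code from that patch):

/* Illustrative sketch only: replace the dirty_accountable bool with a
 * flags argument so callers can ask for the write bit unconditionally. */
#define MM_CP_DIRTY_ACCT	(1UL << 0)	/* old dirty_accountable path */
#define MM_CP_WP_RESOLVE	(1UL << 1)	/* always restore the write bit */

static pte_t maybe_mkwrite_cp(pte_t ptent, struct vm_area_struct *vma,
			      unsigned long cp_flags)
{
	if (cp_flags & MM_CP_WP_RESOLVE) {
		/* un-write-protecting: give the write bit back whether
		 * or not the pte is dirty */
		ptent = pte_mkwrite(ptent);
	} else if ((cp_flags & MM_CP_DIRTY_ACCT) && pte_dirty(ptent) &&
		   (pte_soft_dirty(ptent) ||
		    !(vma->vm_flags & VM_SOFTDIRTY))) {
		/* existing dirty accounting behaviour, unchanged */
		ptent = pte_mkwrite(ptent);
	}
	return ptent;
}

With something like that, mwriteprotect_range() would pass the resolve
flag when enable_wp is false instead of piggy-backing on the
dirty_accountable path.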

Otherwise the above is broken, as it will only set the write flag for
ptes that were dirty, and I am guessing that so far you have always
been lucky because the ptes were all dirty (change_protection()
preserves dirtiness) when you write-protected them.

So I believe the above is broken, or at the very least unclear if what
you really want is to only set the write flag on ptes that have the
dirty flag set.


Cheers,
Jérôme


> +
> +	err = 0;
> +out_unlock:
> +	up_read(&dst_mm->mmap_sem);
> +	return err;
> +}
> --
> 2.17.1
>