Re: [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup

From: Will Deacon
Date: Tue Jan 05 2021 - 10:09:45 EST


On Fri, Dec 25, 2020 at 01:25:29AM -0800, Nadav Amit wrote:
> From: Nadav Amit <namit@xxxxxxxxxx>
>
> Clearing soft-dirty through /proc/[pid]/clear_refs can cause memory
> corruption as it clears the dirty-bit without acquiring the mmap_lock
> for write and defers TLB flushes.
>
> As a result of this behavior, it is possible that one of the CPUs would
> have the stale PTE cached in its TLB and keep updating the page while
> another thread triggers a page-fault, and the page-fault handler would
> copy the old page into a new one.

[...]

> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 217aa2705d5d..39b2bd27af79 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1189,6 +1189,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> struct mm_struct *mm;
> struct vm_area_struct *vma;
> enum clear_refs_types type;
> + bool write_lock = false;
> struct mmu_gather tlb;
> int itype;
> int rv;
> @@ -1236,21 +1237,16 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> }
> tlb_gather_mmu(&tlb, mm, 0, -1);
> if (type == CLEAR_REFS_SOFT_DIRTY) {
> + mmap_read_unlock(mm);
> + if (mmap_write_lock_killable(mm)) {
> + count = -EINTR;
> + goto out_mm;
> + }
> for (vma = mm->mmap; vma; vma = vma->vm_next) {
> - if (!(vma->vm_flags & VM_SOFTDIRTY))
> - continue;
> - mmap_read_unlock(mm);
> - if (mmap_write_lock_killable(mm)) {
> - count = -EINTR;
> - goto out_mm;
> - }
> - for (vma = mm->mmap; vma; vma = vma->vm_next) {
> - vma->vm_flags &= ~VM_SOFTDIRTY;
> - vma_set_page_prot(vma);
> - }
> - mmap_write_downgrade(mm);
> - break;
> + vma->vm_flags &= ~VM_SOFTDIRTY;
> + vma_set_page_prot(vma);
> }
> + write_lock = true;
>
> mmu_notifier_range_init(&range, MMU_NOTIFY_SOFT_DIRTY,
> 0, NULL, mm, 0, -1UL);
> @@ -1261,7 +1257,10 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> if (type == CLEAR_REFS_SOFT_DIRTY)
> mmu_notifier_invalidate_range_end(&range);
> tlb_finish_mmu(&tlb, 0, -1);
> - mmap_read_unlock(mm);
> + if (write_lock)
> + mmap_write_unlock(mm);
> + else
> + mmap_read_unlock(mm);
> out_mm:
> mmput(mm);

I probably wouldn't bother with the 'write_lock' variable, and just check
'type == CLEAR_REFS_SOFT_DIRTY' instead.

But that's trivial and I don't have strong opinions, so:

Acked-by: Will Deacon <will@xxxxxxxxxx>

Are you intending to land this for 5.11? If so, I can just rebase my other
series on top of this.

Will