Re: [RFC PATCH 1/4] uprobes: use set_pte_at() not set_pte_at_notify()

From: Jerome Glisse
Date: Mon Feb 11 2019 - 14:28:09 EST


Background: we are discussing __replace_page() in:
kernel/events/uprobes.c

and whether it can be called on a page that can be written to through
its virtual address mapping.

On Fri, Feb 01, 2019 at 07:50:22PM -0500, Andrea Arcangeli wrote:
> On Thu, Jan 31, 2019 at 01:37:03PM -0500, Jerome Glisse wrote:
> > @@ -207,8 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> >
> > flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
> > ptep_clear_flush_notify(vma, addr, pvmw.pte);
> > - set_pte_at_notify(mm, addr, pvmw.pte,
> > - mk_pte(new_page, vma->vm_page_prot));
> > + set_pte_at(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot));
> >
> > page_remove_rmap(old_page, false);
> > if (!page_mapped(old_page))
>
> This seems racy by design in the way it copies the page, if the vma
> mapping isn't readonly to begin with (in which case it'd be ok to
> change the pfn with change_pte too, it'd be a from read-only to
> read-only change which is ok).
>
> If the code copies a writable page there's no much issue if coherency
> is lost by other means too.

I am not sure the race exists, but I am not familiar with the uprobe
code. Maybe the page is already write protected, in which case the copy
is fine. That is likely the case, but there is no check for it. Maybe
there should be one?

Maybe someone familiar with this code can chime in.

>
> Said that this isn't a worthwhile optimization for uprobes so because
> of the lack of explicit read-only enforcement, I agree it's simpler to
> skip change_pte above.
>
> It's orthogonal, but in this function the
> mmu_notifier_invalidate_range_end(&range); can be optimized to
> mmu_notifier_invalidate_range_only_end(&range); otherwise there's no
> point to retain the _notify in ptep_clear_flush_notify.

We need to keep the _notify for the IOMMU, otherwise we would break it.
The IOMMU can walk the page table at any time, so we need to first
clear the page table entry, then notify the IOMMU to flush the TLB on
all the devices that might have a TLB entry. Only then can we set the
new pte.
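
To illustrate the ordering, here is a toy userspace model. It is not
kernel code: struct model, device_access(), clear_flush_notify(),
clear_flush_only(), set_pte() and replace_page_model() are all names
made up for this sketch, and a single cached pfn stands in for the
device TLB.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the ordering, not kernel code. */
#define PFN_NONE 0UL

struct model {
    unsigned long pte;     /* CPU page table entry: a pfn, or PFN_NONE when cleared */
    unsigned long dev_tlb; /* translation cached by the device, or PFN_NONE */
};

/* The device may walk the page table at any time and cache what it finds. */
static unsigned long device_access(struct model *m)
{
    if (m->dev_tlb == PFN_NONE)
        m->dev_tlb = m->pte; /* TLB miss: walk the shared page table */
    return m->dev_tlb;       /* PFN_NONE here would model a device fault */
}

/* Models the _notify variant: clear the entry, then flush the device TLB. */
static void clear_flush_notify(struct model *m)
{
    m->pte = PFN_NONE;
    m->dev_tlb = PFN_NONE;
}

/* Models a clear without notification: the device TLB is left untouched. */
static void clear_flush_only(struct model *m)
{
    m->pte = PFN_NONE;
}

/* The new entry may only be installed once the old one has been cleared. */
static void set_pte(struct model *m, unsigned long new_pfn)
{
    assert(m->pte == PFN_NONE);
    m->pte = new_pfn;
}

/* Returns the pfn the device uses after old_pfn is replaced by new_pfn. */
static unsigned long replace_page_model(unsigned long old_pfn,
                                        unsigned long new_pfn, bool notify)
{
    struct model m = { .pte = old_pfn, .dev_tlb = PFN_NONE };

    device_access(&m); /* the device caches the old translation */
    if (notify)
        clear_flush_notify(&m);
    else
        clear_flush_only(&m);
    set_pte(&m, new_pfn);
    return device_access(&m);
}
```

In this model, replace_page_model(10, 20, true) leaves the device on
the new page, while passing false leaves it using the stale pfn 10.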

But yes, mmu_notifier_invalidate_range_end() can be optimized to
mmu_notifier_invalidate_range_only_end(). I will do a separate patch
for this, as it is orthogonal, as you pointed out :)

Cheers,
Jérôme