Re: Fixing MIPS delay slot emulation weakness?

From: Paul Burton
Date: Thu Dec 20 2018 - 12:56:23 EST


Hi Hugh,

On Wed, Dec 19, 2018 at 01:12:58PM -0800, Hugh Dickins wrote:
> > is_cow_mapping() returns true if the VM_MAYWRITE flag is set and
> > VM_SHARED is not set - this suggests a private & potentially-writable
> > area, right? That fits in nicely with an area we'd want to COW. Why then
> > does check_vma_flags() use the inverse of this to indicate a shared
> > area? This fails if we have a private mapping where VM_MAYWRITE is not
> > set, but where FOLL_FORCE would otherwise provide a means of writing to
> > the memory.
> >
> > If I remove this check in check_vma_flags() then I have a nice simple
> > patch which seems to work well, leaving the user mapping of the delay
> > slot emulation page non-writeable. I'm not sure I'm following the mm
> > innards here though - is there something I should change about the delay
> > slot page instead? Should I be marking it shared, even though it isn't
> > really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should
> > set that - would that allow a user to use mprotect() to make the region
> > writeable..?
>
> Exactly, in that last sentence above you come to the right understanding
> of VM_MAYWRITE: it allows mprotect to add VM_WRITE whenever. So I think
> your issue in setting up the mmap, is that you're (rightly) doing it with
> VM_flags to mmap_region(), but giving it a combination of flags that an
> mmap() syscall from userspace would never arrive at, so does not match
> expectations in is_cow_mapping(). Look for VM_MAYWRITE in mm/mmap.c:
> you'll find do_mmap() first adding VM_MAYWRITE unconditionally, then
> removing it just from the case of a MAP_SHARED without FMODE_WRITE.
>
> > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> > index 48a9c6b90e07..9476efb54d18 100644
> > --- a/arch/mips/kernel/vdso.c
> > +++ b/arch/mips/kernel/vdso.c
> > @@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> >
> > /* Map delay slot emulation page */
> > base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> > - VM_READ|VM_WRITE|VM_EXEC|
> > - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> > + VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC,
>
> So, remove the VM_WRITE by all means, but leave in the VM_MAYWRITE.

Thanks Hugh - it works fine when I leave in VM_MAYWRITE.

My 4am self had become convinced that it would be problematic if a user
program could mprotect() the memory & make it writable... But in reality
if a user program wants to do that then by all means, the kernel isn't
going to be able to prevent it doing silly things.

For anyone looking for the outcome, the patch I wound up with is here:

https://lore.kernel.org/linux-mips/20181220174514.24953-1-paul.burton@xxxxxxxx/

Thanks,
Paul