Re: [RFC PATCH 3/6] mm, arm64: untag user addresses in memory syscalls

From: Catalin Marinas
Date: Wed Mar 14 2018 - 13:44:54 EST


On Wed, Mar 14, 2018 at 04:45:20PM +0100, Andrey Konovalov wrote:
> On Fri, Mar 9, 2018 at 6:42 PM, Evgenii Stepanov <eugenis@xxxxxxxxxx> wrote:
> > On Fri, Mar 9, 2018 at 9:31 AM, Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote:
> >> On Fri, Mar 9, 2018 at 4:53 PM, Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> >>> I'm not yet convinced these functions need to allow tagged pointers.
> >>> They are not doing memory accesses but rather dealing with the memory
> >>> range, hence an untagged pointer is better suited. There is probably a
> >>> reason why the "start" argument is "unsigned long" vs "void __user *"
> >>> (in the kernel, not the man page).
> >>
> >> So that would make the user to untag pointers before passing to these syscalls.
> >>
> >> Evgeniy, would that be possible to untag pointers in HWASan before
> >> using memory subsystem syscalls? Is there a reason for untagging them
> >> in the kernel?
> >
> > Generally, no. It's possible to intercept a libc call using symbol
> > interposition, but I don't know how to rewrite arguments of a raw
> > system call other than through ptrace, and that creates more problems
> > than it solves.

With these patches, we are trying to relax the user/kernel ABI so that
tagged pointers can be passed into the kernel. Since this is a new ABI
(or an extension to the existing one), it might be ok to change the libc
so that the top byte is zeroed on specific syscalls before issuing the
SVC.
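
As a rough illustration of the libc-side option (an untested sketch,
not part of this series): untag_addr() and mprotect_untagged() below are
made-up names, not existing libc or kernel interfaces, and the code
assumes 64-bit user pointers with the tag in bits 63:56.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: clear the top byte (bits 63:56) of a user
 * pointer. For user addresses this is the byte TBI ignores on arm64.
 */
static inline void *untag_addr(void *p)
{
	uint64_t v = (uint64_t)(uintptr_t)p;

	return (void *)(uintptr_t)(v & ~(UINT64_C(0xff) << 56));
}

/*
 * Hypothetical wrapper: strip the tag in user space so that the kernel
 * only ever sees an untagged start address for this range syscall.
 */
static inline int mprotect_untagged(void *addr, size_t len, int prot)
{
	return mprotect(untag_addr(addr), len, prot);
}
```

A C library could apply the same masking inside its own mprotect(),
madvise() etc. wrappers, though that would not cover raw syscalls.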

I agree that it is problematic for HWASan if it only relies on
overriding malloc/free.

> > AFAIU, it's valid for a program to pass an address obtained from
> > malloc or, better, posix_memalign to an mm syscall like mprotect().
> > These arguments are pointers from the userspace point of view.
>
> Catalin, do you think this is a good reason to have the untagging done
> in the kernel?

malloc() and posix_memalign() are implemented by the C library, and it's
the C library (or the overriding functions) that sets a tag on the
returned pointers. Since the TBI hardware feature allows memory accesses
with a non-zero tag, we could allow tagged pointers in the kernel for
syscalls performing such accesses on behalf of the user (e.g.
get_user/put_user would not need to clear the tag).
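
To make the tag layout concrete, a small sketch (illustrative only; the
names TAG_SHIFT, set_tag(), get_tag() and addr_bits() are assumptions
and do not come from the patches): with TBI the tag sits in bits 63:56,
so two pointers with different tags still name the same address.

```c
#include <stdint.h>

#define TAG_SHIFT	56	/* assumed: tag occupies bits 63:56 */
#define TAG_MASK	(UINT64_C(0xff) << TAG_SHIFT)

/* Replace whatever tag the address carries with the given one. */
static inline uint64_t set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Extract the tag byte. */
static inline uint8_t get_tag(uint64_t addr)
{
	return (uint8_t)(addr >> TAG_SHIFT);
}

/* The bits that select the memory location for a user address. */
static inline uint64_t addr_bits(uint64_t addr)
{
	return addr & ~TAG_MASK;
}
```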

madvise(), OTOH, does not perform a memory access on behalf of the user,
it's just advising the kernel about a range of virtual addresses. That's
where I think, from an ABI perspective, it doesn't make much sense to
allow tags into the kernel for these syscalls (even if it's simpler from
a user space perspective).

(but I don't have a very strong opinion on this ;))

--
Catalin