Re: [PATCHv2 1/3] uprobe: Add uretprobe syscall to speed up return probe

From: Google
Date: Thu Apr 04 2024 - 12:06:28 EST


On Thu, 4 Apr 2024 13:58:43 +0200
Jiri Olsa <olsajiri@xxxxxxxxx> wrote:

> On Wed, Apr 03, 2024 at 07:00:07PM -0700, Andrii Nakryiko wrote:
>
> SNIP
>
> > Check rt_sigreturn syscall (manpage at [0], for example).
> >
> > sigreturn() exists only to allow the implementation of signal
> > handlers. It should never be called directly. (Indeed, a simple
> > sigreturn() wrapper in the GNU C library simply returns -1, with
> > errno set to ENOSYS.) Details of the arguments (if any) passed
> > to sigreturn() vary depending on the architecture. (On some
> > architectures, such as x86-64, sigreturn() takes no arguments,
> > since all of the information that it requires is available in the
> > stack frame that was previously created by the kernel on the
> > user-space stack.)
> >
> > This is a very similar use case. Also, check its source code in
> > arch/x86/kernel/signal_64.c. It sends SIGSEGV to the calling process
> > on any sign of something not being right. It's exactly the same with
> > sys_uretprobe.
> >
> > [0] https://man7.org/linux/man-pages/man2/sigreturn.2.html
> >
> > > And the number of syscalls are limited resource.
> >
> > We have almost 500 of them, it didn't seems like adding 1-2 for good
> > reasons would be a problem. Can you please point to where the limits
> > on syscalls as a resource are described? I'm curious to learn.
> >
> > >
> > > I'm actually not sure how much we need to care of it, but adding a new
> > > syscall is worth to be discussed carefully because all of them are
> > > user-space compatibility.
> >
> > Absolutely, it's a good discussion to have.
> >
> > >
> > > > > > > Also, we should run syzkaller on this syscall. And if uretprobe is
> > > > > >
> > > > > > right, I'll check on syzkaller
> > > > > >
> > > > > > > set in the user function, what happen if the user function directly
> > > > > > > calls this syscall? (maybe it consumes shadow stack?)
> > > > > >
> > > > > > the process should receive SIGILL if there's no pending uretprobe for
> > > > > > the current task, or it will trigger uretprobe if there's one pending
> > > > >
> > > > > No, that is too aggressive and not safe. Since the syscall is exposed to
> > > > > user program, it should return appropriate error code instead of SIGILL.
> > > > >
> > > >
> > > > This is the way it is today with uretprobes even through interrupt.
> > >
> > > I doubt that the interrupt (exception) and syscall should be handled
> > > differently. Especially, this exception is injected by uprobes but
> > > syscall will be caused by itself. But syscall can be called from user
> > > program (of couse this works as sys_kill(self, SIGILL)).
> >
> > Yep, I'd keep the behavior the same between uretprobes implemented
> > through int3 and sys_uretprobe.
>
> +1
>
> >
> > >
> > > > E.g., it could happen that user process is using fibers and is
> > > > replacing stack pointer without kernel realizing this, which will
> > > > trigger some defensive checks in uretprobe handling code and kernel
> > > > will send SIGILL because it can't support such cases. This is
> > > > happening today already, and it works fine in practice (except for
> > > > applications that manually change stack pointer, too bad, you can't
> > > > trace them with uretprobes, unfortunately).
> > >
> > > OK, we at least need to document it.
> >
> > +1, yep
> >
> > >
> > > >
> > > > So I think it's absolutely adequate to have this behavior if the user
> > > > process is *intentionally* abusing this API.
> > >
> > > Of course user expected that it is abusing. So at least we need to
> > > add a document that this syscall number is reserved to uprobes and
> > > user program must not use it.
> > >
> >
> > Totally agree about documenting this.
>
> ok there's map page on sigreturn.. do you think we should add man page
> for uretprobe or you can think of some other place to document it?

I think it is better to have a man-page. Anyway, to discuss and explain
this syscall, the man-page is a good format to describe it.

>
> >
> > > >
> > > > > >
> > > > > > but we could limit the syscall to be executed just from the trampoline,
> > > > > > that should prevent all the user space use cases, I'll do that in next
> > > > > > version and add more tests for that
> > > > >
> > > > > Why not limit? :) The uprobe_handle_trampoline() expects it is called
> > > > > only from the trampoline, so it is natural to check the caller address.
> > > > > (and uprobe should know where is the trampoline)
> > > > >
> > > > > Since the syscall is always exposed to the user program, it should
> > > > > - Do nothing and return an error unless it is properly called.
> > > > > - check the prerequisites for operation strictly.
> > > > > I concern that new system calls introduce vulnerabilities.
> > > > >
> > > >
> > > > As Oleg and Jiri mentioned, this syscall can't harm kernel or other
> > > > processes, only the process that is abusing the API. So any extra
> > > > checks that would slow down this approach is an unnecessary overhead
> > > > and complication that will never be useful in practice.
> > >
> > > I think at least it should check the caller address to ensure the
> > > address is in the trampoline.
> > > But anyway, uprobes itself can break the target process, so no one
> > > might care if this system call breaks the process now.
> >
> > If we already have an expected range of addresses, then I think it's
> > fine to do a quick unlikely() check. I'd be more concerned if we need
> > to do another lookup or search to just validate this. I'm sure Jiri
> > will figure it out.
>
> Oleg mentioned the trampoline address check could race with another
> thread's mremap call, however trap is using that check as well, it
> still seems like good idea to do it also in the uretprobe syscall

Yeah, and also, can we add a stack pointer check if the trampoline is
shared with other probe points?

Thank you,

>
> jirka


--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>