Re: [RFC PATCH] x86/syscalls: allow tracing of __do_sys_[syscall] functions

From: Peter Zijlstra
Date: Tue Sep 20 2022 - 05:47:35 EST


On Tue, Sep 13, 2022 at 06:52:13AM -0700, Nadav Amit wrote:
> From: Nadav Amit <namit@xxxxxxxxxx>
>
> Tracing - through ftrace function tracer and kprobes - of certain common
> syscall functions is currently disabled. Setting kprobes on these
> functions is specifically useful for debugging of syscall failures.
>
> Such tracing is disabled since __do_sys_[syscall] functions are declared
> as "inline". "inline" in the kernel is actually defined as a macro that
> in addition to using the inline keyword also disables tracing (notrace).
> According to the comments in the code, tracing inline functions can
> wreck havoc, which is probably true in some cases.
>
> In practice, however, this might be too extensive. The compiler regards
> the "inline" keyword only as a hint, which it is free to ignore. In
> fact, in my builds gcc ignores the "inline" hint for many
> __do_sys_[syscall] since some of these functions are quite big and
> called from multiple locations (for compat). As a result, these
> functions cannot be traced.
>
> There are 3 possible solutions for enabling the tracing of
> __do_sys_[syscall]:
>
> 1. Mark __do_sys_[syscall] as __always_inline instead of inline. This
> would increase the executable size, which might not be desired.
>
> 2. Remove the inline hint from __do_sys_[syscall]. Again, it might
> affect the generated code, inducing function call overhead for some
> syscalls.
>
> 3. Remove "notrace" from the "inline" macro definition, and require
> functions that cannot be traced to be marked explicitly as "notrace".
> This might be the most correct solution, which would also enable tracing
> of additional useful functions. But finding the functions that cannot
> be traced is not easy without some automation.
>
> 4. Avoid the use of "notrace" specifically for __do_sys_[syscall].
>
> Use the last approach to enable the tracing of __do_sys_[syscall]
> functions. Introduce an "inline_trace" macro that sets the "__inline"
> keyword without "notrace". Use it for the syscall wrappers.
>
> This enables the tracing of 54 useful functions on my build, for
> instance, __do_sys_vmsplice(), __do_sys_mremap() and
> __do_sys_process_madvise().
>
> Cc: "Peter Zijlstra (Intel)" <peterz@xxxxxxxxxxxxx>
> Cc: "Steven Rostedt (Google)" <rostedt@xxxxxxxxxxx>
> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>

So at least for x86 these functions cannot be inlined, at all times the
syscalls are laundered through the syscall table.

It is very hard to take the address of an inline function and stuff it
in a table.

Additionally, all indirect syscall table calls are in instrumentable
code, so tracing should not be an issue -- again speaking for x86.

For the above suggestions:

#1 above should refuse to build IMO, one shouldn't be allowed to take
the address of an __always_inline function.

#2 purely x86 speaking -- I don't see an issue with just taking the
'inline' keyword away entirely.

#3 I think Steve's concern is that the tracability of a function then
depends on the compiler's whim -- but yeah, who cares ;-)

#4 not a fan, but I also don't see anything wrong with it -- from x86
pov.


IOW, please figure out why these things are inline to begin with; this
might require auditing all architecture syscall code. While doing that
audit, make sure to determine if all of them can handle tracing at these
points.