Re: [PATCH v6 2/2] arm64: implement support for static call trampolines

From: Quentin Perret
Date: Wed Nov 10 2021 - 07:06:15 EST


Hi Mark,

On Wednesday 10 Nov 2021 at 11:09:40 (+0000), Mark Rutland wrote:
> Hi,
>
> On Tue, Nov 09, 2021 at 07:02:21PM +0000, Quentin Perret wrote:
> > On Tuesday 09 Nov 2021 at 19:09:21 (+0100), Ard Biesheuvel wrote:
> > > Android relies heavily on tracepoints for vendor hooks, and given the
> > > performance impact of CFI on indirect calls, there has been interest
> > > in enabling static calls to replace them.
>
> Hhmm.... what exactly is a "vendor hook" in this context, and what is it doing
> with a tracepoint? From an upstream perspective that sounds somewhat fishy
> usage.

Right, 'vendor hooks' are an ugly Android-specific hack that I hope we
will be able to get rid off overtime. And I don't think upstream should
care about any of this TBH. But it's not the only use-case in Android
for having modules attached to tracepoints, and the other is a bit more
relevant to upstream. So I'd day it makes sense to have that discussion
here.

Specifically, we've got a bunch of 'empty' tracepoints *upstream* in e.g.
the scheduler that don't have any trace events associated with them (see
the exported TPs at the top of kernel/sched/core.c for instance).
They're exported with no in-kernel user on purpose. The only reason they
exist is to allow people to attach modules to them, and do whatever they
need from there (collect stats, write to the trace buffer, ...). That
way the kernel doesn't commit to any userspace ABI, and the maintenance
burden falls on whoever maintains the module instead.

But nowadays virtually every vendor/OEM in the Android world attaches to
those TPs, in production, to gather stats and whatnot. And given that
some of them are hooked in scheduler hot paths, we'd really like those
to be low overhead. I wouldn't be surprised if other distros get the
same issues at some point FWIW -- they all collect SCHED_DEBUG stats and
such.

>
> > > Quentin, anything to add here?
> >
> > Yes, Android should definitely benefit from static calls.
> >
> > Modules attaching to tracepoints cause a measurable overhead w/ CFI as
> > the jump target is a bit harder to verify if it is not in-kernel.
>
> Where does that additional overhead come from when the target is not in-kernel?
>
> I hope that I am wrong in understanding that __cfi_slowpath_diag() means we're
> always doing an out-of-line check when calling into a module?

Nope, I think you're right.

> If that were the case, that would seem to be a much more general problem with
> the current clang CFI scheme, and my fear here is that we're adding fragility
> and complexity in specific plces to work around general problems with the CFI
> scheme.

Right, no objection from me if we want to optimize the CFI slowpath
instead if we can find a way to do that.

A few thoughts:

- attaching and detaching to TPs is a very infrequent operation, so
having to do CFI checks (however cheap) before every call is a bit
sad as the target doesn't change;

- so far the CFI overhead has been visible in practice mainly for
tracepoints and not really anywhere else. The cost of a
kernel-to-module indirect calls for e.g. driver operations seems to
often (though not always) be somewhat small compared to the work
done by the driver itself. And I think that module-to-kernel calls
should be mostly unaffected as we will either resolve them with
PC-relative instructions if within range, or via the module PLT
which doesn't include CFI checks IIRC.

Thanks,
Quentin