Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

From: Ard Biesheuvel
Date: Mon Oct 08 2018 - 07:30:33 EST


On 6 October 2018 at 15:39, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Sat, 6 Oct 2018 14:12:11 +0200
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
>> > +#define arch_dynfunc_trampoline(name, def) \
>> > + asm volatile ( \
>> > + ".globl dynfunc_" #name "; \n\t" \
>> > + "dynfunc_" #name ": \n\t" \
>> > + "jmp " #def " \n\t" \
>> > + ".balign 8 \n \t" \
>> > + : : : "memory" )
>>
>> Bah, what is it with you people and trampolines. Why can't we, just like
>> jump_label, patch the call directly?
>>
>> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
>> is slower for no real reason afaict.
>
> My first attempt was to do just that. But to add a label at the
> call site required handling all the parameters too. See my branch:
> ftrace/jump_function-v1 for how ugly it got (and it didn't work).
>
>>
>> Steve, also see:
>>
>> https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheuvel@xxxxxxxxxx
>
> Interesting. I don't have time to look at it at the moment to see what
> was done, but will do so in the near future.
>
> Remember, this was a proof of concept and even with the trampolines, it
> showed a great level of improvement. One thought was to do a
> "recordmcount.c" type of action to find where the calls were and patch
> them directly at boot up. I tried to keep the API the same where this
> could actually be done as an improvement later.
>
> Perhaps a gcc plugin might work too.
>
> I'll have to see what Ard did to handle the function parameters.
>

I didn't. My approach is just a generic function pointer which can be
overridden by the arch to be emitted as a trampoline instead.

Patching the call directly simply isn't feasible without compiler
support like we have with asm goto for jump_labels.

The use case I am proposing is providing different implementations of
crypto routines or CRC computation etc without having to rely on
function pointers, but still keep them as loadable modules. These
routines are a) heavy weight or we wouldn't bother providing
alternatives in the first place, and b) likely to have a considerable
I$ footprint already (and crypto libraries that have
encrypt/decrypt/setkey or init/update/final entry points will end up
with multiple trampolines in a single I-cache line). Also, the actual
patching only occurs on module load/unload.

I have no idea whether this reasoning applies to Steven's use case as
well, though, I haven't looked at his patches (I wasn't on cc)