Re: [PATCH v3 tip/core/rcu 3/9] rcu: Add synchronous grace-period waiting for RCU-tasks

From: Peter Zijlstra
Date: Fri Aug 08 2014 - 02:40:40 EST


On Thu, Aug 07, 2014 at 05:18:23PM -0400, Steven Rostedt wrote:
> On Thu, 7 Aug 2014 22:08:13 +0200
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > OK, you've got to start over and start at the beginning, because I'm
> > really not understanding this..
> >
> > What is a 'trampoline' and what are you going to use them for?
>
> Great question! :-)
>
> The trampoline is some code that is used to jump to and then jump
> someplace else. Currently, we use this for kprobes and ftrace. For
> ftrace we have the ftrace_caller trampoline, which is static. When
> booting, most functions in the kernel call the mcount code which
> simply returns without doing anything. This too is a "trampoline". At
> boot, we convert these calls to nops (as you already know). When we
> enable callbacks from functions, we convert those calls to call
> "ftrace_caller" which is a small assembly trampoline that will call
> some function that registered with ftrace.
>
> Now why do we need the call_rcu_task() routine?
>
> Right now, if you register multiple callbacks to ftrace, even if they
> are not tracing the same routine, ftrace has to change ftrace_caller to
> call another trampoline (in C), that does a loop of all ops registered
> with ftrace, and compares the function to the ops hash tables to see if
> the ops function should be called for that function.
>
> What we want to do is to create a dynamic trampoline that is a copy of
> the ftrace_caller code, but instead of calling this list trampoline, it
> calls the ops function directly. This way, each ops registered with
> ftrace can have its own custom trampoline that when called will only
> call the ops function and not have to iterate over a list. This only
> happens if the function being traced only has this one ops registered.
> For functions with multiple ops attached to it, we need to call the
> list anyway. But for the majority of the cases, this is not the case.
>
> The one caveat for this is, how do we free this custom trampoline when
> the ops is done with it? Especially for users of ftrace that
> dynamically create their own ops (like perf, and ftrace instances).
>
> We need to find a way to free it, but unfortunately, there's no way to
> know when it is safe to free it. There's no way to disable preemption
> or have some other notifier to let us know if a task has jumped to this
> trampoline and has been preempted (sleeping). The only safe way to know
> that no task is on the trampoline is to remove the calls to it,
> synchronize the CPUs (so the trampolines are not even in the caches),
> and then wait for all tasks to go through some quiescent state. This
> state happens to be either not running, in userspace, or when it
> voluntarily calls schedule. Because nothing that uses this trampoline
> should do that, and if the task voluntarily calls schedule, we know
> it's not on the trampoline.
>
> Make sense?

Ok, so they're purely used in the function prologue/epilogue callchain.
And you don't want to use synchronize_tasks() because registering a trace
function is atomic?

But why would you use dynamic memory allocation for these trampolines at
all? Why not use the one default trampoline for this?

Suppose that thing looks like:

ftrace_mcount_handler()
{
	for_each_hlist_rcu(entry,..)
		entry->func();
}

so why not make it look like:

ftrace_mcount_handler()
{
	/* patch site: rewritten to NOP, CALL or JMP as described below */
	asm_volatile_goto("jmp %l[do_list]" : : : : do_list);
	return;

do_list:
	for_each_hlist_rcu(entry,...)
		entry->func();
}

Then, for:
  no entries   -> NOP,
  one entry    -> "CALL $func",
  more entries -> "JMP &do_list".

No need for extra allocations and fancy means of getting rid of them,
and only a few bytes extra wrt the existing function.
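
To make the three states concrete, here is a rough userspace model in C.
It is only a sketch under assumptions: the real thing would rewrite the
instruction at the patch site, whereas this model stands in an enum for
the patched instruction. All names (site_mode, register_op,
unregister_op, mcount_handler, count_a, count_b, MAX_OPS) are
hypothetical and are not ftrace API.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; every name here is hypothetical. */
typedef void (*trace_fn)(unsigned long ip);

/* Stands in for the instruction that would be patched into the handler. */
enum site_mode { SITE_NOP, SITE_CALL, SITE_LIST };

#define MAX_OPS 8

static trace_fn ops[MAX_OPS];
static size_t nr_ops;
static enum site_mode mode = SITE_NOP;

/* Pick the "instruction" for the current number of registered entries. */
static void update_site(void)
{
	if (nr_ops == 0)
		mode = SITE_NOP;	/* no entries   -> NOP */
	else if (nr_ops == 1)
		mode = SITE_CALL;	/* one entry    -> CALL $func */
	else
		mode = SITE_LIST;	/* more entries -> JMP do_list */
}

static int register_op(trace_fn fn)
{
	if (nr_ops == MAX_OPS)
		return -1;
	ops[nr_ops++] = fn;
	update_site();
	return 0;
}

static int unregister_op(trace_fn fn)
{
	for (size_t i = 0; i < nr_ops; i++) {
		if (ops[i] == fn) {
			ops[i] = ops[--nr_ops];	/* order is not preserved */
			update_site();
			return 0;
		}
	}
	return -1;
}

static void mcount_handler(unsigned long ip)
{
	switch (mode) {
	case SITE_NOP:
		return;
	case SITE_CALL:
		ops[0](ip);		/* direct call, no list walk */
		return;
	case SITE_LIST:
		for (size_t i = 0; i < nr_ops; i++)
			ops[i](ip);
		return;
	}
	assert(0 && "unknown site mode");
}

/* Example callbacks that just count how often they were invoked. */
static int hits_a, hits_b;
static void count_a(unsigned long ip) { (void)ip; hits_a++; }
static void count_b(unsigned long ip) { (void)ip; hits_b++; }
```

Nothing is allocated per entry in this model, so there is nothing to free
when an entry goes away; unregistering only moves the site back to the
direct-call or NOP state. The model says nothing about how the live
instruction would be rewritten safely while other CPUs are executing it.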
