Re: Tasks RCU vs Preempt RCU
From: Joel Fernandes
Date: Sun May 20 2018 - 14:23:53 EST
On Sun, May 20, 2018 at 11:28:43AM -0400, Steven Rostedt wrote:
>
> [ Steve interrupts his time off ]
Hope you're enjoying your vacation :)
> On Sat, 19 May 2018 17:49:38 -0700
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> > I suggested to Steven that the rcu_read_lock() and rcu_read_unlock() might
> > be outside of the trampoline, but this turned out to be infeasible. Not
> > that I remember why! ;-)
>
> Because the trampoline itself is what needs to be freed. The trampoline
> is what mcount/fentry or an optimized kprobe jumps to.
>
>
> <func>:
>         nop
>
> [ enable function tracing ]
>
> <func>:
>         call func_tramp --> set up stack
>                             call function_tracer()
>                             pop stack
>                             ret
>
>                             ^^^^^
>                             This is the trampoline
>
> There's no way to know when a task will be on the trampoline or not.
> The trampoline is allocated, and we need RCU_tasks to know when we can
> free it. The only way to make a "wrapper" is to modify more of the code
> text to do whatever before calling the trampoline, which is
> impractical.
>
> The allocated trampolines were added as an optimization, where two
> registered callback functions from ftrace that are attached to two
> different functions don't call the same trampoline which would have to
> do a loop and a hash lookup to know what callback to call per function.
> If a callback is the only one attached to a specific function, then a
> trampoline is allocated and will call that callback directly, keeping
> the overhead down.
Right, I saw your trampoline prototype tree. I understand how it works now,
thanks.
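To check my understanding of the optimization, here is a rough userspace model of the two dispatch styles. It is only a sketch: the names (tracer_op, shared_trampoline, direct_trampoline_b) are made up for illustration and are not the ftrace APIs, and a linear scan stands in for the loop plus hash lookup.

```c
#include <stddef.h>

typedef void (*tracer_fn)(unsigned long ip);

/* Hypothetical record: one callback attached to one traced function. */
struct tracer_op {
    unsigned long ip;   /* stand-in for the traced function's address */
    tracer_fn cb;
};

static int calls_a, calls_b;
static void tracer_a(unsigned long ip) { (void)ip; calls_a++; }
static void tracer_b(unsigned long ip) { (void)ip; calls_b++; }

static struct tracer_op ops[] = {
    { 0x1000, tracer_a },
    { 0x2000, tracer_b },
};

/*
 * Shared trampoline: every traced function lands here, so it has to
 * search the registrations to find which callback applies to this ip.
 * Returns the number of entries it had to compare.
 */
static int shared_trampoline(unsigned long ip)
{
    int compared = 0;

    for (size_t i = 0; i < sizeof(ops) / sizeof(ops[0]); i++) {
        compared++;
        if (ops[i].ip == ip)
            ops[i].cb(ip);
    }
    return compared;
}

/*
 * Per-callback trampoline: the target is fixed when the trampoline is
 * created, so there is nothing to search. Returns 0 comparisons.
 */
static int direct_trampoline_b(unsigned long ip)
{
    tracer_b(ip);
    return 0;
}
```

Calling shared_trampoline(0x2000) walks both entries before reaching tracer_b(), while direct_trampoline_b(0x2000) goes straight to it. That per-call search is the overhead the allocated trampolines avoid.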
> There is no feasible way to know when a task is on a trampoline
> without adding overhead that negates the speed up we receive by making
> individual trampolines to begin with.
Are you speaking of time overhead or space overhead, or both?
Just thinking out loud and probably some food for thought..
The rcu_read_lock()/rcu_read_unlock() primitives are extremely fast, so I don't
personally think there's a time hit.
Could we get around the trampoline code == data issue by, say, using a
multi-stage trampoline like so?
    call func_tramp --> (static trampoline)         (dynamic trampoline)
                        rcu_read_lock()   ------->  set up stack
                                                    call function_tracer()
                                                    pop stack
                        rcu_read_unlock() <-------  ret
I know there's probably more to it than this, but conceptually at least, it
feels like all the RCU infrastructure is already there to handle preemption
within a trampoline, and it would be cool if the dynamically allocated
trampolines worked as shown above. At least I feel it would be
faster than the pre-trampoline code that did the hash lookups / matching to
call the right function callbacks, and it could help eliminate the need for
the RCU-tasks subsystem and its kthread.
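To make the idea a bit more concrete, here is a single-threaded userspace model of the two stages. Everything in it is an assumption for illustration: model_read_lock()/model_read_unlock() are plain counters standing in for rcu_read_lock()/rcu_read_unlock(), struct dyn_tramp is data rather than executable text, and the assert in tramp_remove() stands in for waiting out a grace period. It says nothing about the real cost on the fast path.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*tracer_fn)(void);

/* Stand-in for a dynamically allocated trampoline (hypothetical). */
struct dyn_tramp {
    tracer_fn callback;
};

static int readers;            /* models read-side nesting depth */
static struct dyn_tramp *cur;  /* currently published dynamic stage */
static int traced_calls;

static void model_read_lock(void)   { readers++; }
static void model_read_unlock(void) { assert(readers > 0); readers--; }

static void function_tracer(void) { traced_calls++; }

/*
 * Static stage: never freed, so a task can sit here at any time.
 * The dynamic stage is only entered inside the read-side section.
 */
static void static_trampoline(void)
{
    model_read_lock();
    struct dyn_tramp *t = cur;
    if (t)
        t->callback();
    model_read_unlock();
}

/* Allocate and publish a dynamic stage for one callback. */
static struct dyn_tramp *tramp_install(tracer_fn cb)
{
    struct dyn_tramp *t = malloc(sizeof(*t));

    if (!t)
        return NULL;
    t->callback = cb;
    cur = t;
    return t;
}

/* Unpublish, "wait" for readers, then free the dynamic stage. */
static void tramp_remove(void)
{
    struct dyn_tramp *old = cur;

    cur = NULL;
    assert(readers == 0);  /* models the grace-period wait */
    free(old);
}
```

In this model the dynamic stage is only ever reached between the lock and unlock calls, so freeing it after readers have drained is safe. Whether the extra indirection is acceptable on the real fast path is exactly the open question.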
If you still feel it's not worth it, then that's okay too, and clearly
RCU-tasks has benefits such as a simpler trampoline implementation.
thanks!
- Joel