Re: There is a Tasks RCU stall warning

From: Steven Rostedt
Date: Wed Apr 12 2017 - 10:43:04 EST


On Wed, 12 Apr 2017 07:19:36 -0700
"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Apr 12, 2017 at 09:18:21AM -0400, Steven Rostedt wrote:
> > On Tue, 11 Apr 2017 20:23:07 -0700
> > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > But another question...
> > >
> > > Suppose someone traced or probed or whatever a call to (say)
> > > cond_resched_rcu_qs(). Wouldn't that put the call to this
> > > function in the trampoline itself? Of course, if this happened,
> > > life would be hard when the trampoline was freed due to
> > > cond_resched_rcu_qs() being a quiescent state.
> >
> > Not at all, because the trampoline happens at the beginning of the
> > function. Not in the guts of it (unless something in the guts was
> > traced). But even then, it should be fine as the change was already
> > made.
> >
> > /* unhook trampoline from function calls */
> > unregister_ftrace_function(my_ops);
> >
> > synchronize_rcu_tasks();
> >
> > kfree(my_ops->trampoline);
> >
> >
> > Thus, once the unregister_ftrace_function() is called, no new entries
> > into the trampoline can happen. The synchronize_rcu_tasks() is to move
> > those that are currently on a trampoline off.
>
> OK, good! (I thought that these things could appear anywhere.)

Well, the trampolines pretty much can, but they are unhooked before
calling synchronize_rcu_tasks(), so nothing new can enter the
trampoline by the time that is called.

>
> If it ever becomes necessary, I suppose you could have a function
> call as the very last thing on a trampoline. Do the (off-trampoline)
> return-address push, jump at the function, and that is the last need
> for the trampoline.

The point of trampolines is to optimize the function hooks, and added
features will kill that optimization. But then it gets even more
complex. The trampolines are written in assembly and do special
register saving in order to call C code. They also need to restore the
original state before jumping back to the function being traced. Thus,
anything at the end of the trampoline will need to be written in
assembly. Not sure writing RCU code in assembly would be much fun.


>
> Assuming that the called function doesn't try accessing the code
> surrounding the call, but that would be a problem in any case.
>
> > Is there a way that a task could be in the middle of
> > cond_resched_rcu_qs() and get preempted by something while on the
> > ftrace trampoline, then the above "unregister_ftrace_function()" and
> > "synchronize_rcu_tasks()" can be called and finish, while the one task
> > is still on the trampoline and never finished the cond_resched_rcu_qs()?
>
> Well, if the kernel being ftraced is a guest OS and the hypervisor
> preempts it at just that point...

Not sure what you mean by the above. You mean the hypervisor running
ftrace on the guest OS? Or just a long pause on the guest OS (could
also be an NMI)? In any case, we don't care about long pauses. We
care about tasks going to sleep while on the trampoline, and the ftrace
code that does the schedule_on_each_cpu() missing that task because it
was preempted and thus not affected by the schedule_on_each_cpu() call.
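
To make the difference concrete, here is a rough userspace model of the
two ways to wait before freeing the trampoline. This is only a sketch:
every sim_* name is made up for illustration and none of it is kernel
code. It assumes that a per-CPU pass only moves tasks that are actually
running off the trampoline, and that trampoline code never sleeps
voluntarily, so a task that has reached a voluntary context switch has
left the trampoline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in the kernel. */
struct sim_task {
	bool on_trampoline;	/* saved PC is inside the trampoline */
	bool preempted;		/* involuntarily scheduled out, not on a CPU */
};

struct sim_ops {
	bool hooked;		/* call sites still jump to the trampoline */
	bool trampoline_freed;	/* trampoline memory has been released */
};

/* Entering the trampoline is only possible while the hook is installed. */
static bool sim_enter_trampoline(const struct sim_ops *ops, struct sim_task *t)
{
	if (!ops->hooked)
		return false;
	t->on_trampoline = true;
	return true;
}

/*
 * Per-CPU pass (model assumption): only tasks that are running get
 * moved off the trampoline. A preempted task is not on any CPU, so
 * it is skipped and stays where it was.
 */
static void sim_per_cpu_pass(struct sim_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!tasks[i].preempted)
			tasks[i].on_trampoline = false;
}

/*
 * Task-based wait: every task, preempted or not, has to reach a
 * voluntary context switch. In this model that means it ran again
 * and fell off the end of the trampoline first.
 */
static void sim_wait_all_tasks(struct sim_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		tasks[i].preempted = false;
		tasks[i].on_trampoline = false;
	}
}

static size_t sim_count_on_trampoline(const struct sim_task *tasks, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (tasks[i].on_trampoline)
			count++;
	return count;
}

/*
 * Unhook, wait, free. Returns true if no task was still on the
 * trampoline at the moment it was freed.
 */
static bool sim_teardown(struct sim_ops *ops, struct sim_task *tasks,
			 size_t n, bool task_based)
{
	assert(!ops->trampoline_freed);	/* catch a double free */

	ops->hooked = false;		/* no new entries from here on */
	if (task_based)
		sim_wait_all_tasks(tasks, n);
	else
		sim_per_cpu_pass(tasks, n);
	ops->trampoline_freed = true;

	return sim_count_on_trampoline(tasks, n) == 0;
}
```

With one task preempted while on the trampoline, sim_teardown() with
task_based == false reports that the trampoline was freed under that
task, while task_based == true does not. In both cases
sim_enter_trampoline() fails once the hook is gone.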

>
> > > Or is there something that takes care to avoid putting calls to
> > > this sort of function (and calls to any function calling this sort
> > > of function, directly or indirectly) into a trampoline?
> >
> > The question is, if it's on the trampoline in one of these functions
> > when synchronize_rcu_tasks() is called, will it still be on the
> > trampoline when that returns?
>
> If the function's return address is within the trampoline, it seems to
> me that bad things could happen.

Not sure what you mean by the above. One should never be tracing within
a trampoline, or calling synchronize_rcu_tasks() in one. The trampoline
could be called from any context, including NMI.

-- Steve