Re: Use case for TASKS_RCU
From: Ingo Molnar
Date: Fri May 19 2017 - 02:23:44 EST
* Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, May 16, 2017 at 08:22:33AM +0200, Ingo Molnar wrote:
> >
> > * Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Hello!
> > >
> > > The question of the use case for TASKS_RCU came up, and here is my
> > > understanding. Steve will not be shy about correcting any misconceptions
> > > I might have. ;-)
> > >
> > > The use case is to support freeing of trampolines used in tracing/probing
> > > in CONFIG_PREEMPT=y kernels. It is necessary to wait until any task
> > > executing in the trampoline in question has left it, taking into account
> > > that the trampoline's code might be interrupted and preempted. However,
> > > the code in the trampolines is guaranteed never to context switch.
> > >
> > > Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> > > It is therefore tempting to think in terms of disabling preemption across
> > > the trampolines, but there is apparently not enough room to accommodate
> > > the needed preempt_disable() and preempt_enable() in the code invoking
> > > the trampoline, and putting the preempt_disable() and preempt_enable()
> > > in the trampoline itself fails because of the possibility of preemption
> > > just before the preempt_disable() and just after the preempt_enable().
> > > Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().
> >
> > So how was this solved before TASKS_RCU? Also, nothing uses call_rcu_tasks() at
> > the moment, so it's hard for me to review its users. What am I missing?
>
> Before TASKS_RCU, the trampolines were just leaked when CONFIG_PREEMPT=y.
>
> Current mainline kernel/trace/ftrace.c uses synchronize_rcu_tasks().
> So yes, currently one user.
So why not schedule a worklet on every CPU to drive the trampoline freeing? To
guarantee that nothing was preempted it could run at SCHED_IDLE and could observe
nr_running from the worklet and use a short timeout loop. Batching and hysteresis
would ensure that this is only running rarely in practice.
It doesn't have to be fast or particularly elegant, but it could use existing
kernel facilites just fine: it's a corner case cost and quirk of our live kernel
text modifying trampoline code and our current CONFIG_PREEMPT=y model.
I.e. don't make it an RCU facility that complicates not just the RCU code but has
various costs in generic code as well:
kernel/exit.c: TASKS_RCU(int tasks_rcu_i);
kernel/exit.c: TASKS_RCU(preempt_disable());
kernel/exit.c: TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
kernel/exit.c: TASKS_RCU(preempt_enable());
kernel/exit.c: TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
I.e. I question that this should be a generic RCU facility.
Thanks,
Ingo