Re: [PATCH RFC tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

From: Paul E. McKenney
Date: Tue Jul 29 2014 - 12:33:24 EST


On Tue, Jul 29, 2014 at 06:07:54PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 08:57:47AM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 29, 2014 at 09:50:55AM +0200, Peter Zijlstra wrote:
> > > On Mon, Jul 28, 2014 at 03:56:12PM -0700, Paul E. McKenney wrote:
> > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > > index bc1638b33449..a0d2f3a03566 100644
> > > > --- a/kernel/sched/core.c
> > > > +++ b/kernel/sched/core.c
> > > > @@ -2762,6 +2762,7 @@ need_resched:
> > > >  		} else {
> > > >  			deactivate_task(rq, prev, DEQUEUE_SLEEP);
> > > >  			prev->on_rq = 0;
> > > > +			rcu_note_voluntary_context_switch(prev);
> > > >
> > > >  			/*
> > > >  			 * If a worker went to sleep, notify and ask workqueue
> > > > @@ -2828,6 +2829,7 @@ asmlinkage __visible void __sched schedule(void)
> > > >  	struct task_struct *tsk = current;
> > > >
> > > >  	sched_submit_work(tsk);
> > > > +	rcu_note_voluntary_context_switch(tsk);
> > > >  	__schedule();
> > > >  }
> > >
> > > Yeah, not entirely happy with that, you add two calls into one of the
> > > hottest paths of the kernel.
> >
> > I did look into leveraging counters, but cannot remember why I decided
> > that this was a bad idea. I guess it is time to recheck...
> >
> > The ->nvcsw field in the task_struct structure looks promising:
> >
> > o Looks like it does in fact get incremented in __schedule() via
> > the switch_count pointer.
> >
> > o Looks like it is unconditionally compiled in.
> >
> > o There are no memory barriers, but a synchronize_sched()
> > should take care of that, given that this counter is
> > incremented with interrupts disabled.
>
> Well, there's obviously the actual context switch, which should imply an
> actual MB such that tasks are self-ordered even when execution continues
> on another CPU, etc.

True enough, except that it appears that the context switch happens
after the ->nvcsw increment, which means that it doesn't help RCU-tasks
guarantee that if it has seen the increment, then all prior processing
has completed. There might be enough stuff prior to the increment, but I
don't see anything that I feel comfortable relying on. Am I missing
some ordering?

> > So I should be able to snapshot the task_struct structure's ->nvcsw
> > field and avoid the added code in the fastpaths.
> >
> > Seem plausible, or am I confused about the role of ->nvcsw?
>
> Nope, that's the 'I scheduled to go to sleep' counter.

I am assuming that the "Nope" goes with "am I confused" rather than
"Seem plausible" -- if not, please let me know. ;-)
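
For what it is worth, the snapshot idea quoted above might look roughly
like the following userspace model. It is a sketch only: demo_task,
demo_snapshot() and demo_check_holdouts() are hypothetical names, the
array stands in for the task list, and real code would also need the
ordering and list-stability guarantees discussed elsewhere in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of task_struct. */
struct demo_task {
	unsigned long nvcsw;       /* voluntary context-switch count */
	unsigned long nvcsw_snap;  /* value recorded at grace-period start */
	bool holdout;              /* still waiting on this task? */
};

/* Grace-period start: record every task's current count. */
static void demo_snapshot(struct demo_task *tasks, size_t n)
{
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		tasks[i].nvcsw_snap = tasks[i].nvcsw;
		tasks[i].holdout = true;
	}
}

/*
 * One polling pass: clear the holdout flag of any task whose count has
 * changed since the snapshot. Returns the number of tasks still held out;
 * zero means every task has voluntarily switched at least once.
 */
static size_t demo_check_holdouts(struct demo_task *tasks, size_t n)
{
	size_t i, remaining = 0;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!tasks[i].holdout)
			continue;
		if (tasks[i].nvcsw != tasks[i].nvcsw_snap)
			tasks[i].holdout = false;
		else
			remaining++;
	}
	return remaining;
}
```

The updater would call demo_snapshot() once and then repeat
demo_check_holdouts() until it returns zero, with nothing added to the
scheduler fastpaths.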

> There is of course the 'polling' issue I raised in a further email...

Yep, and other flavors of RCU go to lengths to avoid scanning the
task_struct lists. Steven said that updates will be rare and that it
is OK for them to have high latency and overhead. Thus far, I am taking
him at his word. ;-)

I considered interrupting the task_struct polling loop periodically,
and would add that if needed. That said, this requires nailing down the
task_struct at which the vacation is taken. Here "nailing down" does not
simply mean "prevent from being freed", but rather "prevent from being
removed from the lists traversed by do_each_thread/while_each_thread."

Of course, if there is some easy way of doing this, please let me know!

> > > And I'm still not entirely sure why, your 0/x babbled something about
> > > trampolines, but I'm not sure I understand how those lead to this.
> >
> > Steven Rostedt sent an email recently giving more detail. And of course
> > now I am having trouble finding it. Maybe he will take pity on us and
> > send along a pointer to it. ;-)
>
> Yah, would make good Changelog material that ;-)

;-) ;-) ;-)

Thanx, Paul
