Re: [PATCH RFC tip/core/rcu 11/11] rcu: move TREE_RCU from softirq to kthread
From: Frederic Weisbecker
Date: Wed Feb 23 2011 - 12:34:26 EST
On Wed, Feb 23, 2011 at 12:03:56PM -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> > On Wed, 2011-02-23 at 17:16 +0100, Frederic Weisbecker wrote:
> > > On Tue, Feb 22, 2011 at 05:39:40PM -0800, Paul E. McKenney wrote:
> > > > +/*
> > > > + * Wake up the current CPU's kthread. This replaces raise_softirq()
> > > > + * in earlier versions of RCU.
> > > > + */
> > > > +static void invoke_rcu_kthread(void)
> > > > +{
> > > > +	unsigned long flags;
> > > > +	wait_queue_head_t *q;
> > > > +	int cpu;
> > > > +
> > > > +	local_irq_save(flags);
> > > > +	cpu = smp_processor_id();
> > > > +	if (per_cpu(rcu_cpu_kthread_task, cpu) == NULL) {
> > > > +		local_irq_restore(flags);
> > > > +		return;
> > > > +	}
> > > > +	per_cpu(rcu_cpu_has_work, cpu) = 1;
> > > > +	q = &per_cpu(rcu_cpu_wq, cpu);
> > >
> > > I see you make extensive use of per_cpu() accessors even for
> > > local variables.
> > >
> > > I tend to think it's better to use __get_cpu_var() for local
> > > accesses and keep per_cpu() for remote accesses.
> > >
> > > There are several reasons for that:
> > >
> > > * __get_cpu_var() checks we are in a non-preemptible section,
> > > > per_cpu() doesn't. That may sound of limited interest for code like the
> > > > above, but code moves over time, and then we might lose track of some
> > > > things, etc...
> >
> > Ah, but so does smp_processor_id() ;-)
> >
> > >
> > > * local accesses can be optimized by architectures. per_cpu() implies
> > > > finding the local cpu number, and dereferencing an array of cpu offsets with
> > > that number to find the local cpu offset.
> > > __get_cpu_var() does a direct access to __my_cpu_offset which is a nice
> > > shortcut if the arch implements it.
>
> [Adding Christoph Lameter to CC list]
>
> This is not quite true on x86_64 and s390 anymore. __get_cpu_var() now
> uses a segment selector override to get the local CPU variable on x86.
> See x86's percpu.h for details.
>
> So even performance-wise, using __get_cpu_var() over per_cpu() should be
> a win on widely used architectures nowadays,
Looking at x86_64, it indeed optimizes further by overriding this_cpu_ptr().
It does the same as the generic this_cpu_ptr() on an overridden
my_cpu_offset, but it also saves a temporary store.
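
To make the two access paths concrete, here is a rough userspace model in C.
It is not kernel code: the offset table, the cached local offset and every
name in it (model_init, remote_ptr, local_ptr, ...) are hypothetical stand-ins
for what per_cpu() and a local accessor conceptually do.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical model: one copy of the variable per "CPU". */
struct percpu_area {
	int rcu_cpu_has_work;
};
static struct percpu_area areas[NR_CPUS];

/* Model of the table of per-CPU offsets, indexed by CPU number. */
static ptrdiff_t cpu_offset[NR_CPUS];

/* Model of a cached offset for the CPU we are running on. */
static ptrdiff_t my_cpu_offset;
static int current_cpu;

/* Fill the offset table and pretend we run on @cpu. */
static void model_init(int cpu)
{
	int i;

	assert(cpu >= 0 && cpu < NR_CPUS);
	for (i = 0; i < NR_CPUS; i++)
		cpu_offset[i] = (char *)&areas[i] - (char *)&areas[0];
	current_cpu = cpu;
	my_cpu_offset = cpu_offset[cpu];
}

/* Remote-style access: look the offset up in the table by CPU number. */
static int *remote_ptr(int cpu)
{
	return (int *)((char *)&areas[0].rcu_cpu_has_work + cpu_offset[cpu]);
}

/* Local-style access: use the cached offset, no table lookup. */
static int *local_ptr(void)
{
	return (int *)((char *)&areas[0].rcu_cpu_has_work + my_cpu_offset);
}
```

Both helpers end up at the same address for the local CPU; the local one just
skips the "find my CPU number, then index the table" step, which is the part
an architecture can shortcut.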
>
> >
> > > True, but we could also argue that the multiple checks for being
> > > preemptible can also be an issue.
>
> At least on x86 preemption doesn't actually need to be disabled: selection
> of the right per-cpu memory location is done atomically with the rest of
> the instruction by the segment selector.
It depends on the case. You may still need to disable preemption if you use
your variable for more than just a quick op, which is often the case.
The quick ops are what things like this_cpu_add() are for, and what they can
do without disabling preemption depends on what the arch is capable of wrt.
local atomicity.
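
A small userspace sketch of why the multi-step case is different. Again this
is only a model, not kernel code: model_migrate() stands in for "preempted and
rescheduled on another CPU", and all the function names are made up for
illustration.

```c
#include <assert.h>

#define NR_CPUS 2

/* Hypothetical model: one counter per "CPU". */
static long counter[NR_CPUS];
static int current_cpu;	/* which CPU the task is pretending to run on */

/* Model of taking a pointer to the local CPU's copy. */
static long *model_this_cpu_ptr(void)
{
	return &counter[current_cpu];
}

/* Model of being preempted and migrated to another CPU. */
static void model_migrate(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	current_cpu = cpu;
}

/*
 * Model of a single-op update: the address selection and the add happen
 * in one step, so there is no window in which a migration can sneak in.
 */
static void model_this_cpu_add(long val)
{
	counter[current_cpu] += val;
}

/*
 * Multi-step use without preemption disabled: the pointer is taken on one
 * CPU, the task migrates, and the store lands in the old CPU's copy.
 */
static void model_unprotected_update(int migrate_to)
{
	long *p = model_this_cpu_ptr();
	long tmp = *p;

	model_migrate(migrate_to);	/* window where preemption hits */
	*p = tmp + 1;			/* stale pointer: wrong CPU's copy */
}
```

Starting on CPU 0, model_unprotected_update(1) leaves the task on CPU 1 but
increments CPU 0's counter, whereas model_this_cpu_add() always touches the
copy of whichever CPU the task is on at that moment.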