Re: [PATCH RFC tip/core/rcu 11/11] rcu: move TREE_RCU from softirq to kthread

From: Mathieu Desnoyers
Date: Wed Feb 23 2011 - 12:04:08 EST


* Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> On Wed, 2011-02-23 at 17:16 +0100, Frederic Weisbecker wrote:
> > On Tue, Feb 22, 2011 at 05:39:40PM -0800, Paul E. McKenney wrote:
> > > +/*
> > > + * Wake up the current CPU's kthread. This replaces raise_softirq()
> > > + * in earlier versions of RCU.
> > > + */
> > > +static void invoke_rcu_kthread(void)
> > > +{
> > > +	unsigned long flags;
> > > +	wait_queue_head_t *q;
> > > +	int cpu;
> > > +
> > > +	local_irq_save(flags);
> > > +	cpu = smp_processor_id();
> > > +	if (per_cpu(rcu_cpu_kthread_task, cpu) == NULL) {
> > > +		local_irq_restore(flags);
> > > +		return;
> > > +	}
> > > +	per_cpu(rcu_cpu_has_work, cpu) = 1;
> > > +	q = &per_cpu(rcu_cpu_wq, cpu);
> >
> > I see you make extensive use of per_cpu() accessors even for
> > local variables.
> >
> > I tend to think it's better to use __get_cpu_var() for local
> > accesses and keep per_cpu() for remote accesses.
> >
> > There are several reasons for that:
> >
> > * __get_cpu_var() checks we are in a non-preemptible section,
> > per_cpu() doesn't. That may sound of limited interest for code like the
> > above, but over time code can move, and then we might lose track of some
> > things, etc...
>
> Ah, but so does smp_processor_id() ;-)
>
> >
> > * local accesses can be optimized by architectures. per_cpu() implies
> > finding the local cpu number, and dereferencing an array of cpu offsets with
> > that number to find the local cpu offset.
> > __get_cpu_var() does a direct access to __my_cpu_offset which is a nice
> > shortcut if the arch implements it.

[Adding Christoph Lameter to CC list]

This is not quite true on x86_64 and s390 anymore. __get_cpu_var() now
uses a segment selector override to get the local CPU variable on x86.
See x86's percpu.h for details.

So even performance-wise, using __get_cpu_var() over per_cpu() should be
a win on widely used architectures nowadays, thanks to Christoph's work
on this_cpu accessors.
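
To make the two access paths concrete, here is a rough userspace model
(not kernel code, just a sketch). Every model_* name is made up for
illustration; model_my_offset merely stands in for whatever the
architecture provides (a segment base on x86, __my_cpu_offset in the
generic case).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: all model_* names are hypothetical. */
#define MODEL_NR_CPUS 4

struct model_cpu_area {
	int has_work;
};

/* One copy of the "per-CPU" data for each modeled CPU. */
static struct model_cpu_area model_areas[MODEL_NR_CPUS];

/* Generic path: table of per-CPU byte offsets, indexed by CPU number. */
static ptrdiff_t model_cpu_offset[MODEL_NR_CPUS];

/* Local path: a single precomputed offset for the "current" CPU. */
static ptrdiff_t model_my_offset;

/* Fill the offset table and record the offset of the current CPU. */
static void model_setup(int this_cpu)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_cpu_offset[cpu] = (char *)&model_areas[cpu] -
					(char *)&model_areas[0];
	model_my_offset = model_cpu_offset[this_cpu];
}

/* Models per_cpu(var, cpu): index the offset table, then add. */
static int *model_per_cpu_has_work(int cpu)
{
	return (int *)((char *)&model_areas[0].has_work +
		       model_cpu_offset[cpu]);
}

/* Models a local access: no CPU number and no table lookup needed. */
static int *model_this_cpu_has_work(void)
{
	return (int *)((char *)&model_areas[0].has_work + model_my_offset);
}

/* Both paths must resolve to the same location for the current CPU. */
static int model_paths_agree(int this_cpu)
{
	model_setup(this_cpu);
	assert(model_per_cpu_has_work(this_cpu) == model_this_cpu_has_work());
	return model_per_cpu_has_work(this_cpu) == model_this_cpu_has_work();
}
```

Calling model_paths_agree(2) sets CPU 2 as the current CPU and checks
that both lookups return the same address. The only point of the model
is that the local path skips the "find my CPU number, index the table"
step; it says nothing about preemption checks.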

>
> True, but we could also argue that the multiple checks for preemption
> being disabled can also be an issue.

At least on x86, preemption doesn't actually need to be disabled: selection
of the right per-cpu memory location is done atomically with the rest of
the instruction by the segment selector.

>
> >
> > * It makes code easier to review: we know that __get_cpu_var() is
> > for local accesses and per_cpu() for remote.
>
> This I'll agree with you.
>
> In the past, I've thought about which one is better (per_cpu() vs
> __get_cpu_var()).
>
> But, that last point is a good one. Just knowing that this is for the
> local CPU helps with the amount of info your mind needs to process when
> looking at this code. And the mind needs all the help it can get when
> reviewing this code ;-)
>

Agreed, better documentation of the code is also a win.

Thanks,

Mathieu

> -- Steve
>
> >
> > > +	wake_up(q);
> > > +	local_irq_restore(flags);
> > > +}
>
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com