Re: [RFC] dynticks: dynticks_idle is only modified locally use this_cpu ops

From: Paul E. McKenney
Date: Thu Sep 04 2014 - 12:16:40 EST

On Thu, Sep 04, 2014 at 10:04:17AM -0500, Christoph Lameter wrote:
> On Wed, 3 Sep 2014, Paul E. McKenney wrote:
> > As noted earlier, in theory, the atomic operations could be nonatomic,
> Well, as demonstrated by the patch earlier: the atomic operations are only
> used on the local cpu. There is no synchronization in that sense needed
> between processors because there is never a remote atomic operation.

Easy to say! ;-)

> > > The code looks fragile and bound to have issues in the future given the
> > > barriers/atomics etc. It's going to be cleaner without that.
> >
> > What exactly looks fragile about it, and exactly what issues do you
> > anticipate?
> I am concerned about creation of unnecessary synchronization issues. In
> this case we have already discovered that the atomic operations on per
> cpu variables are only used to modify the contents from the local cpu.
> This means at minimum we can give up on the use of atomics and keep the
> barriers to enforce visibility.

Sounds like a desire for a potential optimization rather than any
sort of fragility. And in this case, it is not clear that your desire
of replacing a value-returning atomic operation with a normal memory
reference and a pair of memory barriers actually makes anything go faster.

So in short, you don't see the potential for this use case actually
breaking anything, correct?

Besides, RCU is not the only place where atomics are used on per-CPU
variables. For one thing, there are a number of per-CPU spinlocks in use
in various places throughout the kernel. For another thing, there is also
a large number of per-CPU structures (not pointers to structures, actual
structures), and I bet that a fair number of these feature cross-CPU
writes and cross-CPU atomics. RCU's rcu_data structures certainly do.

> > > And we are right now focusing on the simplest case. The atomics scheme is
> > > used multiple times in the RCU subsystem. There is more weird looking code
> > > there like atomic_add using zero etc.
> >
> > The atomic_add_return(0,...) reads the value out, forcing full ordering.
> > Again, in theory, this could be a volatile read with explicit memory-barrier
> > instructions on either side, but it is not clear which wins. (Keep in
> > mind that almost all of the atomic_add_return(0,...) calls for a given
> > dynticks counter are executed from a single kthread.)
> >
> > If systems continue to add CPUs like they have over the past decade or
> > so, I expect that you will be seeing more code like RCU's, not less.
> We have other code like this in multiple subsystems but it does not have
> the barrier issues: per-cpu variables are always updated without the use
> of atomics, and the inspection of the per-cpu state from remote cpus works
> just fine without them too.

Including the per-CPU spinlocks? That seems a bit unlikely. And again,
I expect that a fair number of the per-CPU structures involve cross-CPU
writes.

> I'd like to simplify this as much as possible and make it consistent
> throughout the kernel.

It already is consistent, just not in the manner that you want. ;-)

But -why- do you want these restrictions? How does it help anything?

Thanx, Paul
