Re: [PATCHv5 2/2] memory barrier: adding smp_mb__after_lock

From: Mathieu Desnoyers
Date: Fri Jul 03 2009 - 13:32:06 EST


* Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote:
> On Fri, Jul 03, 2009 at 11:47:00AM -0400, Mathieu Desnoyers wrote:
> > * Eric Dumazet (eric.dumazet@xxxxxxxxx) wrote:
> > > Herbert Xu wrote:
> > > > Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx> wrote:
> > > >> Why don't we create a read_lock without acquire semantic instead (e.g.
> > > >> read_lock_nomb(), or something with a better name like __read_lock()) ?
> > > >> On architectures where memory barriers are needed to provide the acquire
> > > >> semantic, it would be faster to do :
> > > >>
> > > >> __read_lock();
> > > >> smp_mb();
> > > >>
> > > >> than :
> > > >>
> > > >> read_lock(); <- e.g. lwsync + isync or something like that
> > > >> smp_mb(); <- full sync.
> > > >
> > > > Hmm, why do we even care when read_lock should just die?
> > > >
> > > > Cheers,
> > >
> > > +1 :)
> > >
> > > Do you mean using a spinlock instead or what ?
> > >
> >
> > I think he meant RCU.
> >
> > > Also, how many arches are able to have a true __read_lock()
> > > (or __spin_lock() if that matters), without acquire semantic ?
> >
> > At least PowerPC, MIPS, recent ARM, alpha.
>
> Are you guys sure you are in agreement about what you all mean by
> "acquire semantics"?
>

I use acquire/release semantics with the following meaning:

...
read A
read_unlock()

read B

read_lock();
read C

read_unlock() would provide release semantics by preventing read A from
moving after the read_unlock().

read_lock() would provide acquire semantics by preventing read C from
moving before the read_lock().

read B is free to move.
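
To make that definition concrete, here is a rough userspace sketch using C11
atomics. It is not kernel code: the toy_* names, the reader-count-only lock
(there is no writer side at all) and the shared_a/shared_c variables are all
made up for illustration, and memory_order_acquire/memory_order_release only
approximate what an architecture's read_lock()/read_unlock() would give you.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toy reader lock: a bare reader count, no writer side. */
struct toy_rwlock {
	atomic_int readers;
};

static int shared_a, shared_c;

/* Acquire: accesses after this call cannot be reordered before it. */
static void toy_read_lock(struct toy_rwlock *l)
{
	atomic_fetch_add_explicit(&l->readers, 1, memory_order_acquire);
}

/* Release: accesses before this call cannot be reordered after it. */
static void toy_read_unlock(struct toy_rwlock *l)
{
	atomic_fetch_sub_explicit(&l->readers, 1, memory_order_release);
}

static int toy_read_twice(struct toy_rwlock *l)
{
	int a, c;

	toy_read_lock(l);
	a = shared_a;		/* read A: may not move after the unlock */
	toy_read_unlock(l);

	/* a "read B" placed here would be free to move */

	toy_read_lock(l);
	c = shared_c;		/* read C: may not move before the lock */
	toy_read_unlock(l);

	/* Sanity check, only meaningful in a single-threaded run. */
	assert(atomic_load(&l->readers) == 0);
	return a + c;
}
```

The sketch only shows where the one-way ordering constraints sit. It says
nothing about ordering against a writer's unlock, which a real rwlock must
also provide.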


> Clearly, any correct __read_lock() implementation must enforce ordering
> with respect to the most recent __write_unlock(), but this does not
> necessarily imply all possible definitions of "acquire semantics".
>

Yes, you are right. We could never remove _all_ memory barriers from the
__read_lock()/__read_unlock() implementations, even if we required
callers to write something such as:

__read_lock()
smp_mb()

critical section.

smp_mb()
__read_unlock()

This is because we also need to guarantee that a consecutive unlock/lock
pair won't be reordered, which implies a barrier _outside_ of the read
lock/unlock atomic operations.
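
For reference, the caller-side pattern above would look roughly like this in
userspace C11. Everything here is an assumption made for illustration:
full_mb() stands in for smp_mb(), the nomb_* functions stand in for
__read_lock()/__read_unlock(), and a bare counter stands in for the lock
word. The relaxed atomics model "no ordering in the lock operation itself",
which is exactly the part that a real implementation could not get away
with entirely.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical reader counter standing in for the rwlock word. */
static atomic_int nomb_readers;
static int nomb_data;

/* Stand-in for smp_mb(): a full, sequentially consistent fence. */
static void full_mb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Stand-in for __read_lock(): the atomic op carries no ordering. */
static void nomb_read_lock(void)
{
	atomic_fetch_add_explicit(&nomb_readers, 1, memory_order_relaxed);
}

/* Stand-in for __read_unlock(): likewise unordered by itself. */
static void nomb_read_unlock(void)
{
	atomic_fetch_sub_explicit(&nomb_readers, 1, memory_order_relaxed);
}

static int nomb_read_section(void)
{
	int v;

	nomb_read_lock();
	full_mb();		/* keeps the critical section after the lock */

	v = nomb_data;		/* critical section */

	full_mb();		/* keeps the critical section before the unlock */
	nomb_read_unlock();

	/* Sanity check, only meaningful in a single-threaded run. */
	assert(atomic_load(&nomb_readers) == 0);
	return v;
}
```

Note that the two fences order the critical section against the lock and
unlock operations, but nothing in this sketch orders the unlock against a
following lock, which is the gap described above.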

But anyway I'm not sure it's worth trying to optimize rwlocks, given
that for critical sections where the performance hit of a memory barrier
would be perceivable, we should really think about using RCU rather than
beating this dead horse. :)

Thanks,

Mathieu.


> Thanx, Paul

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68