Re: memory barrier question

From: James Bottomley
Date: Sun Sep 19 2010 - 20:59:01 EST


On Sun, 2010-09-19 at 14:59 -0700, Paul E. McKenney wrote:
> On Sun, Sep 19, 2010 at 10:15:51PM +0200, Miklos Szeredi wrote:
> > On Sun, 19 Sep 2010, Paul E. McKenney wrote:
> > > Give it a few years. There are reportedly already other compilers that do
> > > this, which is not too surprising given that the perception of insanity
> > > is limited to lockless parallel code. If you have single-threaded code,
> > > such as code and data under a lock (where the data is never accessed
> > > without holding that lock), then this sort of optimization is pretty safe.
> > > I still don't like it, but the compiler guys would argue that this is
> > > because I am one of those insane parallel-programming guys.
> > >
> > > Furthermore, there are other ways to get into trouble. If the code
> > > continued as follows:
> > >
> > > LOAD inode = next.dentry->inode
> > > if (inode != NULL)
> > > LOAD inode->f_op
> > > do_something_using_lots_of_registers();
> > > LOAD inode->some_other_field
> > >
> > > and if the code expected ->f_op and ->some_other_field to be from the
> > > same inode structure, severe disappointment could ensue. This is because
> > > the compiler is within its rights to reload from next.dentry->inode,
> > > especially given register pressure. In fact, the compiler would be within
> > > its rights to reload from next.dentry->inode in the "LOAD inode->f_op"
> > > statement. And it might well get NULL from such a reload.
> >
> > Except the VFS doesn't allow that. dentry->inode can go from NULL to
> > non-NULL anytime but will only go from non-NULL to NULL when there are
> > no possible external references to the dentry.
> >
> > The compiler and the CPU cannot move the "LOAD inode->some_field"
> > before the "LOAD dentry->inode", because of the conditional, right?
>
> Other than Alpha, the CPU cannot. The standard -does- permit the
> compiler to guess the value of the pointer, thus effectively moving the
> load prior to the conditional. At present, as far as I know, gcc does
> not actually do this.

I believe the standard only allows this if the compiler can *prove* (not
guess) the result of the if test. Even on Parisc where this can be
expressed as a nullified instruction pair, the architectural rules
forbid speculation of the latter before the former (the only time they
can be executed out of order is when the CPU can prove the values
internally, i.e. they're both cached)

> Again, please put at least an ACCESS_ONCE() in. Trivial to do now,
> possibly saving much pain and headache later on.

OK, lost you here. ACCESS_ONCE() is only needed in certain situations
(like list traversal) because some compilers can reload cached values
across an explicit barrier (which isn't here).

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/