Re: [PATCH v3 19/19] fs: handle inode->i_version more efficiently
From: Jeff Layton
Date: Thu Dec 21 2017 - 06:26:09 EST
On Wed, 2017-12-20 at 17:41 +0100, Jan Kara wrote:
> On Wed 20-12-17 09:03:06, Jeff Layton wrote:
> > On Tue, 2017-12-19 at 09:07 +1100, Dave Chinner wrote:
> > > On Mon, Dec 18, 2017 at 02:35:20PM -0500, Jeff Layton wrote:
> > > > [PATCH] SQUASH: add memory barriers around i_version accesses
> > >
> > > Why explicit memory barriers rather than annotating the operations
> > > with the required semantics and getting the barriers the arch
> > > requires automatically? I suspect this should be using
> > > atomic_read_acquire() and atomic_cmpxchg_release(), because AFAICT
> > > the atomic_cmpxchg needs to have release semantics to match the
> > > acquire semantics needed for the load of the current value.
> > >
> > > From include/linux/atomics.h:
> > >
> > > * For compound atomics performing both a load and a store, ACQUIRE
> > > * semantics apply only to the load and RELEASE semantics only to the
> > > * store portion of the operation. Note that a failed cmpxchg_acquire
> > > * does -not- imply any memory ordering constraints.
> > >
> > > Memory barriers hurt my brain. :/
> > >
> > > At minimum, shouldn't the atomic op specific barriers be used rather
> > > than full memory barriers? i.e:
> > >
> >
> > They hurt my brain too. Yes, definitely atomic-specific barriers should
> > be used here instead, since this is an atomic64_t now.
> >
> > After going over the docs again...my understanding has always been that
> > you primarily need memory barriers to order accesses to different areas
> > of memory.
>
> That is correct.
>
> > As Jan and I were discussing in the other thread, i_version is not
> > synchronized with anything else. In this code, we're only dealing with a
> > single 64-bit word. I don't think there are any races there wrt the API
> > itself.
>
> There are not but it is like saying that lock implementation is correct
> because the lock state does not get corrupted ;). Who cares about protected
> data...
>
> > The "legacy" inode_inc_iversion() however _does_ have implied memory
> > barriers from the i_lock. There could be some subtle de-facto ordering
> > there, so I think we probably do want some barriers in here if only to
> > preserve that. It's not likely to cost much, and may save us tracking
> > down some fiddly bugs.
> >
> > What about this patch? Note that I've only added barriers to
> > inode_maybe_inc_iversion. I don't see that we need it for the other
> > functions, but please do tell me if I'm wrong there:
> >
> > --------------------8<---------------------------
> >
> > [PATCH] SQUASH: add memory barriers around i_version accesses
> >
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> > include/linux/iversion.h | 53 +++++++++++++++++++++++++++++-------------------
> > 1 file changed, 32 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > index a9fbf99709df..02187a3bec3b 100644
> > --- a/include/linux/iversion.h
> > +++ b/include/linux/iversion.h
> > @@ -89,6 +89,23 @@ inode_set_iversion_raw(struct inode *inode, const u64 val)
> > atomic64_set(&inode->i_version, val);
> > }
> >
> > +/**
> > + * inode_peek_iversion_raw - grab a "raw" iversion value
> > + * @inode: inode from which i_version should be read
> > + *
> > + * Grab a "raw" inode->i_version value and return it. The i_version is not
> > + * flagged or converted in any way. This is mostly used to access a self-managed
> > + * i_version.
> > + *
> > + * With those filesystems, we want to treat the i_version as an entirely
> > + * opaque value.
> > + */
> > +static inline u64
> > +inode_peek_iversion_raw(const struct inode *inode)
> > +{
> > + return atomic64_read(&inode->i_version);
> > +}
> > +
> > /**
> > * inode_set_iversion - set i_version to a particular value
> > * @inode: inode to set
> > @@ -152,7 +169,16 @@ inode_maybe_inc_iversion(struct inode *inode, bool force)
> > {
> > u64 cur, old, new;
> >
> > - cur = (u64)atomic64_read(&inode->i_version);
> > + /*
> > + * The i_version field is not strictly ordered with any other inode
> > + * information, but the legacy inode_inc_iversion code used a spinlock
> > + * to serialize increments.
> > + *
> > + * This code adds full memory barriers to ensure that any de-facto
> > + * ordering with other info is preserved.
> > + */
> > + smp_mb__before_atomic();
>
> This should be just smp_mb(). __before_atomic() pairs with atomic
> operations like atomic_inc(). atomic_read() is completely unordered
> operation (happens to be plain memory read on x86) and so __before_atomic()
> is not enough.
>
> > + cur = inode_peek_iversion_raw(inode);
> > for (;;) {
> > /* If flag is clear then we needn't do anything */
> > if (!force && !(cur & I_VERSION_QUERIED))
> > @@ -162,8 +188,10 @@ inode_maybe_inc_iversion(struct inode *inode, bool force)
> > new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
> >
> > old = atomic64_cmpxchg(&inode->i_version, cur, new);
> > - if (likely(old == cur))
> > + if (likely(old == cur)) {
> > + smp_mb__after_atomic();
>
> I don't think you need this. Cmpxchg is guaranteed to be full memory
> barrier - from Documentation/atomic_t.txt:
> - RMW operations that have a return value are fully ordered;
>
> > break;
> > + }
> > cur = old;
> > }
> > return true;
>
> ...
>
> > @@ -248,7 +259,7 @@ inode_query_iversion(struct inode *inode)
> > {
> > u64 cur, old, new;
> >
> > - cur = atomic64_read(&inode->i_version);
> > + cur = inode_peek_iversion_raw(inode);
> > for (;;) {
> > /* If flag is already set, then no need to swap */
> > if (cur & I_VERSION_QUERIED)
>
> And here I'd expect smp_mb() after inode_peek_iversion_raw() (actually be
> needed only if you are not going to do cmpxchg as that implies barrier as
> well). "Safe" use of i_version would be:
>
> Update:
>
> modify inode
> inode_maybe_inc_iversion(inode)
>
> Read:
>
> my_version = inode_query_iversion(inode)
> get inode data
>
> And you need to make sure 'get inode data' does not get speculatively
> evaluated before you actually sample i_version so that you are guaranteed
> that if data changes, you will observe larger i_version in the future.
>
> Also please add a comment smp_mb() in inode_maybe_inc_iversion() like:
>
> /* This barrier pairs with the barrier in inode_query_iversion() */
>
> and a similar comment to inode_query_iversion(). Because memory barriers
> make sense only in pairs (see SMP BARRIER PAIRING in
> Documentation/memory-barriers.txt).
>
Got it, I think. How about this (sorry for the unrelated deltas here):
[PATCH] SQUASH: add memory barriers around i_version accesses
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
include/linux/iversion.h | 60 +++++++++++++++++++++++++++++++-----------------
1 file changed, 39 insertions(+), 21 deletions(-)
diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index a9fbf99709df..1b3b5ed7c5b8 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -89,6 +89,23 @@ inode_set_iversion_raw(struct inode *inode, const u64 val)
atomic64_set(&inode->i_version, val);
}
+/**
+ * inode_peek_iversion_raw - grab a "raw" iversion value
+ * @inode: inode from which i_version should be read
+ *
+ * Grab a "raw" inode->i_version value and return it. The i_version is not
+ * flagged or converted in any way. This is mostly used to access a self-managed
+ * i_version.
+ *
+ * With those filesystems, we want to treat the i_version as an entirely
+ * opaque value.
+ */
+static inline u64
+inode_peek_iversion_raw(const struct inode *inode)
+{
+ return atomic64_read(&inode->i_version);
+}
+
/**
* inode_set_iversion - set i_version to a particular value
* @inode: inode to set
@@ -152,7 +169,18 @@ inode_maybe_inc_iversion(struct inode *inode, bool force)
{
u64 cur, old, new;
- cur = (u64)atomic64_read(&inode->i_version);
+ /*
+ * The i_version field is not strictly ordered with any other inode
+ * information, but the legacy inode_inc_iversion code used a spinlock
+ * to serialize increments.
+ *
+ * Here, we add full memory barriers to ensure that any de-facto
+ * ordering with other info is preserved.
+ *
+ * This barrier pairs with the barrier in inode_query_iversion()
+ */
+ smp_mb();
+ cur = inode_peek_iversion_raw(inode);
for (;;) {
/* If flag is clear then we needn't do anything */
if (!force && !(cur & I_VERSION_QUERIED))
@@ -183,23 +211,6 @@ inode_inc_iversion(struct inode *inode)
inode_maybe_inc_iversion(inode, true);
}
-/**
- * inode_peek_iversion_raw - grab a "raw" iversion value
- * @inode: inode from which i_version should be read
- *
- * Grab a "raw" inode->i_version value and return it. The i_version is not
- * flagged or converted in any way. This is mostly used to access a self-managed
- * i_version.
- *
- * With those filesystems, we want to treat the i_version as an entirely
- * opaque value.
- */
-static inline u64
-inode_peek_iversion_raw(const struct inode *inode)
-{
- return atomic64_read(&inode->i_version);
-}
-
/**
* inode_iversion_need_inc - is the i_version in need of being incremented?
* @inode: inode to check
@@ -248,15 +259,22 @@ inode_query_iversion(struct inode *inode)
{
u64 cur, old, new;
- cur = atomic64_read(&inode->i_version);
+ cur = inode_peek_iversion_raw(inode);
for (;;) {
/* If flag is already set, then no need to swap */
- if (cur & I_VERSION_QUERIED)
+ if (cur & I_VERSION_QUERIED) {
+ /*
+ * This barrier (and the implicit barrier in the
+ * cmpxchg below) pairs with the barrier in
+ * inode_maybe_inc_iversion().
+ */
+ smp_mb();
break;
+ }
new = cur | I_VERSION_QUERIED;
old = atomic64_cmpxchg(&inode->i_version, cur, new);
- if (old == cur)
+ if (likely(old == cur))
break;
cur = old;
}
--
2.14.3