RE: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t

From: Reshetova, Elena
Date: Fri Oct 20 2017 - 08:03:59 EST


Since I am not really sure what to do with this futex patch, I will drop it
from the new series that I am about to send now.

Please let me know what exactly should I do with this patch, as I wrote
previously I really don't understand.

Best Regards,
Elena.

> Sorry for delayed reply.
>
> > On Mon, Sep 04, 2017 at 10:31:54AM +0000, Reshetova, Elena wrote:
> > > > > But can they make "fast" implementation on ARM that would give stronger
> > > > > memory guarantees?
> > > >
> > > > Whatever for?
> > >
> > > Well, maybe just by default when arch.-specific implementation is
> > > done. But I was just trying to speculate to understand. I will resend
> > > this one with new comment added.
> >
> > So the generic lib/refcount.c already has weak ordering. It doesn't make
> > sense for an arch specific implementation (on a weakly ordered machine)
> > to provide stronger guarantees (it would make things slower).
>
> Thank you for explaining this! Helps to understand a lot.
> >
> > The weaker ordering of the refcount_t primitives is sufficient if we're
> > talking pure refcounts. If for some reason code relies on stronger
> > ordering there _SHOULD_ be a comment with describing the additional
> > ordering requirements.
> >
> > But that's a fairly big 'should'. I can well imagine the comment not
> > being there. In fact, see below.
> >
> > > Still not sure if I need to resend the whole series with updated
> > > commits or break this up by individual patches further for the
> > > separate merges.
> >
> > I've yet to look at the ones targeted at subsystems I do, I'm forever
> > and terminally behind on review :/
> >
> > I called out the issue on futex in particular because it is fairly
> > tricky code that.
> >
> > Now Thomas would like you to mention the fact that refcount_t doesn't
> > provide the exact same ordering as the atomic_t usages it replaces and
> > I think it would be good if you could hand-wave an argument on why the
> > futex code doesn't care.
>
> I think I can mention the ordering differences on all yet-to-be-merged
> patches to make sure maintainers are aware. The problem with concrete
> cases is that I don't usually have enough knowledge of code to understand
> for sure where it would matter or not. Previously I was even under impression
> that it should not matter at all for the variables that we are converting since
> they are classical refcounters, but your examples clearly show that it is not
> *always* the case (but I think it is the case for most of patches).
> So, I am a bit confused on how to approach this.
> Either just put a statement to all patches and rely that maintainers certainly
> know their code and can catch these things or do an analysis myself, but
> then I would need a bit of guidance on what is the reasonable heuristics on
> how check each refcounter. This goes really beyond my current
> kernel knowledge, but I am happy to learn if somebody points me to smth
> I can read/fill missing points.
>
> Best Regards,
> Elena.
>
>
> >
> >
> > Now, suppose we were to convert i_count to refcount_t (yes, I know, my
> > initial conversion wasn't well received), then we need to add
> > futex_get_inode() similar to futex_get_mm().
> >
> > That is, smp_mb__{before,after}_atomic() works as expected and can be
> > used to fortify the implied barriers by refcount_t.
> >
> > ---
> > Subject: fs,inode: Add comment explaining additional ordering
> >
> > Add a note to ihold() to document the ordering futex relies upon.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > ---
> > fs/inode.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 50370599e371..17192ba92fef 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -395,6 +395,10 @@ void __iget(struct inode *inode)
> > */
> > void ihold(struct inode *inode)
> > {
> > + /*
> > + * Note: futex.c:get_futex_key_refs() relies on this function
> > + * implying an smp_mb().
> > + */
> > WARN_ON(atomic_inc_return(&inode->i_count) < 2);
> > }
> > EXPORT_SYMBOL(ihold);