Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]

From: Mika Westerberg
Date: Fri May 30 2014 - 04:13:03 EST


On Thu, May 29, 2014 at 07:52:01PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 05:53:51PM +0100, Al Viro wrote:
> > On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> > > On Thu, May 29, 2014 at 9:23 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > BTW, lock_parent() might be better off if in contended case it would not
> > > > bother with rename_lock and did something like this:
> > > > again:
> > >
> > > Ack. I think that's much better.
> >
> > Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
> > not if (parent)). I'll wait with folding it back into the commit that
> > introduces lock_parent() until we get testing results...
>
> Grrr... Sadly, that's not good enough. Leaking rcu_read_lock() on
> success is trivial, but there's more serious problem: suppose dentries
> involved get moved before we get to locking what we thought was parent.
> We end up taking ->d_lock on two dentries that might be nowhere near each
> other in the tree, with obvious nasty implications. Would be _very_ hard
> to reproduce ;-/
>
> AFAICS, the following would be safe, but I'd really appreciate any extra
> eyes on that sucker:
>
> static inline struct dentry *lock_parent(struct dentry *dentry)
> {
> 	struct dentry *parent = dentry->d_parent;
> 	if (IS_ROOT(dentry))
> 		return NULL;
> 	if (likely(spin_trylock(&parent->d_lock)))
> 		return parent;
> 	spin_unlock(&dentry->d_lock);
> 	rcu_read_lock();
> again:
> 	parent = ACCESS_ONCE(dentry->d_parent);
> 	spin_lock(&parent->d_lock);
> 	/*
> 	 * We can't blindly lock dentry until we are sure
> 	 * that we won't violate the locking order.
> 	 * While parent->d_lock is not enough to stabilize
> 	 * dentry->d_parent, it *is* enough to stabilize
> 	 * dentry->d_parent == parent.
> 	 */
> 	if (unlikely(parent != dentry->d_parent)) {
> 		spin_unlock(&parent->d_lock);
> 		goto again;
> 	}
> 	rcu_read_unlock();
> 	if (parent != dentry)
> 		spin_lock(&dentry->d_lock);
> 	else
> 		parent = NULL;
> 	return parent;
> }
>
> That variant got force-pushed in place of the previous one, again at the
> head of #for-linus. And I'm definitely not folding it in until it gets
> more review and testing.
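The revalidate-and-retry idea in the quoted lock_parent() can be tried outside the kernel. What follows is a minimal userspace sketch, not kernel code, assuming that pthread mutexes stand in for ->d_lock, an atomic pointer stands in for ->d_parent, and nodes are never freed while in use (roughly the guarantee rcu_read_lock() provides in the quoted version). It also assumes that moving a node requires holding both the node's lock and its old parent's lock. struct node, node_init() and lock_parent_sketch() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a dentry (not kernel code): each node
 * has its own lock and a parent pointer; a root node's parent points at
 * itself, mirroring what IS_ROOT() tests. Rules assumed by this sketch:
 *   - locks are taken parent first, then child;
 *   - changing n->parent requires holding n->lock and the old parent's lock;
 *   - nodes are never freed while in use (the kernel relies on RCU instead).
 */
struct node {
	pthread_mutex_t lock;
	_Atomic(struct node *) parent;
};

static void node_init(struct node *n, struct node *parent)
{
	pthread_mutex_init(&n->lock, NULL);
	/* A NULL parent makes n a root, i.e. its own parent. */
	atomic_init(&n->parent, parent ? parent : n);
}

/*
 * Called with n->lock held. Returns the parent with both locks held,
 * or NULL (with only n->lock held) if n is, or has become, a root.
 */
static struct node *lock_parent_sketch(struct node *n)
{
	/* Stable here: we hold n->lock, so nobody can reparent n. */
	struct node *parent = atomic_load(&n->parent);

	if (parent == n)
		return NULL;
	if (pthread_mutex_trylock(&parent->lock) == 0)
		return parent;          /* fast path: trylock cannot deadlock */

	/* Contended: drop n->lock so the locks can be taken in order. */
	pthread_mutex_unlock(&n->lock);
again:
	parent = atomic_load(&n->parent);
	pthread_mutex_lock(&parent->lock);
	/*
	 * parent->lock does not freeze n->parent in general, but it does
	 * freeze "n->parent == parent": moving n away needs this lock.
	 * If n was moved while we waited, we locked the wrong node.
	 */
	if (parent != atomic_load(&n->parent)) {
		pthread_mutex_unlock(&parent->lock);
		goto again;
	}
	if (parent == n)
		return NULL;            /* n became a root; we hold its lock */
	pthread_mutex_lock(&n->lock);
	assert(atomic_load(&n->parent) == parent);
	return parent;
}
```

From a single thread only the root case and the uncontended trylock path can be exercised; the retry loop needs a second thread that holds the parent's lock or moves the node while the caller waits.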

Tested your latest #for-linus from here:

https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-linus

and the livelock is gone.

Tested-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>

Thanks again!