Re: processes hung after sys_renameat, and 'missing' processes

From: Al Viro
Date: Sun Jun 03 2012 - 19:29:00 EST


On Mon, Jun 04, 2012 at 12:17:09AM +0100, Al Viro wrote:
>
> > Also, sysrq-w is usually way more interesting than 't' when there are
> > processes stuck on a mutex.
> >
> > Because yes, it looks like you have a boatload of trinity processes
> > stuck on an inode mutex. Looks like every single one of them is in
> > 'lock_rename()'. It *shouldn't* be an ABBA deadlock, since lockdep
> > should have noticed that, but who knows.
>
> lock_rename() is a bit of a red herring here - they appear to be all
> within-directory renames, so it's just a "trying to rename something
> in a directory that has ->i_mutex held by something else".
>
> IOW, something else in there is holding ->i_mutex - something that
> either hadn't been through lock_rename() at all or has already
> passed through it and still hadn't got around to unlock_rename().
> In either case, suspects won't have lock_rename() in the trace...

Everything in lock_rename() appears to be at lock_rename+0x3e. Unless
there's a really huge number of filesystems on that box, this has to be

	mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);

and nothing sitting on that sucker is holding any locks yet. IOW, that's
the tail hanging off whatever deadlock is there.

One possibility is that something has left the kernel without releasing
i_mutex on some directory, which would make atomic_open patches the most
obvious suspects.
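
To make that scenario concrete, here is a minimal userspace sketch, not kernel code. Everything in it is hypothetical and named for illustration only: fake_dir stands in for a directory inode, buggy_op() for whatever path returns with the lock still held, and same_dir_rename() for the same-directory case of lock_rename(). It uses pthread_mutex_trylock() in place of a blocking lock so the demo returns instead of hanging, and it assumes a default (non-recursive) pthread mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a directory inode and its ->i_mutex. */
struct fake_dir {
	pthread_mutex_t i_mutex;
};

/*
 * Hypothetical buggy operation: takes the directory lock and forgets
 * to drop it on the error path, i.e. it "leaves the kernel" with the
 * lock still held.
 */
static int buggy_op(struct fake_dir *dir, int fail)
{
	pthread_mutex_lock(&dir->i_mutex);
	if (fail)
		return -EIO;	/* BUG: returns without unlocking */
	pthread_mutex_unlock(&dir->i_mutex);
	return 0;
}

/*
 * Same-directory rename: the directory lock is the first lock taken.
 * trylock stands in for the blocking lock so that the caller gets
 * -EBUSY where the real thing would sleep.  Note that at that point
 * the caller holds no locks at all.
 */
static int same_dir_rename(struct fake_dir *dir)
{
	int err = pthread_mutex_trylock(&dir->i_mutex);

	if (err)
		return -err;	/* would have blocked here, holding nothing */
	/* ... the actual rename work would go here ... */
	pthread_mutex_unlock(&dir->i_mutex);
	return 0;
}

/*
 * Run buggy_op() once, then attempt nr same-directory renames and
 * count how many would have got stuck behind the leaked lock.
 */
static int count_stuck_renames(int nr, int leak)
{
	struct fake_dir dir;
	int i, stuck = 0;

	assert(nr >= 0);
	pthread_mutex_init(&dir.i_mutex, NULL);
	buggy_op(&dir, leak);
	for (i = 0; i < nr; i++)
		if (same_dir_rename(&dir) == -EBUSY)
			stuck++;
	return stuck;
}
```

With the leak, every later rename piles up on the same lock and none of them shows the culprit in its trace; without it, none of them blocks.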

Which kernel is it, and what filesystems are there? Is there nfsd anywhere
in the mix?