Re: fs: lockup on rename_mutex in fs/dcache.c:1035
From: Al Viro
Date: Sat Oct 25 2014 - 23:51:27 EST
On Sun, Oct 26, 2014 at 03:06:08AM +0000, Al Viro wrote:
> > From a quick reading of the code it simply isn't possible for d_walk to
> > take the lock twice short of memory corruption. And the fact that the
> > code has not changed in years seems to suggest it isn't the obvious
> > cause of d_walk taking the rename_lock twice.
>
> It is a fairly obvious case of d_walk() forgetting to drop rename_lock.
> See upthread for analysis and (hopefully) a fix.
... except that it's not a full fix. If we get there that way with
retry being true, we will immediately deadlock at again:...
Which might very well have happened in this case - i.e. it might be
a single call of d_walk() taking the sucker twice that way.
Hmm... Actually, the comment in there is simply wrong - if the child
got killed between unlocking the child and locking the parent, it's
not ascending to the wrong parent, it's having no way to find the next
sibling.
OK, so basically it came from Nick's "fs: dcache avoid starvation in dcache
multi-step operations" and the mistake was in the assumption that once we
hold rename_lock, nothing is going to need rename_retry. Which isn't
true - dentry_kill() on child while we are trying to get ->d_lock on
parent requires a restart and that isn't excluded by rename_lock at
all.
Well, brute-force fix would be this, but I wonder if it's going to
create livelocks...
diff --git a/fs/dcache.c b/fs/dcache.c
index 3ffef7f..e3d8499 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1118,6 +1118,7 @@ out_unlock:
 	return;
 
 rename_retry:
+	done_seqretry(&rename_lock, seq);
 	if (!retry)
 		return;
 	seq = 1;
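
For anybody who wants to poke at the control flow without a kernel tree,
here is a rough userspace toy of the sequence described above. It is not
kernel code: toy_lock, toy_begin_or_lock(), toy_done() and toy_walk() are
made-up names that only imitate the odd/even seq convention of
read_seqbegin_or_lock() and done_seqretry(), and instead of actually
spinning, a second attempt to take the lock is just recorded. The
restarts_needed parameter is likewise an invention, standing in for "a
child got killed (or a rename happened) during this pass".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for rename_lock: tracks only the exclusive side. */
struct toy_lock {
	int held;        /* 1 while the lock is held */
	int double_lock; /* set if it is taken while already held */
};

/* Imitates read_seqbegin_or_lock(): even seq is lockless, odd seq locks. */
static void toy_begin_or_lock(struct toy_lock *l, int seq)
{
	if (!(seq & 1))
		return;
	if (l->held)
		l->double_lock = 1; /* the real thing would spin forever here */
	l->held = 1;
}

/* Imitates done_seqretry(): unlock only if this pass took the lock. */
static void toy_done(struct toy_lock *l, int seq)
{
	if (seq & 1)
		l->held = 0;
}

enum walk_result { WALK_OK, WALK_DEADLOCK };

/*
 * restarts_needed: number of passes that hit a restart condition.
 * with_fix: whether rename_retry drops the lock before looping.
 */
static enum walk_result toy_walk(struct toy_lock *l, int restarts_needed,
				 bool retry, bool with_fix)
{
	int seq = 0;

again:
	toy_begin_or_lock(l, seq);
	if (l->double_lock)
		return WALK_DEADLOCK;
	if (restarts_needed > 0) {
		restarts_needed--;
		goto rename_retry;
	}
	toy_done(l, seq);
	return WALK_OK;

rename_retry:
	if (with_fix)
		toy_done(l, seq);
	if (!retry)
		return WALK_OK;
	seq = 1;
	goto again;
}
```

One restart is harmless either way, since the first pass is lockless. Two
restarts without the extra toy_done() hit the double lock on the third
pass; with it, the walk finishes and the lock ends up released. The toy
says nothing about whether repeated restarts can livelock.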