Re: [PATCH] Busy inodes after unmount, be more verbose in generic_shutdown_super

From: Jan Blunck
Date: Fri Mar 03 2006 - 06:40:17 EST


On Thu, Mar 02, Neil Brown wrote:

> The core problem is that:
> prune_one_dentry can hold a reference to a dentry without any
> lock being held, and without any other reference to the
> filesystem (if it is being called from shrink_dcache_memory).
> It holds this reference while calling iput on an inode. This can
> take an arbitrarily long time to complete, especially if NFS
> needs to wait for some RPCs to complete or timeout.
>
> shrink_dcache_parent skips over dentries which have a reference,
> such as the one held by prune_one_dentry.
>
> Thus umount can find that an inode is still in use (by it's dentry
> which was skipped) and will complain. Worse, when the nfs request
> on some inode finally completes, it might find the superblock
> doesn't exist any more and... oops.
>
> My proposed solution to the problem is never to expose the reference
> held by prune_one_dentry. i.e. keep the spin_lock held.

This morning I wondered, why I was using a list to drop the dentry's inodes
after the dput() has visited all parents.

It is not enough to fix prune_one_dentry() to hold the lock until the parent
is dereferenced since prune_one_dentry() calls __dput_locked(). And in
__dput_locked() we have the same problem again:

from __dput_locked():

list_del(&dentry->d_u.d_child);
dentry_stat.nr_dentry--; /* For d_free, below */
/*drops the locks, at that point nobody can reach this dentry*/
-> dentry_iput(dentry);
parent = dentry->d_parent;
d_free(dentry);
if (dentry == parent)
return;
dentry = parent;
+
+ if (atomic_read(&dentry->d_count) == 1)
+ might_sleep();
+ if (!atomic_dec_and_lock(&dentry->d_count, &dcache_lock))
+ return;
+
+ spin_lock(&dentry->d_lock);
goto repeat;
}
}

Between -> and the atomic_dec_and_lock() the reference count on the parent is
wrong and no lock is held. I fixed that by using d_lru to keep track of all
dentry's which inodes still have to be dereferenced. This should happen after
all parents have been dereferenced.

Regards,
Jan

--
Jan Blunck jblunck@xxxxxxx
SuSE LINUX AG - A Novell company
Maxfeldstr. 5 +49-911-74053-608
D-90409 Nürnberg http://www.suse.de
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/