[RFC] simple_recursive_removal()

From: Al Viro
Date: Fri Nov 15 2019 - 13:42:51 EST


On Fri, Nov 15, 2019 at 05:54:23PM +0000, Al Viro wrote:
> Anyway, AFAICS removal could be done this way:
>
> // parent is held exclusive, after is NULL or a child of parent
> find_next_child(parent, prev)
> child = NULL
> node = prev ? &prev->d_child : &parent->d_subdirs;
> grab parent->d_lock
> for each entry in the list starting at node->next
> d = container_of(entry, struct dentry, d_child)
> grab d->d_lock
> if simple_positive(d)
> bump d->d_count
> child = d
> drop d->d_lock
> if child
> break
> drop parent->d_lock
> dput(prev);
> return child
>
> kill_it(victim, parent)
> if simple_positive(victim)
> d_invalidate(victim); // needed to avoid lost mounts
> if victim is a directory
> fsnotify_rmdir
> else
> fsnotify_unlink
> if victim is regular
> __debugfs_file_removed(victim)
> dput(victim) // unpin it
>
> recursive_removal(dentry)
> this = dentry;
> while (true) {
> victim = NULL;
> inode = this->d_inode;
> inode_lock(inode);
> if (d_is_dir(this))
> mark this doomed
> while ((child = find_next_child(this, victim)) == NULL) {
> // no children (left); kill and ascend
> // update metadata while it's still locked
> inode->i_ctime = current_time(inode);
> clear_nlink(inode);
> inode_unlock(inode);
> victim = this;
> this = this->d_parent;
> inode = this->d_inode;
> inode_lock(inode);
> kill_it(victim, this);
> if (victim == dentry) {
> inode->i_ctime = inode->i_mtime = current_time(inode);
> if (d_is_dir(dentry))
> drop_nlink(inode);
> inode_unlock(inode);
> dput(dentry);
> return;
> }
> }
> inode_unlock(inode);
> this = child;
> }

Come to think of that, if we use IS_DEADDIR as "no more additions" marking,
that looks like a good candidate for all in-kernel rm -rf on ramfs-style
filesystems without cross-directory renames. This bit in kill_it() above
if victim is regular
__debugfs_file_removed(victim)
would be an fs-specific callback passed by the caller, turning the whole
thing into this:

void simple_recursive_removal(struct dentry *dentry,
void (*callback)(struct dentry *))
{
struct dentry *this = dentry;
while (true) {
struct dentry *victim = NULL, *child;
struct inode *inode = this->d_inode;

inode_lock(inode);
if (d_is_dir(this))
inode->i_flags |= S_DEAD;
while ((child = find_next_child(this, victim)) == NULL) {
// kill and ascend
// update metadata while it's still locked
inode->i_ctime = current_time(inode);
clear_nlink(inode);
inode_unlock(inode);
victim = this;
this = this->d_parent;
inode = this->d_inode;
inode_lock(inode);
if (simple_positive(victim)) {
d_invalidate(victim); // avoid lost mounts
if (is_dir(victim))
fsnotify_rmdir(inode, victim);
else
fsnotify_unlink(inode, victim);
if (callback)
callback(victim);
dput(victim) // unpin it
}
if (victim == dentry) {
inode->i_ctime = inode->i_mtime =
current_time(inode);
if (d_is_dir(dentry))
drop_nlink(inode);
inode_unlock(inode);
dput(dentry);
return;
}
}
inode_unlock(inode);
this = child;
}
}

with find_next_child() easily implemented via scan_positives() already
in libfs.c... Objections? The above is obviously completely untested,
and I've got nowhere near enough sleep, so there may be any number of
brown paperbag bugs in it. Review would be very welcome...