Re: [RESEND PATCH v7 7/8] kernfs: Replace per-fs rwsem with hashed rwsems.

From: Imran Khan
Date: Sun Mar 20 2022 - 21:57:31 EST


Hello Al,
Thanks again for reviewing this.

On 18/3/22 11:07 am, Al Viro wrote:
> On Thu, Mar 17, 2022 at 06:26:11PM +1100, Imran Khan wrote:
>
>> diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
>> index 9d4103602554..cbdd1be5f0a8 100644
>> --- a/fs/kernfs/symlink.c
>> +++ b/fs/kernfs/symlink.c
>> @@ -113,12 +113,19 @@ static int kernfs_getlink(struct inode *inode, char *path)
>> struct kernfs_node *kn = inode->i_private;
>> struct kernfs_node *parent = kn->parent;
>> struct kernfs_node *target = kn->symlink.target_kn;
>> - struct rw_semaphore *rwsem;
>> + struct kernfs_rwsem_token token;
>> int error;
>>
>> - rwsem = kernfs_down_read(parent);
>> + /**
>> + * Lock both parent and target, to avoid their movement
>> + * or removal in the middle of path construction.
>> + * If a competing remove or rename for parent or target
>> + * wins, it will be reflected in result returned from
>> + * kernfs_get_target_path.
>> + */
>> + kernfs_down_read_double_nodes(target, parent, &token);
>> error = kernfs_get_target_path(parent, target, path);
>> - kernfs_up_read(rwsem);
>> + kernfs_up_read_double_nodes(target, parent, &token);
>>
>> return error;
>> }
>
> No. Read through the kernfs_get_target_path(). Why would locking these
> two specific nodes be sufficient for anything useful? That code relies
> upon ->parent of *many* nodes being stable. Which is not going to be
> guaranteed by anything of that sort.
>
> And it's not just "we might get garbage if we race" - it's "we might
> walk into kfree'd object and proceed to walk the pointer chain".
>
> Or have this loop
> kn = target;
> while (kn->parent && kn != base) {
> len += strlen(kn->name) + 1;
> kn = kn->parent;
> }
> see the names that are not identical to what we see in
> kn = target;
> while (kn->parent && kn != base) {
> int slen = strlen(kn->name);
>
> len -= slen;
> memcpy(s + len, kn->name, slen);
> if (len)
> s[--len] = '/';
>
> kn = kn->parent;
> }
> done later in the same function. With obvious unpleasant effects.
> Or a different set of nodes, for that matter.
>
> This code really depends upon the tree being stable. No renames of
> any sort allowed during that thing.

Yes. My earlier approach is wrong.

This patch set has also introduced a per-fs mutex (kernfs_rm_mutex)
which should fix the problem of inconsistent tree view as far as
kernfs_get_path is concerned.
Acquiring kernfs_rm_mutex before invoking kernfs_get_path in
kernfs_getlink will ensure that kernfs_get_path will get a consistent
view of ->parent of nodes from root to target. This is because acquiring
kernfs_rm_mutex will ensure that __kernfs_remove does not remove any
kernfs_node(or parent of kernfs_node). Further it ensures that
kernfs_rename_ns does not move any kernfs_node. So far I have not used
per-fs mutex in kernfs_rename_ns but I can make this change in next
version. So following change on top of current patch set should fix
this issue of ->parent change in the middle of kernfs_get_path.


diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 1b28d99ff1c3..8095dcdd437c 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1672,11 +1672,13 @@ int kernfs_rename_ns(struct kernfs_node *kn,
struct kernfs_node *new_parent,
const char *old_name = NULL;
struct kernfs_rwsem_token token;
int error;
+ struct kernfs_root *root = kernfs_root(kn);

/* can't move or rename root */
if (!kn->parent)
return -EINVAL;

+ mutex_lock(&root->kernfs_rm_mutex);
old_parent = kn->parent;
kernfs_get(old_parent);
kernfs_down_write_triple_nodes(kn, old_parent, new_parent, &token);
@@ -1741,6 +1743,7 @@ int kernfs_rename_ns(struct kernfs_node *kn,
struct kernfs_node *new_parent,
error = 0;
out:
kernfs_up_write_triple_nodes(kn, new_parent, old_parent, &token);
+ mutex_unlock(&root->kernfs_rm_mutex);
return error;
}

diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index cbdd1be5f0a8..805543d7a1f2 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -113,19 +113,22 @@ static int kernfs_getlink(struct inode *inode,
char *path)
struct kernfs_node *kn = inode->i_private;
struct kernfs_node *parent = kn->parent;
struct kernfs_node *target = kn->symlink.target_kn;
- struct kernfs_rwsem_token token;
+ struct kernfs_root *root;
int error;

+ root = kernfs_root(kn);
+
/**
- * Lock both parent and target, to avoid their movement
- * or removal in the middle of path construction.
- * If a competing remove or rename for parent or target
- * wins, it will be reflected in result returned from
- * kernfs_get_target_path.
+ * Acquire kernfs_rm_mutex to ensure that kernfs_get_path
+ * sees correct ->parent for all nodes.
+ * We need to make sure that during kernfs_get_path parent
+ * of any node from target to root does not change. Acquiring
+ * kernfs_rm_mutex ensure that there are no concurrent remove
+ * or rename operations.
*/
- kernfs_down_read_double_nodes(target, parent, &token);
+ mutex_lock(&root->kernfs_rm_mutex);
error = kernfs_get_target_path(parent, target, path);
- kernfs_up_read_double_nodes(target, parent, &token);
+ mutex_unlock(&root->kernfs_rm_mutex);

return error;
}

Could you please let me know if you see some issues with this approach ?

Thanks
-- Imran