Re: [PATCH] mm/zswap: use workqueue to destroy pool
From: Dan Streetman
Date: Thu Apr 28 2016 - 04:22:42 EST
On Wed, Apr 27, 2016 at 9:40 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@xxxxxxxxx> wrote:
> Hello Dan,
>
> On (04/27/16 13:19), Dan Streetman wrote:
> [..]
>> > so in general the patch look good to me.
>> >
>> > it's either I didn't have enough coffee yet (which is true) or
>> > _IN THEORY_ it creates a tiny race condition; which is hard (and
>> > unlikely) to hit, but still. and the problem being is
>> > CONFIG_ZSMALLOC_STAT.
>>
>> Aha, thanks, I hadn't tested with that param enabled. However, the
>> patch doesn't create the race condition, that existed already.
>
> well, agree. it's not like zsmalloc race condition, but the way zsmalloc
> is managed (deferred destruction either via rcu or scheduled work).
>
>> It fails because the new zswap pool creates a new zpool using
>> zsmalloc, but it can't create the zsmalloc pool because there is
>> already one named 'zswap' so the stat dir can't be created.
>>
>> So...either zswap needs to provide a unique 'name' to each of its
>> zpools, or zsmalloc needs to modify its provided pool name in some way
>> (add a unique suffix maybe). Or both.
>>
>> It seems like zsmalloc should do the checking/modification - or, at
>> the very least, it should have consistent behavior regardless of the
>> CONFIG_ZSMALLOC_STAT setting.
>
> yes, zram guarantees that there won't be any name collisions. and the
> way it's working for zram, zram<ID> corresponds to zsmalloc<ID>.
>
>
> the bigger issue here (and I was thinking at some point of fixing it,
> but then I grepped to see how many API users are in there, and I gave
> up) is that it seems we have no way to check if the dir exists in debugfs.
>
> we call this function
>
> struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
> {
> struct dentry *dentry = start_creating(name, parent);
> struct inode *inode;
>
> if (IS_ERR(dentry))
> return NULL;
>
> inode = debugfs_get_inode(dentry->d_sb);
> if (unlikely(!inode))
> return failed_creating(dentry);
>
> inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> inode->i_op = &simple_dir_inode_operations;
> inode->i_fop = &simple_dir_operations;
>
> /* directory inodes start off with i_nlink == 2 (for "." entry) */
> inc_nlink(inode);
> d_instantiate(dentry, inode);
> inc_nlink(d_inode(dentry->d_parent));
> fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
> return end_creating(dentry);
> }
>
> and debugfs _does know_ that the directory ERR_PTR(-EEXIST), that's what
> start_creating()->lookup_one_len() return
>
> static struct dentry *start_creating(const char *name, struct dentry *parent)
> {
> struct dentry *dentry;
> int error;
>
> pr_debug("debugfs: creating file '%s'\n",name);
>
> if (IS_ERR(parent))
> return parent;
>
> error = simple_pin_fs(&debug_fs_type, &debugfs_mount,
> &debugfs_mount_count);
> if (error)
> return ERR_PTR(error);
>
> /* If the parent is not specified, we create it in the root.
> * We need the root dentry to do this, which is in the super
> * block. A pointer to that is in the struct vfsmount that we
> * have around.
> */
> if (!parent)
> parent = debugfs_mount->mnt_root;
>
> inode_lock(d_inode(parent));
> dentry = lookup_one_len(name, parent, strlen(name));
> if (!IS_ERR(dentry) && d_really_is_positive(dentry)) {
> dput(dentry);
> dentry = ERR_PTR(-EEXIST);
> }
>
> if (IS_ERR(dentry)) {
> inode_unlock(d_inode(parent));
> simple_release_fs(&debugfs_mount, &debugfs_mount_count);
> }
>
> return dentry;
> }
>
> but debugfs_create_dir() instead of propagating this error, it swallows it
> and simply return NULL, so we can't tell the difference between -EEXIST, OOM,
> or anything else. so doing this check in zsmalloc() is not so easy.
yeah, Greg intentionally made the debugfs api opaque, so there's only
a binary created/failed indication.
While I agree zswap should provide unique names, I also think zsmalloc
should not abort if its debugfs content fails to be created - the
intention of debugfs is not to be a critical part of drivers, but only
to provide debug information. I'll send a patch to zsmalloc
separately, that allows zsmalloc pool creation to continue even if the
debugfs dir/file failed to be created.
>
> /* well, I may be wrong here */
>
>> However, it's easy to change zswap to provide a unique name for each
>> zpool creation, and zsmalloc's primary user (zram) guarantees to
>> provide a unique name for each pool created. So updating zswap is
>> probably best.
>
> if you can do it in zswap, then please do.
>
> -ss