Re: [PATCH 15/20] ufs: remove the BKL

From: Nick Piggin
Date: Thu Jan 27 2011 - 00:48:01 EST


Really great work removing the BKL, Arnd. I'm sure a lot
of it was pretty thankless along the way.

On Wed, Jan 26, 2011 at 9:17 AM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> This introduces a new per-superblock mutex in UFS to replace
> the big kernel lock. I have been careful to avoid nested
> calls to lock_ufs and to get the lock order right with
> respect to other mutexes, in particular lock_super.

When I looked at removing the BKL from minix a long time ago,
I was a bit worried about reclaim and fs/io recursion in some
of the filesystems that used the BKL.


> @@ -436,7 +439,8 @@ int ufs_getfrag_block(struct inode *inode, sector_t fragment, struct buffer_head
>        ret = 0;
>        bh = NULL;
>
> -       lock_kernel();
> +       if (needs_lock)
> +               lock_ufs(sb);
>
>        UFSD("ENTER, ino %lu, fragment %llu\n", inode->i_ino, (unsigned long long)fragment);
>        if (fragment >

[...]

> @@ -55,16 +54,16 @@ static struct dentry *ufs_lookup(struct inode * dir, struct dentry *dentry, stru
>        if (dentry->d_name.len > UFS_MAXNAMLEN)
>                return ERR_PTR(-ENAMETOOLONG);
>
> -       lock_kernel();
> +       lock_ufs(dir->i_sb);
>        ino = ufs_inode_by_name(dir, &dentry->d_name);
>        if (ino) {
>                inode = ufs_iget(dir->i_sb, ino);
>                if (IS_ERR(inode)) {
> -                       unlock_kernel();
> +                       unlock_ufs(dir->i_sb);
>                        return ERR_CAST(inode);
>                }
>        }
> -       unlock_kernel();
> +       unlock_ufs(dir->i_sb);
>        d_add(dentry, inode);
>        return NULL;
>  }

versus

static struct inode *ufs_alloc_inode(struct super_block *sb)
{
        struct ufs_inode_info *ei;
        ei = (struct ufs_inode_info *)kmem_cache_alloc(ufs_inode_cachep, GFP_KERNEL);
        if (!ei)
                return NULL;
        ei->vfs_inode.i_version = 1;
        return &ei->vfs_inode;
}

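So, get_block can be called for .writepage in page reclaim,
and it takes the lock. ufs_lookup takes the same lock and,
through ufs_iget, winds up calling ufs_alloc_inode. And
ufs_alloc_inode allocates with GFP_KERNEL, which can enter
reclaim with __GFP_FS set. Reclaim can then call back into
.writepage and try to take a lock the task already holds,
and unlike the BKL the new mutex is not recursive.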

I didn't look through all your filesystem conversions, but I
think it is something tricky to watch out for.

Changing everything to GFP_NOFS may be an option, for
such crufty old filesystems...
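
To make the recursion concrete, here is a rough userspace model of the
call chain, not kernel code. Every name in it (model_sb, model_lookup,
MODEL_GFP_KERNEL and so on) is made up for illustration, and "reclaim"
is assumed to always fire, which real reclaim of course does not. An
error-checking pthread mutex stands in for the per-superblock mutex so
that the relock is reported as EDEADLK instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for GFP flags; not the kernel's real values. */
enum model_gfp {
	MODEL_GFP_NOFS,		/* reclaim must not call into the fs */
	MODEL_GFP_KERNEL	/* reclaim may call into the fs */
};

struct model_sb {
	pthread_mutex_t lock;	/* plays the per-superblock mutex */
};

/* Use an error-checking mutex so a relock by the owner returns EDEADLK. */
static void model_sb_init(struct model_sb *sb)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&sb->lock, &attr);
	assert(rc == 0);
	(void)rc;
	pthread_mutexattr_destroy(&attr);
}

/* Models get_block as reached from .writepage: it takes the lock. */
static int model_get_block(struct model_sb *sb)
{
	int rc = pthread_mutex_lock(&sb->lock);

	if (rc != 0)
		return rc;	/* EDEADLK if this thread already owns it */
	pthread_mutex_unlock(&sb->lock);
	return 0;
}

/*
 * Models the inode allocation. In this model an allocation that allows
 * fs reclaim always ends up in writepage -> get_block.
 */
static int model_alloc_inode(struct model_sb *sb, enum model_gfp gfp)
{
	if (gfp == MODEL_GFP_KERNEL)
		return model_get_block(sb);
	return 0;		/* NOFS: reclaim stays out of the fs */
}

/* Models the lookup path: lock, allocate the inode, unlock. */
static int model_lookup(struct model_sb *sb, enum model_gfp gfp)
{
	int rc = pthread_mutex_lock(&sb->lock);

	if (rc != 0)
		return rc;
	rc = model_alloc_inode(sb, gfp);
	pthread_mutex_unlock(&sb->lock);
	return rc;
}
```

After model_sb_init(&sb), model_lookup(&sb, MODEL_GFP_KERNEL) returns
EDEADLK, while model_lookup(&sb, MODEL_GFP_NOFS) returns 0. In the
kernel the first case would simply block forever on the mutex rather
than return an error.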