Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement
From: Ian Kent
Date: Wed Dec 16 2020 - 23:47:26 EST
On Tue, 2020-12-15 at 20:59 +0800, Ian Kent wrote:
> On Tue, 2020-12-15 at 16:33 +0800, Fox Chen wrote:
> > On Mon, Dec 14, 2020 at 9:30 PM Ian Kent <raven@xxxxxxxxxx> wrote:
> > > On Mon, 2020-12-14 at 14:14 +0800, Fox Chen wrote:
> > > > On Sun, Dec 13, 2020 at 11:46 AM Ian Kent <raven@xxxxxxxxxx>
> > > > wrote:
> > > > > On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > > > > > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > > > > > For the patches, there is a mutex_lock in kn-
> > > > > > > > >attr_mutex,
> > > > > > > > as
> > > > > > > > Tejun
> > > > > > > > mentioned here
> > > > > > > > (
> > > > > > > > https://lore.kernel.org/lkml/X8fe0cmu+aq1gi7O@xxxxxxxxxxxxxxx/
> > > > > > > > ),
> > > > > > > > maybe a global
> > > > > > > > rwsem for kn->iattr will be better??
> > > > > > >
> > > > > > > I wasn't sure about that, IIRC a spin lock could be used
> > > > > > > around
> > > > > > > the
> > > > > > > initial check and checked again at the end which would
> > > > > > > probably
> > > > > > > have
> > > > > > > been much faster but much less conservative and a bit
> > > > > > > more
> > > > > > > ugly
> > > > > > > so
> > > > > > > I just went the conservative path since there was so much
> > > > > > > change
> > > > > > > already.
> > > > > >
> > > > > > Sorry, I hadn't looked at Tejun's reply yet and TBH didn't
> > > > > > remember
> > > > > > it.
> > > > > >
> > > > > > Based on what Tejun said it sounds like that needs work.
> > > > >
> > > > > Those attribute handling patches were meant to allow taking
> > > > > the
> > > > > rw
> > > > > sem read lock instead of the write lock for
> > > > > kernfs_refresh_inode()
> > > > > updates, with the added locking to protect the inode
> > > > > attributes
> > > > > update since it's called from the VFS both with and without
> > > > > the
> > > > > inode lock.
> > > >
> > > > Oh, understood. I was asking also because lock on kn-
> > > > >attr_mutex
> > > > drags
> > > > concurrent performance.
> > > >
> > > > > Looking around it looks like kernfs_iattrs() is called from
> > > > > multiple
> > > > > places without a node database lock at all.
> > > > >
> > > > > I'm thinking that, to keep my proposed change straight
> > > > > forward
> > > > > and on topic, I should just leave kernfs_refresh_inode()
> > > > > taking
> > > > > the node db write lock for now and consider the attributes
> > > > > handling
> > > > > as a separate change. Once that's done we could reconsider
> > > > > what's
> > > > > needed to use the node db read lock in
> > > > > kernfs_refresh_inode().
> > > >
> > > > You meant taking write lock of kernfs_rwsem for
> > > > kernfs_refresh_inode()??
> > > > It may be a lot slower in my benchmark, let me test it.
> > >
> > > Yes, but make sure the write lock of kernfs_rwsem is being taken
> > > not the read lock.
> > >
> > > That's a mistake I had initially?
> > >
> > > Still, that attributes handling is, I think, sufficient to
> > > warrant
> > > a separate change since it looks like it might need work, the
> > > kernfs
> > > node db probably should be kept stable for those attribute
> > > updates
> > > but equally the existence of an instantiated dentry might
> > > mitigate
> > > the it.
> > >
> > > Some people might just know whether it's ok or not but I would
> > > like
> > > to check the callers to work out what's going on.
> > >
> > > In any case it's academic if GCH isn't willing to consider the
> > > series
> > > for review and possible merge.
> > >
> > Hi Ian
> >
> > I removed kn->attr_mutex and changed read lock to write lock for
> > kernfs_refresh_inode
> >
> > down_write(&kernfs_rwsem);
> > kernfs_refresh_inode(kn, inode);
> > up_write(&kernfs_rwsem);
> >
> >
> > Unfortunate, changes in this way make things worse, my benchmark
> > runs
> > 100% slower than upstream sysfs. :(
> > open+read+close a sysfs file concurrently took 1000us. (Currently,
> > sysfs with a big mutex kernfs_mutex only takes ~500us
> > for one open+read+close operation concurrently)
>
> Right, so it does need attention nowish.
>
> I'll have a look at it in a while, I really need to get a new autofs
> release out, and there are quite a few changes, and testing is seeing
> a number of errors, some old, some newly introduced. It's proving
> difficult.
I've taken a breather for the autofs testing and had a look at this.
I think my original analysis of this was wrong.
Could you try this patch please.
I'm not sure how much difference it will make but, in principle,
it's much the same as the previous approach except it doesn't
increase the kernfs node struct size or mess with the other
attribute handling code.
Note, this is not even compile tested.
kernfs: use kernfs read lock in .getattr() and .permission()
From: Ian Kent <raven@xxxxxxxxxx>
>From Documenation/filesystems.rst and (slightly outdated) comments
in fs/attr.c the inode i_rwsem is used for attribute handling.
This lock satisfies the requirememnts needed to reduce lock contention,
namely a per-object lock needs to be used rather than a file system
global lock with the kernfs node db held stable for read operations.
In particular it should reduce lock contention seen when calling the
kernfs .permission() method.
The inode methods .getattr() and .permission() do not hold the inode
i_rwsem lock when called as they are usually read operations. Also
the .permission() method checks for rcu-walk mode and returns -ECHILD
to the VFS if it is set. So the i_rwsem lock can be used in
kernfs_iop_getattr() and kernfs_iop_permission() to protect the inode
update done by kernfs_refresh_inode(). Using this lock allows the
kernfs node db write lock in these functions to be changed to a read
lock.
Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
---
fs/kernfs/inode.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index ddaf18198935..568037e9efe9 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -189,9 +189,11 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
struct inode *inode = d_inode(path->dentry);
struct kernfs_node *kn = inode->i_private;
- down_write(&kernfs_rwsem);
+ inode_lock(inode);
+ down_read(&kernfs_rwsem);
kernfs_refresh_inode(kn, inode);
- up_write(&kernfs_rwsem);
+ up_read(&kernfs_rwsem);
+ inode_unlock(inode);
generic_fillattr(inode, stat);
return 0;
@@ -281,9 +283,11 @@ int kernfs_iop_permission(struct inode *inode, int mask)
kn = inode->i_private;
- down_write(&kernfs_rwsem);
+ inode_lock(inode);
+ down_read(&kernfs_rwsem);
kernfs_refresh_inode(kn, inode);
- up_write(&kernfs_rwsem);
+ up_read(&kernfs_rwsem);
+ inode_unlock(inode);
return generic_permission(inode, mask);
}