Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks
From: Linus Torvalds
Date: Fri May 15 2015 - 20:46:13 EST
On Fri, May 15, 2015 at 4:30 PM, NeilBrown <neilb@xxxxxxx> wrote:
>
> .. and I've been wondering what to do about i_mutex and NFS. I've had
> customer reports of slowness in creating files that seems to be due to
> i_mutex on the directory being held over the whole 'create' RPC, so only one
> of those can be in flight at the one time.
> "make -j" on a large source directory can easily want to create lots of
> "*.o" files at "the same time".
>
> And NFS doesn't need i_mutex at all because the server will provide the
> needed guarantees.
So i_mutex on a directory is probably the nastiest lock we have in the fs layer.
It's used for several different half-related things:

 - serialize filename creation/deletion

   This is partly for the benefit of the filesystem itself (and not
   helpful for NFS, as you note), but it's also very much about making
   sure we have uniqueness guarantees at the VFS layer too.

   So even with NFS, it's not just "the server provides the needed
   guarantees", because some of the guarantees are really client-local.
   For example, simply that we only ever have one single dentry for a
   particular name, and that we only ever have one active lookup per
   dentry. Those things happen independently of - and before - the server
   even sees the operation.

   So the whole local directory tree consistency ends up depending on this.

 - readdir(). This is mostly to make it hard for filesystems to do the
   wrong thing when there is concurrent file creation.

   I suspect readdir could fairly easily push the i_mutex down from the
   caller and into the filesystem, and then filesystems might narrow down
   the use (or even get rid of it). The initial patch might even be
   automated with coccinelle. However, rather few loads actually have a
   lot of readdir() activity, and samba is probably the only major one.
   I've seen benchmarks where it matters, but they are rare (and I
   haven't seen one in literally years).

So the readdir case could probably be at least relaxed fairly easily.
But the thing that tends to hurt on more loads is, as you note, the
filename lookup/creation/movement case. And that's much harder to fix.
Al, do you have any ideas? Personally, I've wanted to make i_mutex a
rwsem for a long time, but right now pretty much everything uses it
for exclusion. For example, filename lookup is clearly just reading
the directory, so it should take a rwsem for reading, right? No. Not
the way it is done now. Filename lookup wants the directory inode
exclusively because that guarantees that we create just one dentry and
call the filesystem ->lookup only once on that dentry.
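
To make the "one dentry, one ->lookup" point concrete, here is a rough
userspace analogy, not kernel code. It assumes a toy directory whose
cached children are found-or-created under a single pthread mutex that
stands in for i_mutex. Every name in it (toy_dir, toy_dentry, toy_find,
toy_lookup, fs_lookups) is made up for illustration and does not
correspond to any real VFS interface.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached name in a directory. */
struct toy_dentry {
    char name[32];
    struct toy_dentry *next;
};

/* Hypothetical stand-in for a directory inode. */
struct toy_dir {
    pthread_mutex_t lock;        /* plays the role of the directory's i_mutex */
    struct toy_dentry *children; /* cached names, at most one entry per name */
    int fs_lookups;              /* how often the "filesystem" was asked */
};

void toy_dir_init(struct toy_dir *dir)
{
    pthread_mutex_init(&dir->lock, NULL);
    dir->children = NULL;
    dir->fs_lookups = 0;
}

/* Return the cached entry for name, or NULL. Caller must hold dir->lock. */
struct toy_dentry *toy_find(struct toy_dir *dir, const char *name)
{
    for (struct toy_dentry *d = dir->children; d; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/*
 * Look up name. On a miss, allocate one entry and "call the filesystem"
 * once. The miss check and the insert happen under the same exclusive
 * lock, so two racing callers cannot both miss and both insert.
 * Returns NULL if the name is too long or allocation fails.
 */
struct toy_dentry *toy_lookup(struct toy_dir *dir, const char *name)
{
    struct toy_dentry *d;

    if (strlen(name) >= sizeof(d->name))
        return NULL;

    pthread_mutex_lock(&dir->lock);
    d = toy_find(dir, name);
    if (!d) {
        d = calloc(1, sizeof(*d));
        if (d) {
            strcpy(d->name, name);   /* length was checked above */
            d->next = dir->children;
            dir->children = d;
            dir->fs_lookups++;       /* stands in for the ->lookup call */
        }
    }
    pthread_mutex_unlock(&dir->lock);
    return d;
}
```

If the mutex in this sketch were swapped for a reader/writer lock and
toy_lookup() only took it for reading, two threads could both see the
miss and both insert an entry for the same name, which is exactly the
guarantee the exclusive lock is providing here.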
Again, there tend to be no simple benchmarks or loads that people care
about that show this. Most of the time it's fairly hard to see.
Linus