Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

From: Linus Torvalds
Date: Fri May 15 2015 - 21:47:21 EST

On Fri, May 15, 2015 at 6:25 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> For example, simply that we only ever have one single dentry for a
>> particular name, and that we only ever have one active lookup per
>> dentry. Those things happen independently of - and before - the server
>> even sees the operation.
> But surely those things can be managed with a spinlock.

Some of them could. But no, not in general. Exactly because of that
"only one lookup of this particular dentry". The lookup sleeps, so
when another thread tries to look up the same dentry, it needs to hit
a sleeping lock. And since at that point we don't have the new dentry
yet (much less the inode we're looking up), it is in the only place we
*do* have: the inode of the directory we're looking up.

Now, maybe we could solve it with a new sleeping lock in the dentry
itself. Maybe we could allocate the new dentry early, add it to the
directory the usual way, but mark it as being "not ready" (so that
d_lookup() wouldn't use it). And have the sleeping lock be a new
sleeping lock in the dentry.

But that would be a very big change. And I suspect Al has tons of
reasons why it would have various problems. We do rely on the i_mutex
a fair amount..

> I think a big part of the problem is that the VFS tries to control
> filesystems rather than provide services to them.

Well, most of the time we do aim to provice services (eg the page
cache generally works that way).

But the dentry layer really is pretty important. And it's a *huge*
performance win. Yes, yes, you don't see it when it works, you only
see the bad cases, and there is no way in hell we can let the
low-level filesystem control the dentry tree.

Also, remember: we do have something like 65 different filesystems.
It's a *big* deal to make them all work "well enough". Making the
locking rules unambiguous is pretty primary.

Making one particular filesystem perform well is important to _you_,
but in the big picture...

>> - readdir(). This is mostly to make it hard for filesystems to do the
>> wrong thing when there is concurrent file creation.
> and makes it impossible for them to do the right thing too?

Well, possibly. As mentioned, the readdir case might actually be
pretty trivial, but it just hasn't come up often.

I've literally only ever seen it for samba. I'd take a tested patch. Hint, hint.

>> Again, there tend to be no simple benchmarks or loads that people care
>> about that show this. Most of the time it's fairly hard to see.
> Which "this"? Readdir? I haven't come across one (NFS READDIR issues are
> all about when to use READDIR_PLUS and when not to).
> Or create-contention-on-i_mutex? The load my customer had was 'make -j60' on
> a many-cpu machine.

Both/either. The problem is that the "make -j60" case seems to depend
on the filesystem having bad latency - it certainly doesn't show up on
a local filesystem. Maybe that could be simulated some way..

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at