Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

From: Linus Torvalds
Date: Thu May 14 2015 - 22:18:24 EST


On Thu, May 14, 2015 at 6:26 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> Hold on. Should
> stat("blah", &buf) => ENOENT, OK, let's create it
> mkdir("blah", 0) => EEXIST, bugger, looks like a race
> stat("blah", &buf) => ENOENT, Whiskey, Tango, Foxtrot
> be possible?

No. What I described would not in any way change any of the above. I'm
not understanding what your point is.

The only difference - EVER - would be if you pass in the ICASE flag.
Nothing I suggested would change semantics without it (the _hash_
changes, but that doesn't change semantics, it's a purely internal
random number).

Now, *with* O_ICASE/AT_ICASE, semantics change. Obviously. At that
point the dentry lookup would match case-insensitively.

For example, let's say that you have a directory where you already
have both "Blah" and "blah", because you created them in a sane
environment. They'll be two different dentries (assuming they are
cached), but they'll have the same dentry hash.

Now, you open "blah" with O_ICASE, and the end result is that you
would randomly open one or the other (it would be the one you find
first on the hash chain). Tough. Mixing icase and case-sensitive is
by definition going to cause those kinds of issues.
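
To make the "same hash, different dentries" point concrete, here is a
toy userspace model in Python. It is only a sketch, not kernel code:
ToyDcache, fold_hash() and the icase argument are made-up names, and
lower() stands in for whatever case folding a real implementation
would use.

```python
import zlib


def fold_hash(name):
    # Hash the case-folded name, so "Blah" and "blah" land on the same chain.
    # The value itself is a purely internal number with no semantics.
    return zlib.crc32(name.lower().encode("utf-8"))


class ToyDcache:
    """Hypothetical stand-in for a hash table of cached names."""

    def __init__(self):
        self.chains = {}

    def add(self, name):
        self.chains.setdefault(fold_hash(name), []).append(name)

    def lookup(self, name, icase=False):
        for cached in self.chains.get(fold_hash(name), []):
            if cached == name:
                return cached  # exact match, the normal case-sensitive rule
            if icase and cached.lower() == name.lower():
                return cached  # first case-insensitive match on the chain
        return None
```

With "Blah" added before "blah", a plain lookup of "blah" still finds
"blah", while an icase lookup of "blah" returns "Blah" simply because
it sits first on the chain.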

The nasty issue (and the case that samba apparently wants it for) is
that ICASE wouldn't be able to trust negative dentries (us having a
negative dentry in one case doesn't mean that it's negative in ICASE).
And that might be the killer part. Negative dentries are really
useful.
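
A rough Python sketch of why that hurts, again a toy model and not
kernel code (toy_lookup, the cache dict and the on_disk list are all
invented for illustration): a negative entry for one exact spelling
says nothing about other spellings, so the ICASE path has to skip the
cache and ask the filesystem.

```python
def toy_lookup(cache, on_disk, name, icase=False):
    """Return the matching on-disk name, or None if there is none.

    cache maps an exact spelling to its result; None is a negative entry.
    """
    if not icase and name in cache:
        # Case-sensitive: positive and negative entries are both trustworthy.
        return cache[name]
    if icase:
        # A negative entry for "blah" does not prove "Blah" is absent,
        # so go to the directory every time.
        for entry in on_disk:
            if entry.lower() == name.lower():
                return entry
        return None
    result = name if name in on_disk else None
    cache[name] = result  # remember the answer, including a negative one
    return result
```

If the directory holds only "Blah", a case-sensitive lookup of "blah"
leaves a negative entry behind, and a later ICASE lookup of "blah"
still has to find "Blah".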

Now, the VFS layer support part is I think fairly simple. I might be
wrong, but I really think the hashing etc wouldn't be too painful.
After all, we already do support ->d_hash() and ->d_compare(), this is
"more of the same", just supported at a vfs level directly (and
_allowing_ aliases in case).

The real pain is that the low-level filesystem has to support it too.
That's simple for some filesystems, but it can be hard for things that
hash filenames. Because there - unlike at the VFS layer - the hashes
have meaning and you can't just change them to suit an ICASE lookup
(because they exist on-disk).

So supporting that is likely trivial on filesystems like FAT or SYSV,
which just iterate over the directory anyway at lookup() time. On ext*
with hashed directories, it's nasty (and an ICASE lookup would probably
have to just walk the whole directory, old-style). But I think all the
code to do the nonhashed lookup is still there, since it is a
filesystem feature bit. And it would only need to do that linear
search thing when the ICASE flag is set in the lookup flags.
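
As a sketch of the difference, in Python and with made-up names
(disk_hash, build_dir, dir_lookup; this is not how ext* actually lays
out its directories): the on-disk hash is computed from the exact
name and can't be changed, so a case-sensitive lookup probes one
bucket, while an ICASE lookup falls back to walking every entry.

```python
import zlib


def disk_hash(name):
    # Stand-in for an on-disk directory hash: fixed, and case-sensitive.
    return zlib.crc32(name.encode("utf-8"))


def build_dir(names):
    buckets = {}
    for name in names:
        buckets.setdefault(disk_hash(name), []).append(name)
    return buckets


def dir_lookup(buckets, name, icase=False):
    if not icase:
        # Hashed lookup: only the bucket for this exact spelling is examined.
        return name if name in buckets.get(disk_hash(name), []) else None
    # ICASE: the hash of "readme" says nothing about where "README" lives,
    # so do the old-style linear walk over the whole directory.
    for bucket in buckets.values():
        for entry in bucket:
            if entry.lower() == name.lower():
                return entry
    return None
```

The linear walk only happens when icase is set; ordinary lookups keep
the hashed path.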

Of course, if it ends up just walking the directory linearly anyway,
it doesn't fix the one samba performance problem that Jeremy pointed
out, so that makes this of dubious value. If we can't do this better
than samba can already do it on its own, it's kind of pointless.

Again - the filesystems (and the vfs layer) would remain case
sensitive. But I think it might be fairly straightforward to allow
per-operation ICASE handling for things that want it.

Keyword "think". Maybe there's something I didn't think of.

Linus