Re: dcache questions

David C Niemi (niemi@tux.org)
Wed, 31 Dec 1997 10:39:48 -0500 (EST)


On Wed, 31 Dec 1997, Martin von Loewis wrote:
[I wrote]
> > This really can't be reliably done via name-mangling, because the mangling
> > is not only fairly complex but also nondeterministic (the ~1 can be a ~2
> > depending on what else is in the directory, and when you get past ~9 you
> > have to subtract another letter from the name fragment at the front).
>
> I think it can be done reliably. Of course, you not only have to look
> at the names themselves, but you also have to keep the association between
> the names around, which you've learned when you first read the directory.

You could remember the associations explicitly based on having read the
directory, but that has nothing to do with shortname prediction and it is
only practical to do that for paths which actually have been used, or
otherwise you'd flood the dcache with lots of hashes that no-one is looking
at. So it really comes down to how you handle the names in the normal
dentry lookup process.

---

Let me put the idea of deriving shortnames from longnames to rest, in case anyone else still believes in it.

Shortname prediction is unreliable because the result is nondeterministic even if you know the short and long names of every other file currently in the directory. Suppose a file with the shortname "foofoo~1" already exists and a file called "foofoofoo" is created, receiving the shortname "foofoo~2"; later the file with the shortname "foofoo~1" is deleted. Nothing left in the directory tells you that "foofoofoo" maps to "foofoo~2" rather than "foofoo~1"; you would have to remember the entire past history of the directory. So thinking you can predict the shortname from the longname leads to a system that is 95% reliable and 5% horribly wrong.
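The history dependence can be shown with a toy model. This is only a sketch under stated assumptions: `ToyDir`, `create`, `delete` and `demo` are hypothetical names, and the mangling rule (uppercase the name, truncate it, append the lowest free ~N) is a deliberate simplification, not the real VFAT algorithm. Shortnames come out in uppercase here, unlike the lowercase spelling used in the example above.

```python
class ToyDir:
    """Toy directory that assigns a ~N shortname to every long name."""

    def __init__(self):
        self.short_of = {}  # long name -> assigned short name

    def create(self, longname):
        stem = longname.upper()
        used = set(self.short_of.values())
        n = 1
        while True:
            tail = f"~{n}"
            # Keep the result at 8 characters: a longer tail eats into the stem.
            candidate = stem[:8 - len(tail)] + tail
            if candidate not in used:
                break
            n += 1
        self.short_of[longname] = candidate
        return candidate

    def delete(self, longname):
        del self.short_of[longname]


def demo():
    # Directory A: another file held ~1 first, and was deleted afterwards.
    a = ToyDir()
    a.create("foofoobar")
    a.create("foofoofoo")
    a.delete("foofoobar")

    # Directory B: the same long name created in an empty directory.
    b = ToyDir()
    b.create("foofoofoo")
    return a.short_of, b.short_of
```

Both directories end up holding the single long name "foofoofoo", yet in this model the shortname differs between them, so nothing visible in the final directory contents determines it.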

Keep in mind also that many things which are perfectly normal in Unix names all force a mangled name: many ordinary characters, double dots, and 4-character extensions like .html. Mangled names can therefore be extremely numerous, and it is easy to end up with many of them in the same directory that can only be distinguished by their ~<number> parts.
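A rough way to see how often mangling is triggered is a predicate like the one below. It is a sketch only: `fits_8_3` and `SHORT_UNSAFE` are hypothetical names, the character set is an assumed and incomplete list of characters that long names accept but 8.3 names do not, and case handling is ignored entirely.

```python
# Assumption: an incomplete set of characters accepted in long names
# but not usable in an 8.3 short name.
SHORT_UNSAFE = set("+,;=[] ")


def fits_8_3(name):
    """Return True if `name` already has 8.3 shape and needs no ~N tail."""
    # More than one dot, or a leading dot, cannot be expressed as base.ext.
    if name.count(".") > 1 or name.startswith("."):
        return False
    base, _, ext = name.partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        return False
    return not (SHORT_UNSAFE & set(name))
```

Under these assumptions "readme.txt" passes, while "index.html", "a.tar.gz", "my file.txt" and "foofoofoo" would all need a mangled name.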

Finally, the algorithm for name-mangling is much too tedious to do on the fly during a lookup, potentially on each path component!

So we are stuck with two loosely correlated names, and this does indeed result in horrible complications and horrible performance, especially on file creation and renames, but we have no choice if we want to support VFAT. Complain to Bill Gates if you don't like it.

> In the Linux dcache case, you'd have to cache the long name, and the
> associated inode. In the inode, remember the short name which you
> found when reading the directory. Wouldn't that work?

The inode has no way of knowing what path it was reached with, only the dentry has that info. And the dentry only has one d_name, so the only way to remember a noncanonical path is if there is a separate dentry for it, which I think is what Gordon was complaining about being too messy. At most, that means you remember the hash of the part-shortname path but not the actual part-shortname path itself. For that matter, what would you ever need it for? The lookups are pretty much all one-way, and for the rare reverse lookup you can give back the canonical (all-longname) path.
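The point about one d_name per dentry can be sketched with a toy cache. Everything here is an illustrative assumption rather than the kernel's data structures: `Inode`, `Dentry`, `ToyDcache`, `path_of` and the `canonical` flag are hypothetical, and only the field name `d_name` is taken from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Inode:
    ino: int  # carries no name and no path


@dataclass
class Dentry:
    parent: Optional["Dentry"]
    d_name: str       # exactly one name per dentry
    inode: Inode
    canonical: bool   # True for the all-longname entry


class ToyDcache:
    def __init__(self):
        self.entries = {}  # (id of parent dentry, name) -> Dentry

    def add(self, parent, name, inode, canonical):
        d = Dentry(parent, name, inode, canonical)
        self.entries[(id(parent), name)] = d
        return d

    def lookup(self, parent, name):
        return self.entries.get((id(parent), name))


def path_of(dentry):
    """Rebuild a path by walking parent links; the inode cannot do this."""
    parts = []
    while dentry is not None:
        parts.append(dentry.d_name)
        dentry = dentry.parent
    return "/".join(reversed(parts))


def demo():
    cache = ToyDcache()
    root = cache.add(None, "", Inode(1), True)
    ino = Inode(42)
    long_d = cache.add(root, "foofoofoo", ino, True)
    # The shortname alias needs its own dentry pointing at the same inode.
    short_d = cache.add(root, "FOOFOO~2", ino, False)
    return cache, root, long_d, short_d
```

In this model a lookup by either name reaches the same inode through two distinct dentries, and a reverse lookup would simply report path_of() of the canonical one.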

David Niemi@tux.org     703-810-5538     Reston, Virginia, USA
"Down that path lies madness. On the other hand, the road to hell is
paved with melting snowballs." -- Larry Wall, 1992