Re: Eureka! (was Re: UTF-8 and case-insensitivity)

From: tridge
Date: Thu Feb 19 2004 - 18:40:28 EST


Linus,

I'm probably just thicker than a complete set of superman comics, and
probably haven't had enough coffee this morning, but I'm still trying
to understand exactly how much this is going to gain us.

If I understand it, your suggestion gives us:

- a way of telling if a directory is fully cached in the dcache
- a way of scanning that full cache with whatever braindead
comparison algorithm we want

At first I didn't understand the scanning part at all, because I
didn't realise that you could scan just the dentries associated with a
single directory. Al was kind enough to correct me on that.
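
As a rough illustration of those two points, here is a userspace toy model in Python. It is not kernel code, and the names ToyDir, complete, scan and windows_like are all made up for the sketch. It assumes a per-directory "fully cached" flag and a scan over just that directory's cached names with a caller-supplied comparison.

```python
class ToyDir:
    """Toy stand-in for one directory's cached entries (not a real dentry)."""

    def __init__(self):
        self.children = []      # names currently cached for this directory
        self.complete = False   # hypothetical "fully cached" flag

    def fill_from(self, names):
        # Pretend we did one full readdir and cached every entry.
        self.children = list(names)
        self.complete = True

    def evict(self, name):
        # Dropping any entry means the cache no longer covers the directory.
        self.children.remove(name)
        self.complete = False


def scan(directory, wanted, same_name):
    """Return the cached name matching `wanted`, or None if it cannot exist.

    A hit is always usable. A miss only proves absence when the cache is
    complete; otherwise LookupError tells the caller to ask the filesystem.
    """
    for name in directory.children:
        if same_name(name, wanted):
            return name
    if not directory.complete:
        raise LookupError("directory not fully cached")
    return None


def windows_like(a, b):
    # Placeholder comparison; real Windows case rules are more involved.
    return a.casefold() == b.casefold()
```

The point of the flag in this model is that a negative answer can come from the cache alone, without a trip to the filesystem.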

What your proposal doesn't give us is case-insensitive indexing into
the dcache. The reason the dcache is such a great thing in Linux is
that it is indexed by name, so you rarely do any scanning at all, and
even in the case where you have never seen the name before we avoid
scanning, because fast filesystems also use an "indexed by name"
scheme. Now maybe I'm just over-obsessed about this scanning stuff and
I'd need some profiles to see how much it would cost (although the
cost as the directory size gets really large seems obvious).
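
To make the indexing-versus-scanning distinction concrete, here is a small Python sketch. It is an illustration only: the file names are invented, a plain dict stands in for an exact-name index, and comparisons are counted rather than timed, so it says nothing about real kernel costs.

```python
def exact_lookup(index, name):
    """Indexed lookup: one hash probe, no scanning, but exact-case only."""
    return index.get(name)


def insensitive_scan(names, wanted):
    """Linear scan with a folded comparison; returns (match, comparisons)."""
    folded = wanted.casefold()
    comparisons = 0
    for name in names:
        comparisons += 1
        if name.casefold() == folded:
            return name, comparisons
    return None, comparisons


# A made-up directory of 10000 entries, indexed by exact name.
names = [f"File{i:05d}.txt" for i in range(10000)]
index = {name: name for name in names}
```

In this model a lookup with the wrong case misses the exact index entirely, and the scan that finds it has to walk the list, with a miss always costing one comparison per entry.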

The really interesting part of your proposal is that it opens up the
possibility of a coherence mechanism between a cache that is indexed
by some Windows-like scheme and the real dcache. If those two bits
could be used by the windows_braindead module to determine if its own
separately indexed cache was current then we'd really be getting
somewhere.
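
One way to picture that coherence idea is the Python toy below. The exact semantics of the two bits are not spelled out in this message, so the sketch simply assumes one "complete" bit and one "changed since the index was last built" bit. ToyDir, FoldedIndex and every field name are hypothetical.

```python
class ToyDir:
    """Toy directory cache with two hypothetical status bits."""

    def __init__(self, names):
        self.children = set(names)
        self.complete = True   # assumed bit 1: every entry is cached
        self.changed = True    # assumed bit 2: contents changed since last build

    def add(self, name):
        self.children.add(name)
        self.changed = True

    def evict(self, name):
        # Models cache pressure dropping an entry, not a file deletion.
        self.children.discard(name)
        self.complete = False
        self.changed = True


class FoldedIndex:
    """Separate case-folded index, rebuilt only when the bits say it is stale."""

    def __init__(self, directory):
        self.directory = directory
        self.by_folded = {}
        self.rebuilds = 0

    def lookup(self, wanted):
        d = self.directory
        if not d.complete:
            raise LookupError("directory not fully cached; ask the filesystem")
        if d.changed:
            # Names that fold to the same key collide; this toy keeps one.
            self.by_folded = {n.casefold(): n for n in d.children}
            d.changed = False
            self.rebuilds += 1
        return self.by_folded.get(wanted.casefold())
```

In this model repeated lookups reuse the folded index for as long as the directory is unchanged, and the index is rebuilt or bypassed only when one of the two bits says it can no longer be trusted.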

If we didn't do the separate cache at all, then your proposal still
should hugely reduce the number of times we ask the filesystem for a
list of files in the directory, although as those calls are already
cached at the block device level what I suppose it does is move the
cache up a level. I don't have a clear idea of how much faster it is
to do this scanning in the dcache versus in the filesystem in the
hot-cache case, so I am not clear on how much this wins us. I'm
prepared to believe it could be quite significant though.

I really need more coffee-and-think time on this, plus maybe some
quick and dirty profiling tests to see what the various costs are
like.

While I'm here I should point out that I'm thinking of the 2.7/2.8
kernel (or even 3.0) for any change, not 2.6. Maybe that's obvious
anyway, but the corresponding userspace changes in Samba definitely
won't be happening in Samba 3.0, so this is a Samba 4.0 thing, which
is a fair way off. This means we've got plenty of time to try some
experiments and see what schemes really help.

Cheers, Tridge