Re: UTF-8 and case-insensitivity

From: Theodore Ts'o
Date: Thu Feb 19 2004 - 09:10:41 EST


On Thu, Feb 19, 2004 at 02:20:44PM +1100, tridge@xxxxxxxxx wrote:
> Currently dnotify doesn't give you the filename that is being
> added/deleted/renamed. It just tells you that something has happened,
> but not enough to actually maintain a name cache in user space.
>
> That could be changed, so that on a dnotify event you do a fcntl() to
> ask for the name of the file. Or perhaps we could cram it into the
> structure the signal handler gets passed? I doubt that would make
> sense, but maybe some signal guru can tell me otherwise. Maybe we
> could even invent a new dnotify system where you do a read on a file
> descriptor to get details on what event happened, and give some
> "everything has changed" error when you run out of buffers.

Yes, that's what I was suggesting. One advantage of such a scheme is
that it's not just for Windows compatibility. A richer directory
change notification scheme would also be useful for graphical file
managers, automatic indexing tools, and many, many other applications.

No, it's not everything you were requesting, but it may very well
represent three-quarters of a loaf, instead of nothing.

> If that happened then we could build our own dcache in user space, but
> it will be a very second rate dcache, with a racy and slow update
> mechanism that will in itself chew cpu. Maybe thats the best we can
> do, or maybe I should be asking distro vendors if they would consider
> a case-insensitive patch, especially the vendors aiming for
> "enterprise" scalability which might include serving windows clients.

I don't know that the update mechanism has to seriously chew that much
CPU. It can certainly be designed to minimize the amount of CPU
that is consumed, especially if it is read via a file descriptor so
that multiple updates can be sent via a single read() system call,
instead of sending a signal every single time a directory entry is
created, renamed, or deleted.
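
As a rough illustration of the batching idea, here is a minimal sketch
of how user space might walk a buffer returned by one read() on such a
descriptor. The record layout, the DC_* event codes, and the
dirchange_parse() name are all hypothetical; no such interface exists
in the kernel as discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event codes; not an existing kernel ABI. */
enum dirchange_type {
    DC_CREATE   = 1,
    DC_DELETE   = 2,
    DC_RENAME   = 3,
    DC_OVERFLOW = 4   /* "everything has changed": caller must rescan */
};

/* Hypothetical fixed header; name_len bytes of name follow it. */
struct dirchange_hdr {
    uint32_t type;      /* one of enum dirchange_type */
    uint32_t name_len;  /* length of the name in bytes, no trailing NUL */
};

typedef void (*dirchange_cb)(uint32_t type, const char *name,
                             uint32_t name_len, void *ctx);

/*
 * Walk every record in a buffer filled by a single read() and call cb
 * (if non-NULL) once per record.  Returns the number of records seen,
 * or -1 if the buffer ends in the middle of a record.
 */
int dirchange_parse(const unsigned char *buf, size_t len,
                    dirchange_cb cb, void *ctx)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL || len == 0);

    while (off < len) {
        struct dirchange_hdr hdr;

        if (len - off < sizeof hdr)
            return -1;                      /* truncated header */
        memcpy(&hdr, buf + off, sizeof hdr); /* avoid unaligned access */
        off += sizeof hdr;

        if (hdr.name_len > len - off)
            return -1;                      /* truncated name */
        if (cb)
            cb(hdr.type, (const char *)(buf + off), hdr.name_len, ctx);
        off += hdr.name_len;
        count++;
    }
    return count;
}
```

With a layout like this, a burst of creates and deletes costs one
system call and one pass over the buffer rather than one signal per
directory entry, and a DC_OVERFLOW record tells the cache to throw away
its state and re-read the directory.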

The problem with a case-insensitive patch is that for most modern
filesystems (i.e., any filesystem that does better than O(n) directory
searches), it will have to involve a format change, since the case
insensitivity has to be built into the hash function or the tree
comparison function, or both. At this point, the filesystem author has
to make the choice of whether to try to solve the Windows-specific
problem, in which case the fundamental filesystem format would have to
be tailored to the Windows case mapping table, or try to solve the
more general I18N case mapping problem. (Lots of luck! It's
constantly changing over time as new character sets are added or
modified...) Yes, a few such filesystems might have this support
already, but I doubt distributions would be willing to accept patches
that make filesystem format-incompatible changes just for the sake of
accelerating Samba operations.
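
To make the format dependency concrete, here is a minimal sketch of a
directory name hash with an optional case fold. FNV-1a stands in for
whatever hash a real filesystem uses, and the ASCII-only fold_ascii()
is a deliberately tiny stand-in for a full Windows or Unicode mapping
table; both are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASCII-only fold; a real mapping table would be far larger. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/*
 * 32-bit FNV-1a over the name, optionally folding case first.
 * Illustrative stand-in for a filesystem's directory hash.
 */
uint32_t name_hash(const char *name, size_t len, int fold)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    size_t i;

    assert(name != NULL || len == 0);

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        if (fold)
            c = fold_ascii(c);
        h ^= c;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}
```

With fold set, "README" and "readme" hash to the same value and so land
in the same bucket; without it they are unrelated entries. Since the
hash value decides where an entry lives on disk, changing the fold
table later means every existing directory has to be rehashed, which is
why the mapping ends up being part of the format.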

I don't know if the distributions would be willing to accept a
case-insensitive patch, but my suspicion is that it would be
difficult, and I would argue that it might be more efficient to get a
richer directory change notification system, for the reasons given
above.

- Ted