UTF-8 and case-insensitivity

From: tridge
Date: Mon Feb 16 2004 - 23:15:41 EST


Given how much pain the "kernel is agnostic to charset encoding"
attitude has cost me in terms of programming pain, I thought I should
de-cloak from lurk mode and put my 2c into the UTF-8 issue.

Personally I think that eventually the Linux kernel will have to
embrace the interpretation of the byte streams that applications have
given it, despite the fact that this will be very painful and
potentially quite complex. The reason is that I think that eventually
the Linux kernel will need to efficiently support a userspace policy
of case-insensitivity and the only way to do case-insensitive filename
operations is to interpret those byte streams as a particular
encoding.

Personally I much prefer the systems I use to be case-sensitive, but
there are important applications that require case-insensitivity for
interoperability. Right now it is not possible to write a case
insensitive application on Linux in an efficient manner. With the
current "encoding agnostic" APIs a simple open() or stat() call
becomes a horrendously expensive operation and one that is fraught
with race conditions. Providing the same functionality in the kernel
is dirt cheap by comparison (not cheap in terms of code complexity,
but cheap in terms of runtime efficiency).

Cheers, Tridge
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/