Re: UTF-8 practically vs. theoretically in the VFS API

From: Marc Lehmann
Date: Mon Feb 16 2004 - 15:22:34 EST


On Mon, Feb 16, 2004 at 11:48:35AM -0800, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> works on the raw byte sequence and isn't confused). Basically accept the
> fact that UTF-8 strings can contain "garbage", and don't try to fix it up.

But you are wrong, UTF-8 strings never contain garbage. UTF-8 is
well-defined and is always proper UTF-8. It's a tautology.

The evry idea of "UTF-8 with garbage in it" doesn't make sense.

> And no, I'm not claiming that it's wonderfully clean and that we should
> all love it.

It's also a totally useless idiom...

> And it largely works today.
> Linus

On ascii-only-systems, it works fine. My system is largely ascii-only,
with only very few filenames (japanese and german ones mostly) in
UTF-8. Sometimes in EUC-JP, but that's a bug in rar.

It also works fine in single-user environments where the user just forces
everything to be in her locale. It does fail miserably on multi-user
systems. It does fail miserably in ISO-C's locale model. It does fail
miserably with gnu shellutils, fileutils and most other apps.

It fails, because it's not at all well supported by the kernel.

Claiming that it largely works today is simply not true for most
non-ascii-users (which increasingly includes the US).

--
-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / pcg@xxxxxxxx |e|
-=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
The choice of a GNU generation |
|
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/