Re: UTF-8 practically vs. theoretically in the VFS API

From: Marc Lehmann
Date: Mon Feb 16 2004 - 15:17:46 EST


On Mon, Feb 16, 2004 at 07:48:19PM +0000, John Bradford <john@xxxxxxxxxxxx> wrote:
> Quote from Jeff Garzik <jgarzik@xxxxxxxxx>:
> None of this is a real problem if everything is set up correctly and
> bug free. Unfortunately, the Just Works thing falls apart in the
> (frequent) instances when it's not :-(.

And this is the whole point.

BTW, to the people trying to explain some properties of UTF-8 to me: I
don't think ad-hominem attacks, like assuming that I don't understand
UTF-8 (without any indication that this is so), are useful.

The point here is that, in a very narrow interpretation, the kernel does
not support UTF-8, because proper support of UTF-8 means that no illegal
byte sequences will ever be produced.

Of course, I can feed the kernel UTF-8, and if everybody does that, it
will generally work quite fine. But Windows surely works fine, too, if
every program only feeds allowed values into system calls. And even Unix
dialects without memory protection work, as long as everybody plays
fair.

The point, however, is that this is highly undesirable, and it would be
nice to have a kernel that (optionally) fully supports a UTF-8
environment in which applications can feed UTF-8 and _expect_ UTF-8 in
return. Being able to rely on that _is_ a security issue.

It's very desirable to have a kernel that actively supports this. It is
clearly not _required_, of course. But then again, process abstraction
is not required either...

--
-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / pcg@xxxxxxxx |e|
-=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
The choice of a GNU generation |
|