Re: UTF-8 practically vs. theoretically in the VFS API
From: jw schultz
Date: Tue Feb 17 2004 - 02:21:49 EST
On Mon, Feb 16, 2004 at 09:16:10PM +0100, Marc Lehmann wrote:
> On Mon, Feb 16, 2004 at 07:48:19PM +0000, John Bradford <john@xxxxxxxxxxxx> wrote:
> > Quote from Jeff Garzik <jgarzik@xxxxxxxxx>:
> > None of this is a real problem if everything is set up correctly and
> > bug-free. Unfortunately, the Just Works thing falls apart in the
> > (frequent) instances where it's not :-(.
>
> And this is the whole point.
>
> BTW, to the people trying to explain some properties of UTF-8 to me: I don't
> think ad-hominem attacks like assuming that I don't understand UTF-8
> (without any indication that this is so) are useful.
>
> The point here is that the kernel does, in a very narrow interpretation,
> not support the use of UTF-8, because proper support of UTF-8 means that
> no illegal byte sequences will be produced.
That "interpretation" is so narrow as to be unrealistic.
The kernel supports UTF-8 the same way a stage supports
rock musicians. You confuse support with enforcement, rather
like confusing tolerance with endorsement.
And it should be noted that the kernel doesn't produce file
names. It only passes them along.
> Of course, I can feed the kernel UTF-8, and if everybody does that, it
> will generally work quite fine. However, Windows surely works fine if
> every program only feeds allowed values into system calls. And even unix
> dialects without memory protection work, as long as everybody plays
> fair.
>
> The point is, however, that this is highly undesirable, and it would be
> nice to have a kernel that would (optionally) fully support a UTF-8
You mean enforce again. That enhancement request has been
rejected repeatedly because such a thing would be highly
undesirable. What might be a convenient but unnecessary
restriction today is too likely to become an unbearable
restriction tomorrow. I don't want the kernel to have to
care about what is or isn't valid UTF-8. I certainly don't
want to have the kernel loaded with outdated character
tables.
> environment where applications can feed UTF-8 and _expect_ UTF-8 in
> return, which _is_ a security issue.
I want an environment where applications can feed bytestreams
and expect the same bytestream in return. I see enough
problems as a result of filesystems that don't do that.
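To make the "same bytestream in return" property concrete, here is a
minimal sketch in C. It creates a file under a given name and checks that
readdir() hands back exactly the same bytes. The function name
name_round_trips and the /tmp location are assumptions made for this
illustration only. On the usual native Linux filesystems a name may be any
byte sequence without '/' or NUL, so even a Latin-1 name that is not valid
UTF-8 should come back unchanged; filesystems that translate or normalize
names may behave differently.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp() */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a file called `name` in a fresh temporary
 * directory, then list that directory.
 * Returns 1 if readdir() reports exactly the same bytes, 0 if it reports
 * something else, and -1 if the file could not be created or listed. */
int name_round_trips(const char *name)
{
    char dir[] = "/tmp/bytes-XXXXXX";   /* assumed scratch location */
    char path[4096];
    struct dirent *de;
    DIR *d;
    int fd, result = 0;

    /* A single path component only: no NULL, no '/'. */
    assert(name != NULL && strchr(name, '/') == NULL);

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/%s", dir, name);

    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    close(fd);

    d = opendir(dir);
    if (d == NULL) {
        unlink(path);
        rmdir(dir);
        return -1;
    }
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        /* Byte-for-byte comparison; no character set is involved. */
        result = (strcmp(de->d_name, name) == 0);
    }
    closedir(d);

    unlink(path);
    rmdir(dir);
    return result;
}
```

Calling name_round_trips("\xe9t\xe9") passes the three bytes E9 74 E9,
which are Latin-1 but not well-formed UTF-8. Nothing in the helper looks
at encodings; it only compares bytes.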
> It's very desirable to have a kernel that actively supports this. It is
You mean enforces again. Kernel as police; next thing you
will want is a kernel that prevents undesirable character
sequences.
> clearly not _required_, of course. But then again, process abstraction
> is also not required...
I'll tell you what. Patch libc. You can add UTF-8 filename
enforcement to libc. There are only a few system calls that
would need to have their wrappers enlarged. I'm sure the
libc people will direct you to someplace very warm if you
ask them for this enhancement.
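For illustration, a user-space check of this kind could look roughly like
the sketch below. It is not libc code: is_valid_utf8 and utf8_open are
hypothetical names, the choice of EILSEQ as the error is an assumption,
and only open() is wrapped here. The validator follows the well-formedness
rules of RFC 3629: it rejects stray or missing continuation bytes, overlong
encodings, UTF-16 surrogates and code points above U+10FFFF.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/* Return 1 if the NUL-terminated string s is well-formed UTF-8, else 0. */
int is_valid_utf8(const char *s)
{
    /* Smallest code point that legitimately needs 1, 2 or 3
     * continuation bytes; anything below that is an overlong form. */
    static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned long cp;
        int extra, i;

        if (*p < 0x80) {                    /* plain ASCII */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
            cp = *p & 0x1F; extra = 1;
        } else if ((*p & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
            cp = *p & 0x0F; extra = 2;
        } else if ((*p & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
            cp = *p & 0x07; extra = 3;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }
        p++;

        for (i = 0; i < extra; i++, p++) {
            /* A NUL here fails the test too, so we never read past the end. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min_cp[extra])             /* overlong encoding */
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)   /* UTF-16 surrogate */
            return 0;
        if (cp > 0x10FFFF)                  /* beyond the Unicode range */
            return 0;
    }
    return 1;
}

/* Hypothetical wrapper: refuse path names that are not valid UTF-8,
 * otherwise pass the call through to open() unchanged. */
int utf8_open(const char *path, int flags, mode_t mode)
{
    assert(path != NULL);
    if (!is_valid_utf8(path)) {
        errno = EILSEQ;     /* assumed error code for this sketch */
        return -1;
    }
    return open(path, flags, mode);
}
```

A program that wants the restriction calls utf8_open() where it would
call open(); programs that do not want it are unaffected, and the kernel
keeps passing bytes along untouched.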
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@xxxxxxxxxx
Remember Cernan and Schmitt