Re: UTF-8 practically vs. theoretically in the VFS API
From: Jeff Garzik
Date: Mon Feb 16 2004 - 14:30:10 EST
Linus Torvalds wrote:
> In short: filenames are byte streams. Nothing more. They don't even have a
> "character set". They literally are just a series of bytes.
>
> And when I say that you have to talk to the kernel using UTF-8, I'm only
> claiming that it is the only sane way to encode extended characters in a
> byte stream. Nothing more.
Nod. Maybe it helps Marc to point out the key difference between
characters and bytes in UTF-8.
In UTF-8, the number of characters in a string is less than or equal to
the number of bytes in the string.
And the kernel just cares about bytes.
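To make the characters-vs-bytes distinction concrete, here is a minimal
sketch in C. It assumes the input is valid UTF-8 and takes "character"
to mean code point; the function name utf8_codepoints is made up for
illustration and is not a kernel or libc API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count code points in a NUL-terminated string, assuming valid UTF-8.
 * Continuation bytes have the bit pattern 10xxxxxx; every other byte
 * starts a new code point, so we count only those.
 */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the string "na" "\xc3\xaf" "ve" (the word "naive" with a diaeresis
on the i), strlen() reports 6 bytes while utf8_codepoints() reports 5.
For pure ASCII the two numbers are equal.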
This is the whole benefit of UTF-8, right here in this thread. UTF-8 was
designed such that ten-year-old C code using standard C strings would
function just fine. No need to rip up large swaths of your code just to
call multi-byte versions of the standard string functions. Most code
that doesn't deal with locale-specific details like uppercase/lowercase
Just Works(tm).
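As a rough illustration of why old byte-oriented code keeps working:
every byte of a multi-byte UTF-8 sequence has its high bit set, so such
a byte can never be mistaken for NUL or for an ASCII separator like '/'.
The sketch below uses only the plain strrchr() from <string.h>; the
helper name last_component is hypothetical, chosen for this example.

```c
#include <string.h>

/*
 * Return a pointer to the final path component, using nothing but
 * byte-oriented strrchr(). This stays correct for UTF-8 names because
 * the byte 0x2F ('/') never occurs inside a multi-byte sequence.
 */
const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}
```

Given a path whose directory and file names both contain non-ASCII
characters, this returns the file name bytes untouched, with no
knowledge of the encoding at all.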
Jeff