Re: UTF-8 practically vs. theoretically in the VFS API

From: Jeff Garzik
Date: Mon Feb 16 2004 - 14:30:10 EST


Linus Torvalds wrote:
> In short: filenames are byte streams. Nothing more. They don't even have a "character set". They literally are just a series of bytes.
>
> And when I say that you have to talk to the kernel using UTF-8, I'm only claiming that it is the only sane way to encode extended characters in a byte stream. Nothing more.


Nod. Maybe it will help Marc if I point out the key difference between characters and bytes in UTF-8.

In UTF-8, the number of characters in a string is less than or equal to the number of bytes in the string: ASCII characters take one byte each, and every other character takes two to four.

And the kernel just cares about bytes.
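A minimal C sketch of the character/byte distinction, assuming the input is already valid UTF-8 (`utf8_char_count` is a hypothetical helper, not a libc or kernel function). Every UTF-8 character is one lead byte plus zero or more continuation bytes of the form 10xxxxxx, so counting the bytes that are *not* continuation bytes counts the characters, while strlen() keeps counting bytes:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count characters (code points) in a NUL-terminated
 * string. Assumes valid UTF-8; no validation is done. Continuation bytes
 * look like 10xxxxxx, so (b & 0xC0) == 0x80 identifies them. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)   /* lead byte or plain ASCII byte */
            n++;
    }
    return n;
}
```

For "na\xc3\xafve" ("naïve"), strlen() reports 6 bytes and utf8_char_count() reports 5 characters; for pure ASCII the two are equal. The kernel only ever needs the strlen() view.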

This is the whole benefit of UTF-8, right here in this thread. UTF-8 was designed so that ten-year-old C code using standard NUL-terminated C strings would keep working: every byte of a multi-byte sequence has the high bit set, so it can never be mistaken for a NUL, a '/', or any other ASCII byte. No need to rip up large swaths of your code just to call multi-byte versions of the standard string functions. Most code that doesn't deal with locale-specific details like uppercase/lowercase Just Works(tm).
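To make that concrete, here is a small sketch using nothing but plain byte-oriented libc calls (`last_component` is a hypothetical helper and the example path is made up). It finds the last path component with strrchr(path, '/'), which stays correct for UTF-8 names because no byte inside a multi-byte sequence can equal '/' (0x2F) or NUL (0x00):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the last component of a
 * NUL-terminated path. Knows nothing about UTF-8; it is still correct for
 * UTF-8 names because every byte of a multi-byte sequence is >= 0x80 and
 * therefore never matches '/'. */
static const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

Given "/home/marc/\xe6\x97\xa5\xe6\x9c\xac/na\xc3\xafve.txt", this returns "na\xc3\xafve.txt" (10 bytes, 9 characters), with no multi-byte string functions involved.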

Jeff
