> Unicode is regularly extended, and is incredibly complete in
...by a committee. And they don't release a free implementation of
it, or updates to existing ones, after that.
> the area of Japanese characters (which I know a little
> about). I am sure it is/could be as complete for Chinese.
> There are even private areas for your own extensions. What
> more could you want?
A Japanese and Chinese character encoding that Japanese and Chinese people
actually use, perhaps?
[skipped]
> Linux has already standardised on UTF-8 for the console.
(looking at the console...) No, still looks like koi8-r to me... Having
internal support doesn't mean that it's usable enough to make it
mandatory everywhere.
> The
> suggestion of converting all file systems to a single
> encoding is probably a useful one, and should probably
> be available as a (default?) mount option.
It should be possible to _choose_ the mapping as a mount option, not
"UTF-8, or all filenames will be truncated to the first letter because
the second byte is zero".
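
To illustrate the truncation and what a selectable mapping would buy,
here is a rough sketch. It only models a byte-oriented, NUL-terminated
path interface in Python; as_c_string() and map_name() are hypothetical
helpers invented for this example, not kernel functions, and the
filenames are made up.

```python
# Sketch only: models a byte-oriented, NUL-terminated path interface.
# as_c_string() and map_name() are hypothetical helpers, not kernel APIs.

def as_c_string(raw: bytes) -> bytes:
    """Return what a C string routine would see: the bytes before the first NUL."""
    return raw.split(b"\x00", 1)[0]


def map_name(on_disk: bytes, charset: str) -> bytes:
    """Convert a UTF-16LE on-disk name to the charset picked at mount time."""
    return on_disk.decode("utf-16-le").encode(charset)


ascii_name = "readme.txt".encode("utf-16-le")   # r\0e\0a\0d\0...
cyrillic_name = "файл".encode("utf-16-le")

# Passed through unconverted, the ASCII name stops at the first zero byte.
truncated = as_c_string(ascii_name)

# With a chosen mapping the whole name survives in either encoding.
as_utf8 = as_c_string(map_name(cyrillic_name, "utf-8"))
as_koi8 = as_c_string(map_name(cyrillic_name, "koi8_r"))

print(truncated)        # b'r'
print(len(as_utf8))     # 8
print(len(as_koi8))     # 4
```

The point of taking the charset as a parameter is that the same on-disk
name can be presented as koi8-r to one user and as UTF-8 to another,
and neither of them sees a zero byte in the middle of a name.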
> > This is _JUST_ for 16bit+ filesystems. Not 8bit filesystems like ext2 or
> > the like.
>
> As far as I understood, ext2 had already been standardised on UTF-8.
> Of course most (all?) of the effects of this decision take place
> in user space, so people are free to do something else if they really
> want to. Especially people who prefer an 8-bit standard (no '/' issue).
I'm not aware of any development of Unicode-aware tools. And unless
sh / bash / grep / awk / ... work with UTF-8 the way they do with native
characters (that is, a variable-length-encoded character is treated as one
character, which I don't think anyone will implement any time soon), no one
will use it for anything decent. If I wanted to use a system with cryptic
text processing in a library and no text utilities for scripts, I'd use NT.
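
A small sketch of what "treated as one character" means in practice.
It compares the same word in koi8-r and in UTF-8 as seen by a tool that
works byte by byte; the helper names are made up for this example and
do not correspond to any real utility.

```python
# Sketch only: contrasts byte-wise and character-wise handling of one word.
# The helper names are made up for this example.

def first_unit_bytewise(data: bytes) -> bytes:
    """What a byte-oriented tool calls 'the first character': one byte."""
    return data[:1]


def decodes_as(data: bytes, charset: str) -> bool:
    """True if data is a complete, valid string in the given charset."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


word = "привет"
koi8 = word.encode("koi8_r")
utf8 = word.encode("utf-8")

print(len(word), len(koi8), len(utf8))   # 6 6 12

# Cutting off "one character" byte-wise works for koi8-r but splits a
# two-byte UTF-8 sequence in half.
koi8_first_ok = decodes_as(first_unit_bytewise(koi8), "koi8_r")
utf8_first_ok = decodes_as(first_unit_bytewise(utf8), "utf-8")
print(koi8_first_ok, utf8_first_ok)      # True False
```

With an 8-bit encoding, byte-oriented tools get the character count and
the character boundaries right for free; with UTF-8, every such tool
has to learn about multi-byte sequences first.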
> Even without a big discussion on linux-kernel. Hint, hint.
There was a big discussion on the FTP-WG ML already. Even though the two
people who were the only non-iso8859-1-native ones disagreed, look at their
"wonderful" i18n draft with its "UTF-8, or we will guess the encoding by
the content, and if yours happens to look like valid UTF-8, you lose!"
attitude.
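
To show how the guessing can go wrong, here is a minimal sketch of a
"valid UTF-8 means UTF-8" heuristic. guess_encoding() is my own
illustration of that attitude, not the algorithm from the draft, and
the koi8-r fallback is an assumption chosen for the example.

```python
# Sketch only: a hypothetical "guess by content" rule, not the draft's text.

def guess_encoding(raw: bytes, fallback: str = "koi8_r") -> str:
    """Call the name UTF-8 if it decodes as UTF-8, else use the fallback."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


# Most koi8-r text is rejected by a UTF-8 decoder, so the guess works.
ordinary = "привет".encode("koi8_r")
ordinary_guess = guess_encoding(ordinary)

# These two bytes are a two-letter Cyrillic string in koi8-r, and at the
# same time a well-formed UTF-8 sequence for a completely different character.
unlucky = b"\xc2\xa3"
intended = unlucky.decode("koi8_r")
unlucky_guess = guess_encoding(unlucky)
misread = unlucky.decode(unlucky_guess)

print(ordinary_guess)             # koi8_r
print(unlucky_guess)              # utf-8
print(intended != misread)        # True
```

The heuristic is right most of the time, which is exactly the problem:
the names it gets wrong are silently turned into something else.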
-- Alex