Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

Alex Belits (abelits@phobos.illtel.denver.co.us)
Wed, 20 Aug 1997 01:45:41 -0700 (PDT)


On Wed, 20 Aug 1997, Erik Corry wrote:

> Unicode is regularly extended, and is incredibly complete in

...by a committee. And they don't release a free implementation of
it, or updates to existing ones, after that.

> the area of Japanese characters (which I know a little
> about). I am sure it is/could be as complete for Chinese.
> There are even private areas for your own extensions. What
> more could you want?

The Japanese and Chinese character encodings that Japanese and Chinese
people actually use, perhaps?

[skipped]

> Linux has already standardised on UTF-8 for the console.

(looking at the console...) No, still looks like koi8-r to me... Having
the internal support doesn't mean that it's usable enough to make it
mandatory everywhere.

> The
> suggestion of converting all file systems to a single
> encoding is probably a useful one, and should probably
> be available as a (default?) mount option.

It should be possible to _choose_ the mapping as a mount option, not
"UTF-8, or all filenames will be truncated to the first letter because
the second byte is zero".
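
To illustrate the truncation, here is a minimal sketch, assuming a
16-bit filesystem that stores names as little-endian UCS-2 and a consumer
that treats whatever it gets as a NUL-terminated byte string. The helper
as_c_string() and the sample name are hypothetical, made up only for this
example.

```python
def as_c_string(raw: bytes) -> bytes:
    """Return the bytes a NUL-terminated string API would see (assumed model)."""
    return raw.split(b"\x00", 1)[0]


name = "readme.txt"                    # hypothetical filename
ucs2_name = name.encode("utf-16-le")   # 16-bit units: every ASCII letter is followed by 0x00
utf8_name = name.encode("utf-8")       # ASCII letters stay single non-zero bytes

print(as_c_string(ucs2_name))  # b'r'  (cut off after the first letter)
print(as_c_string(utf8_name))  # b'readme.txt'
```

A selectable mapping would let the same 16-bit name be handed out as
koi8-r, iso8859-1 or UTF-8 bytes, none of which contain a zero byte for
ordinary characters.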

> > This is _JUST_ for 16bit+ filesystems. Not 8bit filesystems like ext2 or
> > the like.
>
> As far as I understood, ext2 had already been standardised on UTF-8.
> Of course most (all?) of the effects of this decision take place
> in user space, so people are free to do something else if they really
> want to. Especially people who prefer an 8-bit standard (no '/' issue).

I'm not aware of any development of Unicode-aware tools. And unless
sh / bash / grep / awk / ... work with UTF-8 the same way they work with
native characters (that is, a variable-length-encoded character is treated
as one character, which I don't think anyone will implement any time
soon), no one will use it for anything decent. If I wanted to use a system
with cryptic text processing in a library and no text utilities for
scripts, I'd have used NT.
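
A minimal sketch of what "treated as one character" means in practice,
assuming a tool that counts and cuts by bytes, as 8-bit-clean utilities
do. The sample word and the helper cut_bytes() are hypothetical.

```python
word = "\u0444\u0430\u0439\u043b"   # four Cyrillic letters ("file" in Russian)
koi8 = word.encode("koi8_r")        # one byte per letter
utf8 = word.encode("utf-8")         # two bytes per letter


def cut_bytes(raw: bytes, n: int) -> bytes:
    """What a byte-oriented tool does when asked for the first n 'characters'."""
    return raw[:n]


def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(len(word), len(koi8), len(utf8))    # 4 4 8
print(cut_bytes(koi8, 3).decode("koi8_r") == word[:3])  # True: three whole letters
print(is_valid_utf8(cut_bytes(utf8, 3)))  # False: the second letter is cut in half
```

With the 8-bit encoding the byte-oriented cut happens to be correct; with
UTF-8 the same operation needs a tool that knows where characters begin
and end.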

> Even without a big discussion on linux-kernel. Hint, hint.

There was a big discussion on the FTP-WG mailing list already -- even
though the two people who were the only non-iso8859-1-native ones
disagreed, look at their "wonderful" i18n draft with its "UTF-8, or we
will guess the encoding by the content, and if yours happens to look like
valid UTF-8, you lose!" attitude.
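
A minimal sketch of the "you lose" case, assuming the guess is simply
"if the bytes decode as UTF-8, they are UTF-8". That rule is an assumption
made for illustration, not the draft's actual wording, and
looks_like_utf8() is a hypothetical name.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Assumed content-sniffing rule: valid UTF-8 bytes are taken to be UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Two Cyrillic letters (U+0435, U+0451) as a koi8-r user would store them.
koi8_name = "\u0435\u0451".encode("koi8_r")
print(koi8_name)                           # b'\xc5\xa3'
print(looks_like_utf8(koi8_name))          # True: misdetected as UTF-8
print(koi8_name.decode("utf-8") == "\u0163")  # True: shown as a Latin t with cedilla

# Most koi8-r names are rejected, so the failure is rare and surprising.
other = "\u0444\u0430\u0439\u043b".encode("koi8_r")
print(looks_like_utf8(other))              # False
```

The guess fails silently for exactly those names whose bytes happen to
form a valid sequence, and nothing in the data tells the receiver that it
guessed wrong.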

--
Alex