Re: unicode

Kenneth Albanowski (kjahds@kjahds.com)
Thu, 14 May 1998 22:11:00 -0400 (EDT)


On Thu, 14 May 1998, Theodore Y. Ts'o wrote:

> None of the above (capitalization, hyphenation, and phonetic match) are
> required for filenames. They are required if you are using a word
> processor (such as Microsoft Office's Word, which is also using Unicode
> internally to store all of their documents, so they've managed to solve
> this problem), but that's not what we're talking about here on the
> linux-kernel mailing list.

Apart from the rest of this discussion, two closely related issues _can_
be important for filenames: case insensitivity (and its analogs in other
languages), and sorting order. I'm fairly sure that these are beyond the
scope of Unicode (given the complexities of multi-character case and
sorting rules in the mere few European languages I'm familiar with).

Sorting order clearly does not need to be in the kernel or filesystem,
while case insensitivity is less clear. If not in the kernel, it still
needs to be ubiquitous, so some libc facility seems necessary. Note that a
canonical "lower-case" storage format is probably not feasible in all
languages.
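
To illustrate why a canonical lower-case form falls short, here is a
minimal Python sketch. It is not something libc provides, and both
function names are hypothetical. It compares plain lowercasing with
Unicode case folding, using the German sharp s as the example.

```python
def same_name_lower(a: str, b: str) -> bool:
    """Compare two names after simple lowercasing (hypothetical helper)."""
    return a.lower() == b.lower()


def same_name_folded(a: str, b: str) -> bool:
    """Compare two names after Unicode case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


# "Straße" and "STRASSE" are the same word in different case, but
# lowercasing keeps the sharp s as one character while the upper-case
# spelling lowers to "ss", so only the folded comparison matches them.
lower_says_same = same_name_lower("Straße", "STRASSE")
folded_says_same = same_name_folded("Straße", "STRASSE")
```

Case folding is meant for comparison only; it is not a form anyone would
want to store or display, which is the difficulty with a canonical
lower-case storage format.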

'Til now, I've been of the opinion that case-insensitivity had no part in
Linux, but I'm coming round to the opinion that it will be necessary as
part of a "user-friendly" UI.

> For filenames, it is really, really bad when a user sees two filenames
> in a directory listing which look identical when printed on the screen,
> but which have different encodings. It is also really bad when the user
> sees a particular filename in a directory listing, tries to type it, but
> because the user was unlucky and guessed wrong about which character set
> was used, she gets a "file not found" error.
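
The failure described above can be reproduced in a few lines. This is a
minimal Python sketch, assuming the name on disk was written as Latin-1
bytes while the user's tools produce UTF-8. The name "café" and all
variable names are illustrative only.

```python
# Illustrative only: one visible name stored under two character sets.
visible = "caf\u00e9"                 # "café" as the user sees it

on_disk = visible.encode("latin-1")   # bytes as one program created them
typed = visible.encode("utf-8")       # bytes the user's terminal sends

# A byte-wise filename comparison treats these as different names.
names_match = (on_disk == typed)

# Yet each decodes to the same text under its own character set.
same_text = on_disk.decode("latin-1") == typed.decode("utf-8")
```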

Equally, two separate filenames identical but for case are disallowed in
case-insensitive filesystems. (There is a rather interesting slippery
slope here: should file names identical but for runs of non-printing
glyphs be considered the same? Think of " " vs. "  ". I suspect they
should. What about glyphs that are both non-printing _and_ non-spacing?)
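
One way to picture the slope is a comparison key that hides such
differences. The following Python sketch is an assumption about what
"the same" might mean, not a proposal for any real interface; the
function name is hypothetical, and treating Unicode format characters
(category "Cf") as the non-printing, non-spacing glyphs is my own
reading.

```python
import re
import unicodedata


def comparison_key(name: str) -> str:
    """Hypothetical key: names with equal keys would count as the same."""
    # Drop format characters (category "Cf"), e.g. U+200B ZERO WIDTH SPACE,
    # which occupy no space and draw nothing.
    visible = "".join(ch for ch in name if unicodedata.category(ch) != "Cf")
    # Collapse each run of whitespace into a single space.
    return re.sub(r"\s+", " ", visible)


# One space versus two: equal under this key.
blanks_equal = comparison_key("my file") == comparison_key("my  file")
# A zero-width space inside the name: equal to the name without it.
zero_width_equal = comparison_key("my\u200bfile") == comparison_key("myfile")
```

Each rule added to such a key makes more names collide, which is why the
slope is slippery.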

I suppose the important question is whether it is feasible, from a coding
reliability standpoint, to make the case-insensitive interface a mere
library that a program is free to use or ignore.
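
As a rough sketch of what such an optional library call could look like,
assuming Python and a purely hypothetical function name, the lookup
below works on a plain list of names rather than a real directory:

```python
from typing import Iterable, List


def find_ignoring_case(names: Iterable[str], wanted: str) -> List[str]:
    """Return every name that equals `wanted` after case folding."""
    key = wanted.casefold()
    return [n for n in names if n.casefold() == key]


# A case-sensitive filesystem is free to hold both of the first two names.
listing = ["README", "Readme", "notes.txt"]
matches = find_ignoring_case(listing, "readme")
```

Here the lookup returns two candidates. A program that ignores the
library can always create such pairs, so every caller of the library has
to decide what to do with an ambiguous result.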

-- 
Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)
