Re: unicode (char as abstract data type)

Raul Miller (rdm@test.legislate.com)
Sat, 18 Apr 1998 23:41:55 -0400


Albert D. Cahalan <acahalan@cs.uml.edu> wrote:
> No, we already have this in the kernel _and_ a conversion back
> into some other encoding. We can't get rid of it because userspace
> is not generally aware of mount point crossings and mount options.

Er.. it shouldn't be all that hard to fix mount. I think the
real problem, though, is that this idea just plain hasn't been
thought through very well.

> It would be really bad if libc needed to know details of the
> filesystem just to read the directory.

This is nothing compared to the complexity involved in converting data
formats between syscalls such as read and rename, or between readlink
and write (which seems to be the gist of your proposal).

Which probably means this shouldn't be done at all.

> Assuming directory reads (writes are reverse), the minimum needed in
> the kernel is a conversion _to_ UCS2 Unicode. Currently, we also
> convert back to an 8-bit encoding or to UTF-8. That step can be in
> libc instead, but the other half must remain in the kernel.

(1) This sort of "need" only exists if the underlying directory
implementation uses ucs2 for characters.
(2) It's fairly cheap to convert ucs2 -> utf8 and utf8 -> ucs2 for
these cases (leaving the kernel interface the way it is). Character
conversion for directory reads is NOT time critical.
(3) It doesn't make sense to use ucs2 for some kernel calls when all
other kernel calls involving character sets are designed around 8-bit
character sets.
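To make point (2) concrete, here is a minimal sketch of the ucs2 -> utf8
direction (the helper name is made up, not kernel code): a couple of range
checks and shifts per character, which is trivial next to the cost of the
directory read itself.

```c
#include <stddef.h>

/* Hypothetical helper: encode one UCS-2 code point as UTF-8,
 * returning the number of bytes written (1-3). No tables, just
 * one branch per range -- cheap enough for directory reads. */
static size_t ucs2_to_utf8(unsigned short c, unsigned char *out)
{
    if (c < 0x80) {                 /* 7-bit ASCII: 1 byte */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                /* 2-byte sequence */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 3-byte sequence covers the rest of UCS-2 */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

The reverse direction (utf8 -> ucs2) is a similarly small amount of
masking and shifting, so leaving the kernel interface 8-bit costs
almost nothing.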

Basically, it looks like you want to add significant complexity to
achieve an estimated 0.001% speed improvement on some small percentage
of machines.

Personally, I'm not going to say anything more in this thread unless
I see some really convincing reason for this approach.

-- 
Raul

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu