Re: unicode (char as abstract data type)

Khimenko Victor (khim@sch57.msk.ru)
Sun, 19 Apr 1998 13:09:21 +0400 (MSD)


18-Apr-98 23:41 you wrote:
> > Assuming directory reads (writes are reverse), the minimum needed in
> > the kernel is a conversion _to_ UCS2 Unicode. Currently, we also
> > convert back to an 8-bit encoding or to UTF-8. That step can be in
> > libc instead, but the other half must remain in the kernel.
>
> (1) This sort of "need" only exists if the underlying directory
> implementation uses ucs2 for characters.
> (2) It's fairly cheap to convert ucs2 -> utf8 and utf8 -> ucs2 for
> these cases (leaving the kernel interface the way it is). Character
> conversion for directory reads are NOT time critical.
> (3) It doesn't make sense to use ucs2 for some kernel calls when all
> other kernel calls involving character sets are designed around 8-bit
> character sets.
>
> Basically, it looks like you want to add significant complexity to
> achieve an estimated 0.001% speed improvement on some small percentage
> of machines.
>
No, you are wrong. For example, the name "AA" in Unicode will be either
"00 41 00 41 00 00" or "41 00 41 00 00 00" in hex, depending on
endianness :-). Neither form can be handled well by the standard 8-bit
routines, because of the embedded zero bytes :-(( So you would need a
ucs2 -> utf8 conversion in the kernel and a utf8 -> ucs2 conversion in
userland. That could be an acceptable solution, but it looks UGLY.
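
To make the byte-level problem concrete, here is a minimal sketch in Python.
It is only an illustration, not kernel code: c_strlen() is a hypothetical
helper that mimics what a NUL-terminated 8-bit routine such as strlen() would
see, and it assumes UCS-2 matches UTF-16 for characters in the basic plane.

```python
def c_strlen(buf: bytes) -> int:
    """Hypothetical helper: count bytes up to the first zero byte,
    the way a C routine working on NUL-terminated strings would."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end


name = "AA"

# UCS-2 in both byte orders, each with a 16-bit terminator appended.
ucs2_be = name.encode("utf-16-be") + b"\x00\x00"  # 00 41 00 41 00 00
ucs2_le = name.encode("utf-16-le") + b"\x00\x00"  # 41 00 41 00 00 00

# UTF-8 with an 8-bit terminator appended.
utf8 = name.encode("utf-8") + b"\x00"             # 41 41 00

for label, buf in (("ucs2 BE", ucs2_be), ("ucs2 LE", ucs2_le), ("utf8", utf8)):
    print(f"{label}: {buf.hex(' ')} -> 8-bit length {c_strlen(buf)}")

# The ucs2 -> utf8 step: drop the 16-bit terminator, decode, re-encode.
converted = ucs2_le[:-2].decode("utf-16-le").encode("utf-8") + b"\x00"
print("converted:", converted.hex(" "))
```

An 8-bit routine stops at the first zero byte, so it sees a truncated name in
either UCS-2 byte order, while the UTF-8 form passes through intact.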
