Re: unicode (char as abstract data type)

Khimenko Victor (khim@sch57.msk.ru)
Sat, 18 Apr 1998 13:37:27 +0400 (MSD)


17-Apr-98 23:27 you wrote:
> Alex Belits writes:
> > On Fri, 17 Apr 1998, Albert D. Cahalan wrote:
>
> >> It is not dead. The Unicode support in the system allows for a
> >> future world without 8-bit apps. The transition may take a decade.
> >> When the transition is done, there won't be so much reencoding
> >> between apps and the kernel.
> >
> > In a decade Unicode most likely will be in the same place
> > where EBCDIC is now.
>
> That would be KOI-8, used only by Alex Belits.
>
> Look at it this way:
>
> We are stuck in a world with multiple character encodings.
> To convert, you generally need to go through UCS2.
> The kernel must convert for foreign filesystem support.
> The library & apps must convert for many other reasons.
> If libc can use UCS2 to call the kernel, then the kernel
> only needs to perform half of the conversion and libc won't
> need to convert back to UCS2. Put more of it in user-space!
>
> Think of a machine with several users and several filesystems.
> Maybe they are all Czech, which Martin Mares reports as having
> more than 5 character encodings.
Heh :-) Looks like Alex Belits is joking -- Russian has 5 "widely used"
encodings plus 5-7 not so "widely used" ones ...

> Each user wants to see the system
> in their preferred encoding. Solution: the kernel reads filenames
> from disk in whatever format is there, then converts to UCS2.
I really, really want this right now! mars_nwe & samba really require
ibm866 (I cannot change the DOS & Windows clients!) while netatalk uses
x-mac-cyrillic (I cannot change the Mac clients either). Result: a file
written from a Mac very often cannot be read from Windows or DOS, and vice
versa. And since I have koi8-r on the Linux console, I cannot read either
kind of filename :-(( This problem ALREADY exists. It is solved for now
with a hack in mars_nwe, but this is ugly anyway...
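
To make the filename problem concrete, here is a rough sketch in Python
(not kernel code, and not how mars_nwe, samba, or netatalk actually do it).
It assumes Python's standard codec names "cp866" (ibm866), "mac_cyrillic"
(x-mac-cyrillic), and "koi8_r", and it uses Python's Unicode string as the
pivot where the discussion says UCS2. The function name reencode_filename
and the sample filename are made up for illustration.

```python
# Sketch only: re-encode a filename from one 8-bit Cyrillic encoding to
# another by going through Unicode. Codec names are Python's standard ones.

def reencode_filename(raw: bytes, src: str, dst: str) -> bytes:
    """Decode raw bytes from encoding src, then encode them as dst."""
    text = raw.decode(src)      # 8-bit bytes -> Unicode (the pivot)
    return text.encode(dst)     # Unicode -> target 8-bit encoding

name = "привет.txt"                         # hypothetical filename

# The same name as each kind of client would store it on disk.
from_dos = name.encode("cp866")             # DOS/Windows via mars_nwe or samba
from_mac = name.encode("mac_cyrillic")      # Mac via netatalk

# The byte strings differ, so each client sees garbage for the other's file.
print(from_dos == from_mac)

# Read as koi8-r without conversion, the DOS bytes do not give the name back.
print(from_dos.decode("koi8_r") == name)

# Converted through Unicode, both give the bytes a koi8-r console expects.
console_dos = reencode_filename(from_dos, "cp866", "koi8_r")
console_mac = reencode_filename(from_mac, "mac_cyrillic", "koi8_r")
print(console_dos == console_mac == name.encode("koi8_r"))
```

The point of the pivot is that each encoding needs only one table (to and
from Unicode) instead of one table per pair of encodings.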

> The library converts UCS2 into the format which each user wants.
>
I would prefer a per-process basis, not per-user...

This will solve the problem for *some* languages (Russian, for example) but
not for all. What if one encoding requires left-to-right writing and another
requires right-to-left writing? I am not joking -- such encodings really
exist! So Unicode is really not a solution for everyone :-(( It is a
solution for Russian anyway...

> The yucky alternative: the conversion from UCS2 to _one_ local
> encoding is also in the kernel and users that don't like the chosen
> encoding are screwed: live with it or suffer a _second_ conversion.
>
