Re: Unicode console patch

Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Wed, 3 Feb 1999 23:14:18 +0100 (CET)


On Wed, 3 Feb 1999 edmund@rano.demon.co.uk wrote:

> It might be useful for the program to be able to ask whether a
> particular character can be displayed or not. (An editor might then
> show the octal code instead.)

Right. E.g. lynx, paradoxically, displays text better when set to ISO-8859-2
than when set to UTF-8 with a font containing only ISO-8859-2 characters,
because in the ISO-8859-2 case it knows which characters to substitute with
multicharacter sequences (single-character substitutions can be put into the
screen map; console-tools provides a nice way of setting such fallback tables).
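
A minimal sketch of the substitution idea in Python, assuming the program
knows which characters the loaded font can show. The names FALLBACKS, LATIN2
and render are hypothetical, and the table is only illustrative; it is not
lynx's actual table.

```python
# Hypothetical fallback table: characters mapped to multicharacter
# ASCII approximations. Illustrative only, not taken from lynx.
FALLBACKS = {
    "\u00a9": "(c)",   # COPYRIGHT SIGN
    "\u2026": "...",   # HORIZONTAL ELLIPSIS
    "\u2192": "->",    # RIGHTWARDS ARROW
    "\u00bd": "1/2",   # VULGAR FRACTION ONE HALF
}

# Assumed font repertoire: printable ASCII plus the upper half of ISO-8859-2.
LATIN2 = set(bytes(range(0x20, 0x7F)).decode("ascii")) | set(
    bytes(range(0xA0, 0x100)).decode("iso8859_2")
)


def render(text, displayable, fallbacks=FALLBACKS, unknown="?"):
    """Replace every character outside `displayable` with its fallback.

    Characters with no fallback entry are shown as `unknown`.
    """
    out = []
    for ch in text:
        if ch in displayable:
            out.append(ch)
        else:
            out.append(fallbacks.get(ch, unknown))
    return "".join(out)
```

With this table, render("\u00a9 1999\u2026", LATIN2) gives "(c) 1999...".
A program can only do this if it knows the font's repertoire, which is
exactly what it cannot find out in UTF-8 mode.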

> If we want to do Korean it might be useful to be able to ask how big a
> particular character is when displayed. (Korean consoles tend to use
> double-width characters.)

Right. I've written a terminal filter which dynamically changes the font to
contain the most recently used characters; it also combines accents
with the preceding characters as Unicode specifies (I think). The problem is
that no program has a chance of displaying text with combining accents
properly, because it can't check whether the terminal supports them or
which characters the terminal treats as combining accents.
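
To illustrate the "most recently used characters" part, here is a rough
Python sketch of least-recently-used allocation of font slots. It is not the
actual filterm code; the class GlyphSlots and its method slot_for are
hypothetical names, and loading the glyph bitmap into the console font is
left out.

```python
from collections import OrderedDict


class GlyphSlots:
    """Least-recently-used allocation of a fixed number of font slots."""

    def __init__(self, nslots):
        self.slots = OrderedDict()        # character -> slot, oldest first
        self.free = list(range(nslots))   # slots not yet assigned

    def slot_for(self, ch):
        """Return (slot, loaded) for character `ch`.

        `loaded` is True when the glyph was not resident and the caller
        would have to load its bitmap into the returned slot.
        """
        if ch in self.slots:
            self.slots.move_to_end(ch)    # mark as most recently used
            return self.slots[ch], False
        if self.free:
            slot = self.free.pop(0)
        else:
            # Evict the least recently used character and reuse its slot.
            _, slot = self.slots.popitem(last=False)
        self.slots[ch] = slot
        return slot, True
```

A real filter would presumably also have to avoid evicting glyphs that are
still visible on the screen, since redefining a slot changes every cell that
uses it.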

The current version can be found at http://kki.net.pl/qrczak/programy/linux/
{fonty,konwert}/ (as part of larger packages; try `unicode_start;
filterm - dynafont'), but the packages still need much reworking, improving,
polishing, writing docs, making fonts etc. To emulate non-UTF-8 modes I have
to be able to check the exact translation state.

Maybe those queries can be expressed as measuring character widths,
so that combining accents are reported as having zero width. Nonexistent and
unknown characters should be marked separately. Some characters may be
known to be accents but missing from the loaded font (and so omitted from
the display entirely, which tells programs to find substitutions). I don't
see any interface that is both elegant and powerful and could be used in
average programs.
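
As a sketch of what such a width query might return, here is a Python
approximation built on the standard unicodedata module. The functions
char_width and query are hypothetical, the font is modelled as a plain set of
characters, and the width rules are a simplification (control characters and
zero-width marks with combining class 0 are not handled).

```python
import unicodedata


def char_width(ch):
    """Return the number of columns `ch` occupies, or None if unassigned."""
    if unicodedata.category(ch) == "Cn":       # unassigned code point
        return None
    if unicodedata.combining(ch) != 0:         # combining accent
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                               # wide or fullwidth
    return 1


def query(ch, font):
    """Return (width, in_font) so that a known but missing character
    can be told apart from an unknown one."""
    return char_width(ch), ch in font
```

Here query("\u0301", {"a"}) returns (0, False): the character is known to be
a combining accent, but the font lacks it, so the program should look for a
substitution.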

I guess that many programs would only want to ask for the physical width
of a byte string, whether UTF-8 or not, to display data in columns or
correctly reposition the cursor. Editors and readline would want to see how
byte strings are decomposed into characters; maybe more, but sometimes
that may be enough.
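
For the UTF-8 case, decomposing a byte string into characters can be sketched
in a few lines of Python. The function split_utf8 is a hypothetical name, and
the sketch assumes well-formed input; it does no validation.

```python
def split_utf8(data):
    """Split a UTF-8 byte string into one bytes object per character.

    Assumes `data` is well-formed UTF-8; no validation is done.
    """
    chars = []
    start = 0
    for i in range(1, len(data)):
        # Continuation bytes look like 10xxxxxx; anything else starts
        # a new character.
        if (data[i] & 0xC0) != 0x80:
            chars.append(data[start:i])
            start = i
    if data:
        chars.append(data[start:])
    return chars
```

This only finds character boundaries. It says nothing about how many columns
each character takes, which still depends on the terminal and the font.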

The goal is to allow writing *simple* programs which correctly handle
complicated multinational accented or Korean strings, without the need
to supply each program with huge character descriptions, character
approximation tables, charset tables, procedures which query various
terminal characteristics, font lists etc.

> And does anyone know of an editor, even a very simple one, that can
> run in a Linux console in UTF-8 mode?

Title: sted Small (or Stupid) Text Editor
Version: 0.3.0
Entered-date: 12JUL98
Description: A very small and simple ncurses-based text editor. Comes with a
psychedelic mode option, if you think other text editors are too
boring. Supports multi-byte characters on Linux console.
Keywords: text editor UTF-8 Unicode multibyte i18n
Author: uxm165t@tninet.se (Linus Åkerlund)
Maintained-by: rch@WriteMe.Com (Ricardas Cepas)
Primary-site: sunsite.unc.edu /pub/Linux/apps/editors/terminal
40 kB sted-0.3.0.tar.gz
1 kB sted.lsm
Platforms: Linux, ncurses, POSIX
Copying-policy: GPL

It is indeed simple, and at an early stage of development. It hacks around
ncurses by setting a very large screen width.

-- 
 __("<    Marcin Kowalczyk * qrczak@knm.org.pl http://kki.net.pl/qrczak/
 \__/       GCS/M d- s+:-- a22 C+++>+++$ UL++>++++$ P+++ L++>++++$ E->++
  ^^                W++ N+++ o? K? w(---) O? M- V? PS-- PE++ Y? PGP->+ t
QRCZAK                  5? X- R tv-- b+>++ DI D- G+ e>++++ h! r--%>++ y-
