Re: [PATCH v2 0/4] have the vt console preserve unicode characters
From: Adam Borowski
Date: Tue Jun 19 2018 - 09:10:06 EST
On Sun, Jun 17, 2018 at 03:07:02PM -0400, Nicolas Pitre wrote:
> The vt code translates UTF-8 strings into glyph index values and stores
> those glyph values directly in the screen buffer. Because there can only
> be at most 512 glyphs, it is impossible to represent most unicode
> characters, in which case a default glyph (often '?') is displayed
> instead. The original unicode value is then lost.
>
> The 512-glyph limitation is inherent to VGA displays, but users of
> /dev/vcs* shouldn't have to be restricted to a narrow unicode space and
> lossy screen content because of that. This is especially true for
> accessibility applications such as BRLTTY that rely on /dev/vcs to render
> screen content onto braille terminals.
You're thinking small. Those 256 possible values for Braille are easily
encodable within the 512-glyph space (256 chars + the stolen fg brightness
bit, another CGA peculiarity). Your patchset, though, can be used for proper
Unicode support for the rest of us.
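To illustrate the trick, a minimal userspace sketch (the function name and
exact masks are mine, not from any kernel code): the glyph index gains a
9th bit by reusing the attribute byte's fg-brightness bit, the same way
vgacon already does in 512-glyph mode via vc_hi_font_mask:

	#include <stdint.h>

	/* Pack a 9-bit glyph index plus attributes into a CGA-style 16-bit
	 * cell.  Bit 3 of the attribute byte (fg brightness) is stolen to
	 * carry glyph bit 8. */
	static uint16_t pack_cell(uint16_t glyph, uint8_t attr)
	{
		uint8_t a = attr & ~0x08;	/* bit 3 no longer means "bright" */

		if (glyph & 0x100)
			a |= 0x08;		/* glyph bit 8 rides in the attr byte */
		return (uint16_t)(a << 8) | (glyph & 0xff);
	}

Map U+2800..U+28FF onto 256 consecutive glyph slots and every Braille
pattern fits.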
The 256/512 value limitation applies only to CGA-compatible hardware; these
days that means vgacon. But most people use other drivers: Nouveau forces a
graphical console, on arm* there's no such thing as VGA[1], and so on.
Thus, it'd be nice to use the structure you add to implement the full
Unicode range for the vast majority of people. This includes even
U+2800..U+28FF. :)
> This patch series introduces unicode support to /dev/vcs* devices,
> allowing full unicode access from userspace to the vt console which
> can, amongst other purposes, appropriately translate actual unicode
> screen content into braille. Memory is allocated, and possible CPU
> overhead introduced, only if /dev/vcsu is read at least once.
What about doing so when any updated console driver is loaded? Possibly
only once the vt in question has been switched to (>99% of people never see
anything but tty1 during boot-up, all the others showing nothing but getty).
Or perhaps the moment any non-ASCII character is output to the given vt.
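That last trigger would be a two-liner in the vt output path; a sketch
(the field and helper names here are made up, not from your patchset):

	/* before storing the character into the screen buffer */
	if (c >= 0x80 && !vc->vc_uni_buf)		/* first non-ASCII char? */
		vc->vc_uni_buf = vc_alloc_uni_buf(vc);	/* hypothetical helper */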
If memory usage is a concern, it's possible to drop the old structure and
convert back only in the rare case the driver is unloaded; reads of old-
style /dev/vc{s,sa}\d* are not speed-critical and thus can use conversion on
the fly. Unicode takes only 21 bits out of the 32 you allocate, which is
plenty of space for attributes: they currently take 8 bits, so even a naive
split leaves 3 free bits that could be used for additional attributes.
Underline especially is in common use these days; efficient support for CJK
would also use one bit to mark the left/right half of a double-width
character. And it's decades overdue to drop blink, which is not even
supported by anything but vgacon anyway! (Graphical drivers tend to show
this bit as a bright background, but don't accept SGR codes other than
blink[2].)
> I'm a prime user of this feature, as well as the BRLTTY maintainer Dave Mielke
> who implemented support for this in BRLTTY. There is therefore a vested
> interest in maintaining this feature as necessary. And this received
> extensive testing as well at this point.
So, you care only about people with faulty wetware; it sounds like work
that benefits sighted people would need to be done by someone other than
you. Thus I'm only mentioning possible changes; they could go in after your
patchset does:
A) if memory is considered to be at a premium, what about storing only one
32-bit value per cell, split into 21 bits of char + 11 bits of attr? On
non-vgacon, there's no reason to keep the old structures.
B) if being this frugal wrt memory is ridiculous today, what about instead
going for 32 bits of char (wasteful) + 32 bits of attr? That would be much
nicer: 15-bit fg color + 15-bit bg color + underline + CJK, or something
like that. (Both layouts are sketched below.)
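To make the arithmetic concrete, a rough sketch of the two layouts (all
field widths and names below are my own invention, just for illustration):

	#include <stdint.h>

	/* Variant A: one 32-bit cell, 21 bits of char + 11 bits of attr. */
	#define CELL_A_CHAR(c)	((c) & 0x1fffff)	/* bits 0..20  */
	#define CELL_A_ATTR(c)	((c) >> 21)		/* bits 21..31 */

	/* Variant B: 32 bits of char plus 32 bits of attr per cell. */
	struct cell_b {
		uint32_t ch;	/* the code point; 11 bits to spare */
		uint32_t attr;	/* 15-bit fg + 15-bit bg + underline + CJK */
	};

	#define ATTR_B_FG(a)	((a) & 0x7fff)		/* 15-bit fg color */
	#define ATTR_B_BG(a)	(((a) >> 15) & 0x7fff)	/* 15-bit bg color */
	#define ATTR_B_UNDERLINE	(1u << 30)
	#define ATTR_B_CJK_RIGHT	(1u << 31)	/* right half of a wide char */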
You already triple memory use (today's 16-bit cells plus your 32-bit
unicode buffer make 48 bits per cell); variant A) above would cut that to
2x, variant B) would raise it to 4x.
Considering that modern machines can draw complex scenes of several
megapixels 60 times a second, it could be reasonable to drop the complexity
of two structures even on vgacon: converting characters on the fly during a
vt switch is beyond notice on any hardware Linux can run on.
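Such a conversion could reuse the existing conv_uni_to_pc() helper from
drivers/tty/vt/consolemap.c; a sketch (the function itself and the
variant-A bit split are hypothetical, and pack_cell() is the one from the
Braille sketch above):

	/* Rebuild the legacy 16-bit glyph buffer from a 32-bit unicode
	 * buffer when switching to a vgacon-driven console. */
	static void vc_rebuild_screenbuf(struct vc_data *vc, const u32 *uni,
					 u16 *screen, unsigned int cells)
	{
		unsigned int i;

		for (i = 0; i < cells; i++) {
			int glyph = conv_uni_to_pc(vc, uni[i] & 0x1fffff);

			if (glyph < 0)		/* no glyph in the loaded font */
				glyph = '?';	/* the usual default glyph */
			/* keep the 8 legacy attr bits; the extra ones are
			 * invisible to vgacon anyway */
			screen[i] = pack_cell(glyph, (uni[i] >> 21) & 0xff);
		}
	}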
> This is also available on top of v4.18-rc1 here:
>
> git://git.linaro.org/people/nicolas.pitre/linux vt-unicode
Meow!
[1]. config VGA_CONSOLE
depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
(!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
!ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !NDS32 && !S390
[2]. Sounds like an easy improvement; not so long ago I added "\e[48;5;m",
"\e[48;2;m" and "\e[100m", which could be handled better on unblinking
drivers. Heck, even VGA can be switched to unblinking by flipping bit 3 of
the Attribute Mode Control Register -- just like we already flip foreground
brightness when 512 glyphs are needed.
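The register poke is tiny; a userspace sketch (needs iopl(3) or ioperm();
in-kernel vgacon would do the same through its own attribute controller
accessors):

	#include <sys/io.h>

	/* Clear bit 3 of the Attribute Mode Control Register (index 0x10):
	 * attribute bit 7 then selects a bright background instead of blink. */
	static void vga_blink_off(void)
	{
		unsigned char mode;

		(void)inb(0x3da);		/* reset the index/data flip-flop */
		outb(0x10 | 0x20, 0x3c0);	/* select index 0x10, keep PAS set */
		mode = inb(0x3c1);		/* read the current mode control */
		outb(mode & ~0x08, 0x3c0);	/* write it back with blink off */
	}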
--
There's an easy way to tell toy operating systems from real ones.
Just look at how their shipped fonts display U+1F52B; this makes
the intended audience obvious. It's also interesting to see OSes
go back and forth wrt their intended target.