[RFC PATCH v2 0/7] docs: pdfdocs: Improve font choice in CJK translations

From: Akira Yokosawa
Date: Mon Jul 19 2021 - 07:08:13 EST

Hi all,

I'm calling this patch set "RFC PATCH v2", but the approach has changed
a lot since "RFC PATCH 0/3 docs: pdfdocs: Improve alignment of CJK
ascii-art" [1], hence the different title.

I added Hu Haowen, who is working on zh_TW translations, and Shinwoo Lee,
who has recently shown interest in enhancing ko_KR translations [2] but
has not received a public response yet, to the CC list in the faint hope
that they are interested in CJK typesetting by Sphinx + XeLaTeX. If either
(or both) of you have no interest, please let me know and I won't bother
you in this area.

I thought it was impossible to switch CJK font choices in the middle
of a document, but it turns out it is actually possible.

Patch 1/7 is mostly the same as the original "RFC PATCH 1/3".

Patch 2/7 is the most important change in this patch set.
It introduces a pair of LaTeX macros for each CJK language:
zh_CN: \kerneldocBeginSC, \kerneldocEndSC
ko_KR: \kerneldocBeginKR, \kerneldocEndKR
ja_JP: \kerneldocBeginJP, \kerneldocEndJP

Each pair performs the (magical) font settings for its language.

Each pair of macros is added to the respective translation's index.rst.

As for Hangul inter-phrase spaces, xeCJK provides a knob to preserve
them. \kerneldocBeginKR has the knob enabled.
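
To make the macro-pair idea concrete, here is a minimal sketch of how a
conf.py could generate such pairs for the LaTeX preamble. The macro names
and main fonts come from this cover letter; the macro bodies, the helper
macro_pair(), and the \kerneldocFontXX switches are my illustration only
and are not copied from Patch 2/7. CJKspace is the xeCJK knob that keeps
spaces between CJK characters.

```python
# Hypothetical sketch: the macro bodies below are illustrative assumptions,
# not the actual content of Patch 2/7.
CJK_LANGS = {
    # suffix: (CJK main font, preserve inter-phrase spaces?)
    "SC": ("Noto Serif CJK SC", False),
    "KR": ("Noto Serif CJK KR", True),   # Hangul needs its spaces kept
    "JP": ("Noto Serif CJK JP", False),
}


def macro_pair(suffix, font, keep_spaces):
    """Return LaTeX defining \\kerneldocBegin<suffix> and \\kerneldocEnd<suffix>."""
    switch = "\\kerneldocFont" + suffix  # hypothetical font-switch command
    lines = [
        "\\newCJKfontfamily" + switch + "{" + font + "}",
        "\\newcommand{\\kerneldocBegin" + suffix + "}{%",
        "  \\begingroup%",       # keep the settings local to this translation
        "  " + switch + "%",
    ]
    if keep_spaces:
        lines.append("  \\xeCJKsetup{CJKspace = true}%")
    lines += [
        "}",
        "\\newcommand{\\kerneldocEnd" + suffix + "}{\\endgroup}",
    ]
    return "\n".join(lines)


preamble = "\n".join(macro_pair(s, f, k) for s, (f, k) in CJK_LANGS.items())

# latex_elements['preamble'] is the standard Sphinx hook for extra LaTeX.
latex_elements = {"preamble": "\\usepackage{xeCJK}\n" + preamble}
```

Under this sketch, a translation's index.rst would call \kerneldocBeginKR
near its top and \kerneldocEndKR at its end, presumably from a
"raw:: latex" directive, so the settings stay scoped to that translation.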

Also note that CJKmainfont is changed from "Noto Sans CJK" to "Noto
Serif CJK", as the latter looks more consistent with the roman (serif)
family of Latin text.

The font choice of Latin monospace letters is overridden (for
ascii-art alignment) only when the document is built by
"make SPHINXDIRS=translations pdfdocs".

As for the to-be-merged zh_TW translations, the same approach should
work by choosing "Noto xxxx CJK TC" fonts.

A couple of glitches remain as of Patch 2/7.
The following patches address them one by one.

Patch 3/7 increases the line spacing of CJK content.
In general, CJK characters in single spacing look too busy.
One-half spacing produces a reasonable result (to my eyes).

Patch 4/7 is a workaround for "Noto CJK" fonts' lack of italic shapes.

Patch 5/7 fixes excessive kerning by xeCJK around quotation marks
in Korean and Japanese translations. Quotation marks in the "Noto Serif
CJK KR" and "Noto Serif CJK JP" fonts are half-width, whereas they are
full-width in "Noto Serif CJK SC".
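
As an illustration of what "teaching xeCJK the width" could look like, the
sketch below emits xeCJK character-class declarations that treat the curly
quotation marks (U+2018, U+2019, U+201C, U+201D) as half-width, and only
for the languages whose fonts need it. The helper quote_width_fix() is
hypothetical, and using \xeCJKDeclareCharClass with the HalfLeft and
HalfRight classes is my assumption about one way to do it, not necessarily
what Patch 5/7 does.

```python
# Hypothetical sketch: one possible way to tell xeCJK that quotation marks
# are half-width in the KR and JP fonts. Not taken from Patch 5/7.
HALF_WIDTH_QUOTE_LANGS = ("KR", "JP")


def quote_width_fix(suffix):
    """Return LaTeX to place inside \\kerneldocBegin<suffix>, or "" if not needed."""
    if suffix not in HALF_WIDTH_QUOTE_LANGS:
        return ""  # e.g. "Noto Serif CJK SC" has full-width quotation marks
    return (
        '\\xeCJKDeclareCharClass{HalfLeft}{"2018, "201C}%\n'
        '\\xeCJKDeclareCharClass{HalfRight}{"2019, "201D}%'
    )
```

Emitting the declarations inside the per-language group would keep the
zh_CN chapter on xeCJK's default full-width handling.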

Patches 6/7 and 7/7 correspond to 2/3 and 3/3 in the original RFC.
They attempt to align ascii-art figures found in the Korean translation
of memory-barriers.txt.

Now, candidates for a *true* monospace font for Hangul are:

D2Coding, Sarasa Mono K, and (ugly looking) Unifont.

I said earlier in response to Mauro's concerns with regard to
"Sarasa Mono" font and sphinx-pre-install [3]:

Akira>>> Existence of "Sarasa Mono SC" can be checked by the command:
>>> fc-list | grep "Sarasa Mono SC," | grep "style=Regular" | wc -l
>>> If the result is *not* "0", you have the font somewhere in your
>>> fontconfig path.
>>> I think this is portable across distros.
>>> Wouldn't this suffice for sphinx-pre-install?
>> No. The sphinx-pre-install tool generate a list of commands
>> needed to install the pre-reqs on a given distro.
>> ...
>> The same command, when executed on a different distro will
>> print a different set of packages and commands.
> I see...
> So let's forget Unifont and "Sarasa Mono" for the time being.
> By adding some custom configuration of fontconfig, "Noto Sans Mono
> CJK SC" can be made an alias of "Sarasa Mono", "Unifont", or whatever
> alternative font one wants to try.

This was my misunderstanding. Yes, aliasing is possible with fontconfig,
but fontconfig's alias names are not recognized by fontspec/xeCJK + XeLaTeX.
So we need to embed the actual names of candidate fonts in the preamble.
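
Since the real font names have to be embedded, a build script could probe
which candidate is installed before picking one. Below is a minimal sketch
that loosely mirrors the fc-list pipeline quoted above. The helper names
and the order of the candidate list are my assumptions, and nothing like
this is part of the series.

```python
import subprocess

# Candidate names taken from this mail; the order is an arbitrary assumption.
HANGUL_MONO_CANDIDATES = ["D2Coding", "Sarasa Mono K", "Unifont"]


def count_regular_faces(fc_list_output, family):
    """Count fc-list lines naming `family` with style=Regular.

    Loosely mirrors `grep "<family>," | grep "style=Regular" | wc -l`, but
    also accepts a family name directly followed by ':' (fonts that report
    only one family name).
    """
    return sum(
        1
        for line in fc_list_output.splitlines()
        if (family + "," in line or family + ":" in line)
        and "style=Regular" in line
    )


def font_installed(family):
    """Return True if fontconfig lists a Regular face of `family`."""
    try:
        result = subprocess.run(
            ["fc-list"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return False  # fc-list missing or failed: treat as not installed
    return count_regular_faces(result.stdout, family) > 0


def pick_first_installed(candidates, is_installed=font_installed):
    """Return the first installed candidate, or None if none is found."""
    for family in candidates:
        if is_installed(family):
            return family
    return None
```

For example, pick_first_installed(HANGUL_MONO_CANDIDATES) returns the name
to embed in the preamble, or None, in which case the default monospace
font would stay in place.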

Mauro, isn't the header comment in conf.py added in Patch 6/7 good enough?
I don't think those nice-to-have fonts are pre-reqs that should be
taken care of by the sphinx-pre-install script.

On the other hand, having heard nothing from SeongJae, who is the
maintainer of the Korean memory-barriers.txt, I suspect there might be
nobody who cares about the Korean chapter in translations.pdf.
Patches 6/7 and 7/7 need explicit Acks from someone who reads it, I guess.

This series is tested against Sphinx 2.4.4 and brand-new 4.1.1.

Again, any feedback is appreciated!

Thanks, Akira

[1]: https://lore.kernel.org/lkml/386938dc-6290-239c-4b4f-c6153f3d98c5@xxxxxxxxx/
[2]: https://lore.kernel.org/linux-doc/CAJMZz3_M34cy4ZbKGLZniGeUPOoJ7DMXdDOQxy-T44_cQ1+Udw@xxxxxxxxxxxxxx/
[3]: https://lore.kernel.org/lkml/0cfd8dfb-b304-4073-973c-930a93d19a17@xxxxxxxxx/

Akira Yokosawa (7):
  docs: pdfdocs: Refactor config for CJK document
  docs: pdfdocs: Add CJK-language-specific font settings
  docs: pdfdocs: Use one-half spacing in CJK translations
  docs: pdfdocs: Permit AutoFakeSlant for CJK fonts
  docs: pdfdocs: Teach xeCJK the width of quotation marks
  docs: pdfdocs: Add optional choices for Korean monospace font
  docs/ko_KR: Use white spaces behind CJK characters in ascii-art

 Documentation/conf.py                         | 77 +++++++++++++++----
 Documentation/translations/conf.py            | 44 +++++++++++
 Documentation/translations/ja_JP/howto.rst    |  8 ++
 Documentation/translations/ja_JP/index.rst    |  5 ++
 Documentation/translations/ko_KR/howto.rst    |  8 ++
 Documentation/translations/ko_KR/index.rst    |  2 +
 .../translations/ko_KR/memory-barriers.txt    | 14 ++--
 Documentation/translations/zh_CN/index.rst    |  5 ++
 8 files changed, 140 insertions(+), 23 deletions(-)
 create mode 100644 Documentation/translations/conf.py