Re: [RFC PATCH 0/3] Documentation: switch to pdflatex and fix pdf build

From: Mauro Carvalho Chehab
Date: Mon Aug 15 2016 - 18:26:46 EST


Hi Jon,

On Mon, 15 Aug 2016 09:17:52 -0300
Mauro Carvalho Chehab <mchehab@xxxxxxxxxxxxx> wrote:

> On Mon, 15 Aug 2016 12:40:21 +0300
> Jani Nikula <jani.nikula@xxxxxxxxx> wrote:
>
> > On Sat, 13 Aug 2016, Jonathan Corbet <corbet@xxxxxxx> wrote:
> > > On Wed, 10 Aug 2016 18:54:06 +0300
> > > Jani Nikula <jani.nikula@xxxxxxxxx> wrote:
> > >
> > >> With these you should be able to get started with pdf generation. It's a
> > >> quick transition to pdflatex, the patches are not very pretty, but the
> > >> pdf output is. Patch 3/3 works as an example where to add your stuff
> > >> (latex_documents in conf.py) and how.
> > >
> > > OK, now I have a bone to pick with you.
> > >
> > > I applied this, then decided to install the needed toolchain on the
> > > Tumbleweed system I've been playing with; it wanted to install 1,727
> > > packages to get pdflatex. Pandoc just doesn't seem so bad anymore.
> >
> > Jon, I sent these to unblock Luis, and as a starting point for a
> > discussion about rst2pdf vs. pdflatex. I didn't mean I'd want these
> > merged as-is! I'm sorry if I didn't make myself clear.
> >
> > I don't mind at all if you want to drop them.

I spent a while experimenting with rst2pdf and LaTeX-based tools, and
also with using the Sphinx math extension to improve the media documentation.

Here are my findings:

1) I'm pretty sure there's no way for us to get rid of LaTeX.

The problem is that the Sphinx math extension depends on the LaTeX amsmath
package. Sure, someone could eventually write an extension that adds math
support without requiring LaTeX, and fix (or rewrite) rst2pdf, but that
would take time and effort.

So, for now, I think we should just assume that anyone wanting to generate
PDF docs will need LaTeX (or, more likely, TeTeX).

2) rst2pdf seems to be incompatible with the Sphinx math extension,
at least with Sphinx 1.4.5;

3) pdflatex doesn't handle some UTF-8 characters. This is a problem for
media, as we use two UTF-8 symbols that are incompatible with the fonts
used by pdflatex:

- ≥ U+2265 (GREATER-THAN OR EQUAL TO) UTF-8 character
- ≤ U+2264 (LESS-THAN OR EQUAL TO) UTF-8 character

4) pdflatex output is not nice: it is all black and white, and the fonts
are (IMHO) ugly;

5) In the media docs, some tables only print correctly in landscape.
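Regarding item 3, a quick way to find where those two symbols appear is a
small script like the one below. It is a hypothetical helper (the function
and file names are illustrative, not part of the patch series); it only
reports file:line:column locations and modifies nothing:

```python
#!/usr/bin/env python3
# Hypothetical helper: report where the symbols that pdflatex's
# default fonts cannot render appear in the given .rst files.
import sys

# The two symbols used by the media docs.
PROBLEM_CHARS = {
    '\u2265': 'GREATER-THAN OR EQUAL TO',
    '\u2264': 'LESS-THAN OR EQUAL TO',
}


def find_problem_chars(text):
    """Return a list of (line, column, name) tuples, 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in PROBLEM_CHARS:
                hits.append((lineno, col, PROBLEM_CHARS[ch]))
    return hits


def main(paths):
    found = False
    for path in paths:
        with open(path, encoding='utf-8') as f:
            for lineno, col, name in find_problem_chars(f.read()):
                print(f'{path}:{lineno}:{col}: {name}')
                found = True
    # Non-zero exit status if anything was found.
    return 1 if found else 0


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
```

Assuming the media books live under Documentation/media and the script is
saved as find_symbols.py (hypothetical name), it could be run as:
python3 find_symbols.py $(find Documentation/media -name '*.rst')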

After getting the media books to build, I think the best approach is to
use xelatex instead of pdflatex. Visually, xelatex output is, IMHO,
nice, and it has colors :-)

There seems to be yet another option: lualatex. I didn't try building
with it, so I'm not sure whether its output is better, nor whether it
needs some extra configuration in conf.py.
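For reference, a minimal conf.py sketch of selecting the engine. It
assumes Sphinx 1.5 or later, where the latex_engine option exists; Sphinx
1.4.5, as used here, has no such option, so there the engine has to be
chosen when the generated LaTeX is compiled instead. The font names are
placeholders, not what the patch series uses:

```python
# conf.py (fragment): assumes Sphinx >= 1.5, which added latex_engine
# (Sphinx 1.4.x does not have it). Accepted values include
# 'pdflatex' (the default), 'xelatex' and 'lualatex'.
latex_engine = 'xelatex'

# xelatex (and lualatex) load system fonts through fontspec, so
# symbols such as U+2264/U+2265 can be typeset as long as the chosen
# font has glyphs for them. The font names below are placeholders;
# use fonts that are installed on the build machine.
latex_elements = {
    'preamble': r'''
\usepackage{fontspec}
\setmainfont{DejaVu Serif}
\setsansfont{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}
''',
}
```

Trying lualatex would then mean changing only the latex_engine value;
whether it needs any further settings is, as said, untested.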

I sent a patch series addressing most of the issues above.

If you want to see the output:
https://mchehab.fedorapeople.org/TheLinuxKernel.pdf

This was built on Fedora 24.

Issues:
-------

Even with xelatex, there are still some issues to be addressed:

- There are two hacks needed to build the media books: one removes the
*.h.rst files, and the other comments out two C code blocks inside tables;

- There is a lot of noise in the PDF output. I didn't even try to look
into it;

- Almost all tables in the media books are mangled. Rotating them to
landscape fixes several of them. I added support for that; however,
it requires manually adding LaTeX tags before and after the tables,
like:

.. raw:: latex

    \begin{landscape}

<some table(s)>

.. raw:: latex

    \end{landscape}

As the above tags are LaTeX-specific, they should not interfere with
HTML (or ePub) output.

One such example is this table:
"Table 2.18: Packed RGB Image Formats"
(on page 92 of the PDF file)

I'm looking for a way to both scale and rotate it (as just rotating it
is not enough).
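The \begin{landscape}/\end{landscape} tags need the landscape environment,
which is not part of core LaTeX: it comes from the lscape package, or from
pdflscape, which additionally marks the page as rotated so PDF viewers
display it sideways. A minimal conf.py sketch of loading it, assuming no
other preamble is set (otherwise, append the line to the existing
'preamble' string):

```python
# conf.py (fragment): make the landscape environment used by the
# ".. raw:: latex" blocks available in the generated LaTeX.
# pdflscape builds on lscape and also sets the PDF page rotation.
latex_elements = {
    'preamble': r'''
% Allow some pages to be generated in landscape
\usepackage{pdflscape}
''',
}
```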

--
Thanks,
Mauro