Re: [PATCH 39/53] docs: dev-tools: testing-overview.rst: avoid using UTF-8 chars

From: Mauro Carvalho Chehab
Date: Wed May 12 2021 - 04:29:17 EST


On Tue, 11 May 2021 07:35:29 +0800
David Gow <davidgow@xxxxxxxxxx> wrote:

> On Mon, May 10, 2021 at 6:27 PM Mauro Carvalho Chehab
> <mchehab+huawei@xxxxxxxxxx> wrote:
> >
> > While UTF-8 characters can be used in the Linux documentation,
> > it is best to use them only when ASCII doesn't offer a good replacement.
> > So, replace the occurrences of the following UTF-8 characters:
> >
> > - U+2014 ('—'): EM DASH
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>
> > ---
>
> Oh dear, I do have a habit of overusing em-dashes. I've no problem in
> theory with exchanging them for an ASCII approximation.
> I suppose there's a reason it's the one dash to rule them all: :-)
> https://twitter.com/FakeUnicode/status/727888721312260096/photo/1
>
> > Documentation/dev-tools/testing-overview.rst | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
> > index b5b46709969c..8adffc26a2ec 100644
> > --- a/Documentation/dev-tools/testing-overview.rst
> > +++ b/Documentation/dev-tools/testing-overview.rst
> > @@ -18,8 +18,8 @@ frameworks. These both provide infrastructure to help make running tests and
> > groups of tests easier, as well as providing helpers to aid in writing new
> > tests.
> >
> > -If you're looking to verify the behaviour of the Kernel — particularly specific
> > -parts of the kernel — then you'll want to use KUnit or kselftest.
> > +If you're looking to verify the behaviour of the Kernel - particularly specific
> > +parts of the kernel - then you'll want to use KUnit or kselftest.
>
> As Marco pointed out, having multiple HYPHEN-MINUS symbols in a row is
> probably a better replacement, as it does distinguish the em-dash from
> smaller dashes better.
> However, I need three for sphinx to output an em-dash here (2 hyphens
> only gives me an en-dash).
>
> So, if we want to get rid of the UTF-8 em-dash, my preferences would
> be (in descending order):
> 1. Three hyphens: '---' (sphinx generates an em-dash)
> 2. Two hyphens: '--' (worst case, an en-dash surrounded by spaces --
> as sphinx generates for me -- is still readable, and it's still
> readable as an em-dash in plain text)
> 3. One hyphen as in this patch (which I don't like as much, but will
> no doubt learn to live with)
>
> But it looks like you've got several similar comments on other patches
> in this series, so I'm happy for you to use whatever ends up being
> agreed upon generally.

Yeah, from the comments I have received so far, it seems that most developers
want to use '---' for EM DASH and '--' for EN DASH, typing them as ASCII
instead of using U+<number>, as this is easier in most editors.

However, my understanding is that we don't have a consensus on this
yet, as some patches I sent using a single hyphen were
accepted/reviewed/acked.

So, I sent a small patch series (/5), which has already been applied,
fixing the cases where UTF-8 chars (including DASH) were added
by mistake (probably by some conversion tool).

For the remaining issues, my plan is to split this series into two
parts:

The first one with the non-controversial UTF-8 changes, and a second one
with just EM/EN DASH, using '---' to replace EM DASH and '--' to replace
EN DASH, since that way the generated HTML/LaTeX/PDF docs won't change.

This should make it easier to discuss the EM/EN DASH changes in
each patch, and to see whether the above default is the best fit for a
particular use case.
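
As a rough illustration only, the '---'/'--' mapping described above could
be sketched in Python as follows. This is a hypothetical helper, not the
script used for this series; the names DASH_TO_ASCII and asciify_dashes
are made up for the example, and it assumes (per the discussion above)
that Sphinx turns '---' into an em dash and '--' into an en dash.

```python
# Illustrative sketch only; NOT the script used for this patch series.
# Assumption taken from the thread: Sphinx renders '---' as an em dash
# and '--' as an en dash, so the generated docs should stay the same.
DASH_TO_ASCII = {
    "\u2014": "---",  # U+2014 EM DASH
    "\u2013": "--",   # U+2013 EN DASH
}


def asciify_dashes(text: str) -> str:
    """Replace UTF-8 em/en dashes with their multi-hyphen ASCII spellings."""
    # str.maketrans accepts a dict mapping single characters to strings.
    return text.translate(str.maketrans(DASH_TO_ASCII))


if __name__ == "__main__":
    sample = "behaviour of the Kernel \u2014 particularly specific parts"
    print(asciify_dashes(sample))
```

Every other character, including plain hyphens, is left untouched, so
applying it more than once gives the same result.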

Thanks,
Mauro