Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

From: Nícolas F. R. A. Prado
Date: Thu Oct 08 2020 - 09:55:17 EST


On Thu Oct 8, 2020 at 2:27 AM -03, Mauro Carvalho Chehab wrote:
>
> Hi Nícolas,
>
> Em Wed, 07 Oct 2020 23:12:25 +0000
> Nícolas F. R. A. Prado <nfraprado@xxxxxxxxxxxxxx> escreveu:
>
> > While Sphinx 2 used a single c:type role for struct, union, enum and
> > typedef, Sphinx 3 uses a specific role for each one.
> > To keep backward compatibility, detect the Sphinx version and use the
> > correct roles for that version.
> >
> > Also, Sphinx 3 is more strict with its C domain and generated warnings,
> > exposing issues in the parsing.
> > To fix the warnings, make the C regexes use ASCII, ensure the
> > expressions only match the beginning of words and skip trying to
> > cross-reference C reserved words.
> >
> > Signed-off-by: Nícolas F. R. A. Prado <nfraprado@xxxxxxxxxxxxxx>
> > ---
> >
> > Hi,
> >
> > after Mauro's series making everything ready for Sphinx 3.1, only the automarkup
> > was left to be ported.
> > This patch makes the needed changes to automarkup so that we can soon flip the
> > switch to Sphinx 3.1.
> >
> > This change was tested both with Sphinx 2.4.4 and Sphinx 3.1.
> >
> > This change doesn't add any warnings to the Documentation build.
> > I tested it with Mauro's series but later rebased it to docs-next, and it can be
> > accepted independently of that series.
> >
> > I ended up doing more than one thing in this single patch, but since it was all
> > changing the same lines and for the same purpose, I felt it would be better to
> > keep it as a single commit.
> >
>
> Thanks for doing this! That was the last missing part on fully
> supporting
> Sphinx 3.1+.
>
> > Mauro,
> > if this patch is ok, the 3rd patch in your series, which disables automarkup for
> > sphinx 3, should be dropped.
>
> Yeah, sure.
>
> > Although I'm not sure what the implications of your patches adding namespaces
> > and using the c:macro for functions are.
>
> With regards to namespaces:
>
> Currently, only the media docs use namespaces, and it declares it at the
> beginning of each file that needs it, without overriding it later[1].
>
> [1] btw, the cdomain.py backward compat code doesn't support namespace
> changes - as it parses namespaces before handling the C domain tags.
> I doubt that we'll need to have a single .rst file using more than
> one namespace anyway.
>
> The main usage is to avoid conflicts for uAPI documentation for
> syscalls - actually for libc userspace wrappers to syscalls. It
> documents
> things like: open, close, read, write, ioctl, poll, select.

If it's mainly for that, I think automarkup could skip handling namespaces.
>From automarkup.py:

#
# Many places in the docs refer to common system calls. It is
# pointless to try to cross-reference them and, as has been known
# to happen, somebody defining a function by these names can lead
# to the creation of incorrect and confusing cross references. So
# just don't even try with these names.
#
Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
'socket' ]

So unless I'm confusing things and the namespaces actually sidestep that issue,
the namespace handling could be left out of automarkup.

>
> I'm not sure if the automarkup should be aware of it, or if the c.py
> code
> at Sphinx 3.x will add the namespace automatically, but I suspect that
> automarkup will need to handle it as well.
>
> One file you could use for checking it is this one:
>
> Documentation/userspace-api/media/v4l/hist-v4l2.rst
>
> It contains a namespace directive and documents what changed without
> using any explicit reference (after my patch series + linux-next).
>
> With regards to c:macro vs c:function:
>
> I suspect that automarkup should test both when trying to do
> cross-references for function-like calls. E. g. test first if
> there is a :c:function, falling back to check for :c:macro.
>
> I would add a "sphinx3_c_func_ref" function that would handle
> such special case, e. g. something like:
>
> markup_func_sphinx3 = {RE_doc: markup_doc_ref,
> RE_function: sphinx3_c_func_ref,
> RE_struct: markup_c_ref,
> RE_union: markup_c_ref,
> RE_enum: markup_c_ref,
> RE_typedef: markup_c_ref}

Sounds good.

I'll make this patch into a series and add that function/macro handling as a new
patch, and the namespace handling depending on your answer on the above comment,
for v2.

>
> > All I did here was use the specific roles for sphinx 3 and fix the warnings, but
> > that was enough to get correct cross-references even after your series.
> >
> > Thanks,
> > Nícolas
>
> >
> >
> > Documentation/sphinx/automarkup.py | 69 ++++++++++++++++++++++++++----
> > 1 file changed, 60 insertions(+), 9 deletions(-)
> >
> > diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> > index a1b0f554cd82..fd1e927408ad 100644
> > --- a/Documentation/sphinx/automarkup.py
> > +++ b/Documentation/sphinx/automarkup.py
> > @@ -22,13 +22,34 @@ from itertools import chain
> > # :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> > # bit tries to restrict matches to things that won't create trouble.
> > #
> > -RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')
> > -RE_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
> > +RE_function = re.compile(r'\b(([a-zA-Z_]\w+)\(\))', flags=re.ASCII)
> > +
> > +#
> > +# Sphinx 2 uses the same :c:type role for struct, union, enum and typedef
> > +#
> > +RE_generic_type = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w+)',
> > + flags=re.ASCII)
> > +
> > +#
> > +# Sphinx 3 uses a different C role for each one of struct, union, enum and
> > +# typedef
> > +#
> > +RE_struct = re.compile(r'\b(struct)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> > +RE_union = re.compile(r'\b(union)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> > +RE_enum = re.compile(r'\b(enum)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> > +RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> > +
> > #
> > # Detects a reference to a documentation page of the form Documentation/... with
> > # an optional extension
> > #
> > -RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
> > +RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')
> > +
> > +#
> > +# Reserved C words that we should skip when cross-referencing
> > +#
> > +Skipnames = [ 'for', 'if', 'register', 'sizeof', 'struct', 'unsigned' ]
> > +
> >
> > #
> > # Many places in the docs refer to common system calls. It is
> > @@ -48,9 +69,22 @@ def markup_refs(docname, app, node):
> > #
> > # Associate each regex with the function that will markup its matches
> > #
> > - markup_func = {RE_type: markup_c_ref,
> > - RE_function: markup_c_ref,
> > - RE_doc: markup_doc_ref}
> > + markup_func_sphinx2 = {RE_doc: markup_doc_ref,
> > + RE_function: markup_c_ref,
> > + RE_generic_type: markup_c_ref}
> > +
> > + markup_func_sphinx3 = {RE_doc: markup_doc_ref,
> > + RE_function: markup_c_ref,
> > + RE_struct: markup_c_ref,
> > + RE_union: markup_c_ref,
> > + RE_enum: markup_c_ref,
> > + RE_typedef: markup_c_ref}
> > +
> > + if sphinx.__version__[0] == '3':
> > + markup_func = markup_func_sphinx3
> > + else:
> > + markup_func = markup_func_sphinx2
> > +
> > match_iterators = [regex.finditer(t) for regex in markup_func]
> > #
> > # Sort all references by the starting position in text
> > @@ -79,8 +113,24 @@ def markup_refs(docname, app, node):
> > # type_name) with an appropriate cross reference.
> > #
> > def markup_c_ref(docname, app, match):
> > - class_str = {RE_function: 'c-func', RE_type: 'c-type'}
> > - reftype_str = {RE_function: 'function', RE_type: 'type'}
> > + class_str = {RE_function: 'c-func',
> > + # Sphinx 2 only
> > + RE_generic_type: 'c-type',
> > + # Sphinx 3+ only
> > + RE_struct: 'c-struct',
> > + RE_union: 'c-union',
> > + RE_enum: 'c-enum',
> > + RE_typedef: 'c-type',
> > + }
> > + reftype_str = {RE_function: 'function',
> > + # Sphinx 2 only
> > + RE_generic_type: 'type',
> > + # Sphinx 3+ only
> > + RE_struct: 'struct',
> > + RE_union: 'union',
> > + RE_enum: 'enum',
> > + RE_typedef: 'type',
> > + }
> >
> > cdom = app.env.domains['c']
> > #
> > @@ -89,7 +139,8 @@ def markup_c_ref(docname, app, match):
> > target = match.group(2)
> > target_text = nodes.Text(match.group(0))
> > xref = None
> > - if not (match.re == RE_function and target in Skipfuncs):
> > + if not ((match.re == RE_function and target in Skipfuncs)
> > + or (target in Skipnames)):
> > lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
> > lit_text += target_text
> > pxref = addnodes.pending_xref('', refdomain = 'c',
>
>
>
> Thanks,
> Mauro

Thanks,
Nícolas