Re: [PATCH v2 5/5] docs: automarkup.py: Allow automatic cross-reference inside C namespace
From: Nícolas F. R. A. Prado
Date: Mon Nov 02 2020 - 10:33:19 EST
On Wed Oct 14, 2020 at 4:19 PM -03, Jonathan Corbet wrote:
>
> On Wed, 14 Oct 2020 11:56:44 +0200
> Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:
>
> > > To make the first step possible, disable the parallel_read_safe option
> > > in Sphinx, since the dictionary that maps the files to the C namespaces
> > > can't be concurrently updated. This unfortunately increases the build
> > > time of the documentation.
> >
> > Disabling parallel_read_safe will make performance very poor.
> > Doesn't the C domain store the current namespace somewhere?
> > If so, then, instead of using the source-read phase, something
> > else could be used instead.
The issue is that C domain parsing happens at an earlier phase of the Sphinx
build, so by the time we do the automatic cross-referencing at the
doctree-resolved phase, the stack containing the current C namespace is long
gone.
Not only that, but the namespace isn't associated with the file it appears in,
or vice versa. Sphinx only cares about attaching the C directive it is
currently reading to the current namespace, so from its point of view there is
no reason to record which namespaces appeared in a given file. That mapping is
exactly what we need, but Sphinx doesn't keep it.
For instance, printing all symbols from app.env.domaindata['c']['root_symbol']
shows every single C namespace, but the docname field in each of them is None.
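For reference, this is roughly how I dumped them (just a quick sketch; it
assumes Sphinx 3.x, where the C domain Symbol objects provide
get_all_symbols(), get_full_nested_name() and a docname attribute):

def dump_c_symbols(app):
    # Walk the C domain's symbol tree and print every symbol together with
    # the docname it is attributed to; for the namespaces the docname shows
    # up as None.
    root = app.env.domaindata['c']['root_symbol']
    for symbol in root.get_all_symbols():
        if symbol is root:
            continue
        print(symbol.get_full_nested_name(), symbol.docname)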
That's why the way to go is to record the mapping from files to namespaces
ourselves, at the source-read phase.
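Something along these lines is what I have in mind (only a sketch, not the
actual automarkup.py change: the regex, the module-level dict and the handler
names are simplified placeholders):

import re

# Maps each docname to the C namespace declared in that file, if any.
# NOTE: with parallel reads this module-level dict is per-process, which is
# exactly the problem discussed further down.
c_namespaces = {}

RE_namespace = re.compile(r'^\s*\.\.\s+c:namespace::\s+(\S+)\s*$', re.M)

def record_namespace(app, docname, source):
    # source-read hands us the raw text as a one-element list, so we can
    # note which namespace this file declares before it is parsed.
    match = RE_namespace.search(source[0])
    if match:
        c_namespaces[docname] = match.group(1)

def auto_markup(app, doctree, docname):
    # By doctree-resolved time the C domain's namespace stack is gone, but
    # the mapping recorded above tells us which namespace to try first.
    namespace = c_namespaces.get(docname, '')
    # ... do the automatic cross-referencing using `namespace` here ...

def setup(app):
    app.connect('source-read', record_namespace)
    app.connect('doctree-resolved', auto_markup)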
> That seems like the best solution if it exists, yes. Otherwise a simple
> lock could be used around c_namespace to serialize access there, right?
Actually, I was wrong when I said that the issue was that "they can't be
concurrently updated". When parallel_read_safe is enabled, Sphinx spawns
multiple processes rather than multiple threads, to get true concurrency by
sidestepping Python's GIL. So the same c_namespace variable isn't even
accessible across the multiple processes: each worker only sees its own copy.
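A self-contained toy example (nothing to do with the actual extension code;
the docname and namespace strings are made up) shows the problem:

import multiprocessing

c_namespace = {}

def worker(docname):
    # This only mutates the copy of the dict inside the child process.
    c_namespace[docname] = 'foo_namespace'

if __name__ == '__main__':
    p = multiprocessing.Process(target=worker, args=('driver-api/foo',))
    p.start()
    p.join()
    # The parent never sees the update: this prints {}.
    print(c_namespace)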
Reading multiprocessing's documentation [1], it seems that memory could be
shared between the processes using Value or Array, but either one would need
to be passed to the workers by the process that spawned them, that is, it
would have to be done from Sphinx's side.
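For illustration, a shared mapping does work if the parent creates it and
hands it to each worker at spawn time (here with a Manager dict instead of
Value/Array, since what we would share is a dictionary; the names are again
made up), but that spawning step is exactly the part that lives inside Sphinx:

import multiprocessing

def worker(shared, docname):
    # The worker can update the shared mapping because it received it from
    # the process that spawned it.
    shared[docname] = 'foo_namespace'

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        shared = manager.dict()   # created by the spawning process
        p = multiprocessing.Process(target=worker,
                                    args=(shared, 'driver-api/foo'))
        p.start()
        p.join()
        print(dict(shared))       # {'driver-api/foo': 'foo_namespace'}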
So, at the moment I'm not seeing a way to have this information shared
concurrently by the Python processes, but I will keep searching.
Thanks,
Nícolas
[1] https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes