Re: [RFC PATCH v1 2/6] kernel-doc: replace kernel-doc perl parser with a pure python one (WIP)

From: Jani Nikula
Date: Wed Jan 25 2017 - 15:59:36 EST

On Wed, 25 Jan 2017, Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:
> Am 25.01.2017 um 11:24 schrieb Jani Nikula <jani.nikula@xxxxxxxxx>:
>> Markus, thanks for your work on this.
> Thanks for your comments!
>> Excuse me for my bluntness, but I think changing everything in a single
>> commit, or even a few commits, is strictly not acceptable.
> OK, I understand.
>> When I changed *small* things in scripts/kernel-doc, I would make
>> htmldocs before and after the change, and recursively diff the produced
>> output to ensure there were no surprises. We already have enough
>> documentation that a manual eyeballing of the output is simply not
>> sufficient to ensure things don't break.
>> The diff in output between before and after this series? 160k lines of
>> unified diff without context ('diff -u0 -r old new | wc -l').
>> Many of the changes are improvements on the result, such as using proper
>> <div> tags for function parameter lists etc., but clearly changing the
>> output should be independent of changing the parser, so we have some
>> chance of validating the parser.
> Hmm ... I try to sort my thoughts on this:
> The both parser are generating reST output. We have tested reST output
> so it should be enough to compare the reST from the old one with the
> new one ... at least theoretical.
> But the problem I see here is, that the perl script generates a
> reST output which I can't use. As an example we can take a look at
> the man-page builder I shipped in the series.

Sorry, I still don't understand *why* you can't use the same rst. Your
explanation seems to relate to man pages, but man pages come
*afterwards*, and are a separate improvement. I know you talk about lack
of proper structure and all that, but *why* can it strictly not be used,
if the *current* rst clearly can be used?


> In the commit message there is a small example:
> <section docname="basics ...>
> <title>get_sd_load_idx</title>
> <kernel_doc_man manpage="get_sd_load_idx.9"/>
> ...
> <desc desctype="function" domain="c"....>
> <desc_signature ....>
> <desc_type>int</desc_type>
> ...
> You see that it has <section> tag with childs <title/>, <kernel_doc_man/>
> and so on. This structured markup is used by builders, they navigate through
> the structured tree picking up nodes and spit out some man-page html, or
> whatever builder it is.
> ATM the perl parser generates a reST output which does not have such
> a structured tree, so the builder can't navigate in.
> So, what I mean is, the new parser has to generate a complete different reST
> output and thats why we can't compare the perl parser with python one on a reST
> basis ... and if reST is different, HTML is different :(
> So we do not have any chance to track regression when switching from
> the old to the new parser.
> Thats are my thoughts on this topic, may be you have a solution for this?
>>> Ideally at the time of merging, we would be able to build the docs with
>>> *either* kerneldoc.
>> I'd be fine with switching over in a single commit that doesn't
>> drastically change the output.
> One solution might be to improve the reST output of the perl script
> first, so that it produce something which has a structure and we
> all can agree on (short: reST output is the reference, ATM the reference
> need some improvements)
> If this is a way we like to go, I can send a patch for the perl script,
> so that we can commit one a reST reference.
>> A drop-in replacement. But that's not the
>> case here.
>>> - I'll have to try it out to see how noisy it is. I'm not opposed to
>>> stricter checks; indeed, they could be a good thing. But we might want
>>> to have an option so we can cut back on the noise by default.
>> The increase in 'make htmldocs' build log was from 1521 to 2791 lines in
>> my tree. Arguably there was useful extra diagnosis, but some of it was
>> the printouts of long lists of definitions that were not found, one per
>> line. So it could be condensed without losing info too.
> Yes, this was just a 1:1 merge from my POC, there are a lot of things
> which could be meld down. ATM, for me it is important to get a feedback
> on the functionalities and concepts of kernel-doc apps (RFC).
>> On to performance. With the default build options the new system was
>> noticeably slower than the current one, with a 50% increase on my
>> machine. But what really caught me by surprise was that passing
>> SPHINXOPTS=-j5 to parallelize worked better on the current system,
>> making the new one a whopping 70% slower. Of course, the argument is
>> that the proposed parser does more and is better, but due to the
>> monolithic change it's impossible to pinpoint the culprit or do a proper
>> cost/benefit analysis on this. Again, this calls for a more broken down
>> series of patches to make the changes.
> Ups, I have to look closer ... I thought the py-solution is faster
> since it does not for processes and does some caching.
>> Finally, while I'd love to see scripts/kernel-doc go, I do have to ask
>> if changing roughly 3k lines of Perl to roughly 3k lines of Python (*)
>> really makes everything better? They both still parse everything using a
>> large pile of regular expressions and a clunky state machine. When I
>> look at the code, I'm afraid I do not get that liberating feeling of
>> throwing out old junk in favor of something small or elegant or even
>> obviously more maintainable than the old one. The new one offers more
>> features, but repeatedly we face the problem that it's all lumped in
>> together with the parser change. We should be able to look at the parser
>> change and the other improvements separately.
>> That said, perhaps having an elegant parser (perhaps based on a compiler
>> plugin) is incompatible with the idea of making it a bug-for-bug drop-in
>> replacement of the old one, and it's something we need to think about.
> Before I started implementing the parser I thought about separating
> parsing from generating reST. I played a bit with pycparser
> but I realized that the coverage of those parser might be not
> enough for the kernel sources. At this time you mentioned sparse.
> I haven't had time to at sparse but I guess that this is the
> tool.
> -- Markus --
>> All in all I think the message should be clear: this needs to be split
>> into small, incremental changes. Just like we do everything in the
>> kernel.
>> BR,
>> Jani.
>> (*) Please do not get hung up on these numbers. The Python version does
>> more in some ways, but adds more deps such as fspath that's not
>> included in the figures, and the Perl version outputs more
>> formats. It's not an apples to apples comparison. Let's just say
>> they are somewhere in the same ballpark.
>> --
>> Jani Nikula, Intel Open Source Technology Center

Jani Nikula, Intel Open Source Technology Center