Re: [RFC PATCH v1 2/6] kernel-doc: replace kernel-doc perl parser with a pure python one (WIP)

From: Daniel Vetter
Date: Wed Jan 25 2017 - 05:35:52 EST

On Wed, Jan 25, 2017 at 12:24:31PM +0200, Jani Nikula wrote:
> Finally, while I'd love to see scripts/kernel-doc go, I do have to ask
> if changing roughly 3k lines of Perl to roughly 3k lines of Python (*)
> really makes everything better? They both still parse everything using a
> large pile of regular expressions and a clunky state machine. When I
> look at the code, I'm afraid I do not get that liberating feeling of
> throwing out old junk in favor of something small or elegant or even
> obviously more maintainable than the old one. The new one offers more
> features, but repeatedly we face the problem that it's all lumped in
> together with the parser change. We should be able to look at the parser
> change and the other improvements separately.

I share this concern a lot. The kernel-doc perl is a horror show, but it's
a horror show that 3-4 people now somewhat understand. Simply translating
the entire script into python leaves us with the same horror show, but in
a different language. And personally I'm not versed at all in either of
them (and I think that applies to many kernel hackers), so seems a wash.

If the new script would implement the state machinery in some
parser-combinator library to make it much easier to maintain, while still
being bug-for-bug compatible, then I'd be much, much more in favour of
doing this. And once we go to that amount of effort, then rewriting it in
python for more consistency with sphinx is definitely a good idea.

> That said, perhaps having an elegant parser (perhaps based on a compiler
> plugin) is incompatible with the idea of making it a bug-for-bug drop-in
> replacement of the old one, and it's something we need to think about.

Yeah, I fear we'll always need our own parser to avoid breaking the world.
But there's definitely better ways out there to write parsers than
cobbling together regexes in a state machine that uses globals :-)
Daniel Vetter
Software Engineer, Intel Corporation