Re: [PATCH 08/18] docs: kdoc_parser: fix parser to support multi-word types
From: Mauro Carvalho Chehab
Date: Tue Mar 03 2026 - 15:24:52 EST
On Tue, 03 Mar 2026 10:34:48 -0700
Jonathan Corbet <corbet@xxxxxxx> wrote:
> Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> writes:
>
> > The regular expression currently expects a single word for the
> > type, but it may be something like "struct foo".
> >
> > Add support for it.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>
> > Acked-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> > Tested-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> > Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@xxxxxxxxx>
> > ---
> > tools/lib/python/kdoc/kdoc_parser.py | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
> > index 39ff27d421eb..22a820d33dc8 100644
> > --- a/tools/lib/python/kdoc/kdoc_parser.py
> > +++ b/tools/lib/python/kdoc/kdoc_parser.py
> > @@ -1018,14 +1018,14 @@ class KernelDoc:
> >
> > default_val = None
> >
> > - r= KernRe(OPTIONAL_VAR_ATTR + r"[\w_]*\s+(?:\*+)?([\w_]+)\s*[\d\]\[]*\s*(=.*)?")
> > + r= KernRe(OPTIONAL_VAR_ATTR + r"\s*[\w_\s]*\s+(?:\*+)?([\w_]+)\s*[\d\]\[]*\s*(=.*)?")
>
> Just for future reference...I *really* think that the code is improved
> by breaking up and commenting gnarly regexes like this. They are really
> unreadable in this form. (And yes, I know the code has been full of
> these forever, but we can always try to make it better :)
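For what it's worth, Python's re.VERBOSE flag allows exactly that kind of commented layout. Here is a rough sketch of the pattern from the patch written that way, using plain re rather than the kernel's KernRe wrapper, and omitting the OPTIONAL_VAR_ATTR prefix, so the sample string below is only an illustration:

```python
import re

# Sketch only: the real parser prepends OPTIONAL_VAR_ATTR and goes
# through KernRe; this shows just the commented-regex idea.
VAR_RE = re.compile(r"""
    \s* [\w\s]*         # optional multi-word type, e.g. "struct foo"
    \s+ (?:\*+)?        # required space, then optional pointer stars
    (\w+)               # capture 1: the variable name
    \s* [\d\]\[]*       # optional array subscript, e.g. "[4]"
    \s* (=.*)?          # capture 2: optional default value
""", re.VERBOSE)

m = VAR_RE.match("struct foo *bar[4] = {0}")
print(m.group(1), "|", m.group(2))  # prints: bar | = {0}
```

Each alternative or sub-pattern gets its own line and comment, so the next person touching it can see which piece handles the multi-word type.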
Heh, you're right: this could be better.
> Anyway, just grumbling.
Heh, if we start using code like the tokenizer I'm experimenting with
here:

https://lore.kernel.org/linux-doc/20260303155310.5235b367@localhost/

we could probably get rid of the regexes in the future, using instead
a loop that picks up "ID" tokens. Basically, we would end up with
something similar to this completely untested code snippet:
	self.tokenizer = CTokenizer()
	...
	ids = []
	get_default = False
	tokens = self.tokenizer(proto)
	for kind, value in tokens:
	    if kind == "ID":
	        ids.append(value)
	    elif kind == "OP" and value == "=":
	        get_default = True
	        break
	if get_default:
	    # Resume the same token stream, right after the "="
	    for kind, value in tokens:
	        if kind in ["CHAR", "STRING", "NUMBER"]:
	            default_val = value
	            break
	declaration_name = ids[-1]
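To make the idea concrete without the actual CTokenizer from the patch above (its API here is an assumption: a callable yielding (kind, value) pairs), a minimal stand-in tokenizer plus that loop could look like:

```python
import re

# Hypothetical minimal tokenizer, NOT the CTokenizer from the linked
# patch: it just yields (kind, value) pairs for a few token classes.
TOKEN_RE = re.compile(r"""
    (?P<ID>[A-Za-z_]\w*)      # identifiers and keywords
  | (?P<NUMBER>\d+)           # integer literals
  | (?P<STRING>"[^"]*")      # string literals
  | (?P<CHAR>'[^']*')         # character literals
  | (?P<OP>[=*;{}\[\]])       # the few operators we care about
  | (?P<SKIP>\s+)             # whitespace, skipped
""", re.VERBOSE)

def tokenize(text):
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()

# Walking "struct foo *bar = 5;" with the loop from the snippet:
ids = []
default_val = None
tokens = tokenize("struct foo *bar = 5;")
for kind, value in tokens:
    if kind == "ID":
        ids.append(value)            # collects: struct, foo, bar
    elif kind == "OP" and value == "=":
        break                        # default value follows
for kind, value in tokens:           # resume after the "="
    if kind in ("CHAR", "STRING", "NUMBER"):
        default_val = value
        break
declaration_name = ids[-1]
print(declaration_name, default_val)  # prints: bar 5
```

The last "ID" before "=" (or end of declaration) is the variable name, and anything but the multi-word type handling falls out of the loop for free, which is exactly what the regex in this patch struggles with.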
Thanks,
Mauro